# Phylogenetic tree

A phylogenetic tree, also called an evolutionary tree or a tree of life, is a tree showing the evolutionary interrelationships among various species or other entities that are believed to have a common ancestor. In a phylogenetic tree, each node with descendants represents the most recent common ancestor of the descendants, with edge lengths sometimes corresponding to time estimates. Each node in a phylogenetic tree is called a taxonomic unit. Internal nodes are generally referred to as Hypothetical Taxonomic Units (HTUs) as they cannot be directly observed.

Although the idea of a "tree" originally arose from earlier notions of life as a progression from lower to higher forms (the Great Chain of Being), modern evolutionary biologists still use trees to depict evolution because they effectively capture the idea that speciation occurs through the splitting of lineages.

## Types of phylogenetic trees

A rooted phylogenetic tree is a directed tree with a unique node corresponding to the (usually imputed) most recent common ancestor of all the entities at the leaves of the tree. Figure 1 depicts a rooted phylogenetic tree, which has been colored according to the three-domain system[2]. The most common method for rooting trees is the use of an uncontroversial outgroup: one close enough to the other entities to allow inference from sequence or trait data, but distant enough to be a clear outgroup.

Unrooted trees illustrate the relatedness of the leaf nodes without making assumptions about ancestry. While unrooted trees can always be generated from rooted ones by simply omitting the root, a root cannot be inferred from an unrooted tree without some means of identifying ancestry; this is normally done by including an outgroup in the input data or introducing additional assumptions about the relative rates of evolution on each branch, such as an application of the molecular clock hypothesis. Figure 2 depicts an unrooted phylogenetic tree[3] for myosin, a superfamily of proteins.
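The outgroup approach to rooting can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `root_with_outgroup`, the edge-list representation, and the labels `A` to `D`, `x` and `y` are all assumptions made for the example. The unrooted tree is given as a list of undirected edges, and the root is placed on the branch leading to the chosen outgroup leaf.

```python
def root_with_outgroup(edges, outgroup):
    """Root an unrooted tree on the branch leading to `outgroup`.

    edges    -- iterable of (u, v) pairs describing an unrooted tree
    outgroup -- label of a leaf node
    Returns nested tuples: (outgroup, ingroup_subtree).
    """
    # Build an undirected adjacency map.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if len(adj[outgroup]) != 1:
        raise ValueError("outgroup must be a leaf")
    (neighbor,) = adj[outgroup]

    def subtree(node, parent):
        # Orient every edge away from the root by never walking back to the parent.
        children = sorted(adj[node] - {parent}, key=str)
        if not children:
            return node  # a leaf
        return tuple(subtree(child, node) for child in children)

    return (outgroup, subtree(neighbor, outgroup))


# Hypothetical four-leaf unrooted tree; x and y are unobserved internal nodes.
unrooted = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
print(root_with_outgroup(unrooted, "D"))  # ('D', ('C', ('A', 'B')))
print(root_with_outgroup(unrooted, "A"))  # ('A', ('B', ('C', 'D')))
```

The same unrooted tree yields different rooted trees depending on which leaf is treated as the outgroup, which is why the choice of outgroup needs to be uncontroversial.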

Both rooted and unrooted phylogenetic trees can be either bifurcating or multifurcating, and either labeled or unlabeled. A bifurcating tree has a maximum of two descendants arising from each interior node, while a multifurcating tree may have more than two. A labeled tree has specific values assigned to its leaves, while an unlabeled tree, sometimes called a tree shape, only defines a topology. The number of possible trees for a given number of leaf nodes depends on the specific type of tree, but there are always more multifurcating than bifurcating trees, more labeled than unlabeled trees, and more rooted than unrooted trees. The last distinction is the most biologically relevant; it arises because there are many places on an unrooted tree to put the root. For labeled bifurcating trees, there are

$\frac{(2n-3)!}{2^{n-2}(n-2)!}$

total rooted trees and

$\frac{(2n-5)!}{2^{n-3}(n-3)!}$

total unrooted trees, where n represents the number of leaf nodes (the rooted formula applies for n ≥ 2 and the unrooted formula for n ≥ 3). The number of unrooted trees for n input sequences or species is equal to the number of rooted trees for n-1 sequences.[4]
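As a quick check on the two formulas above, the following Python sketch evaluates them for small n. The function names `count_rooted` and `count_unrooted` are illustrative and do not come from any library. The loop also confirms the stated identity between unrooted trees on n leaves and rooted trees on n-1 leaves, and the fact that an unrooted bifurcating tree has 2n-3 branches on which a root can be placed.

```python
from math import factorial


def count_rooted(n: int) -> int:
    """Number of labeled, rooted, bifurcating trees on n leaves (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def count_unrooted(n: int) -> int:
    """Number of labeled, unrooted, bifurcating trees on n leaves (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(3, 11):
    # Unrooted trees on n leaves match rooted trees on n - 1 leaves.
    assert count_unrooted(n) == count_rooted(n - 1)
    # Each of the 2n - 3 branches of an unrooted tree is a possible root position.
    assert count_rooted(n) == (2 * n - 3) * count_unrooted(n)
    print(n, count_rooted(n), count_unrooted(n))
```

For ten leaves this gives 34,459,425 rooted and 2,027,025 unrooted trees, which illustrates how quickly the number of candidate trees grows.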

A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree.

A cladogram is a tree formed using cladistic methods. This type of tree only represents a branching pattern, i.e., its branch lengths do not represent time.

A phylogram is a phylogenetic tree that explicitly represents the number of character changes through its branch lengths.

A chronogram is a phylogenetic tree that explicitly represents evolutionary time through its branch lengths.
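The difference between a tree with meaningful branch lengths and one that only records a branching pattern can be illustrated with a small Python sketch. It assumes the tree is written in Newick notation (a text format not otherwise discussed here), where `:<number>` after a node gives its branch length; the helper name `to_cladogram` and the example lengths are hypothetical.

```python
import re


def to_cladogram(newick: str) -> str:
    """Drop ':<length>' annotations, keeping only the branching pattern."""
    return re.sub(r":[0-9.eE+-]+", "", newick)


# Hypothetical phylogram: branch lengths stand for numbers of character changes.
phylogram = "((A:0.12,B:0.30):0.05,C:0.41);"
print(to_cladogram(phylogram))  # ((A,B),C);
```

The topology is unchanged; only the information carried by the branch lengths is discarded.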

## Phylogenetic tree construction

Main article: Computational phylogenetics

Phylogenetic trees for a nontrivial number of input sequences are constructed using computational phylogenetics methods. Distance-matrix methods such as neighbor-joining, which require a multiple sequence alignment to calculate genetic distances, are the simplest to implement; many sequence alignment programs such as ClustalW produce both sequence alignments and phylogenetic trees. Other methods include maximum parsimony and probabilistic inference techniques such as maximum likelihood; Bayesian inference has also been applied to phylogenetics but has been controversial.[4] Identifying the optimal tree using many of these techniques is NP-complete or NP-hard,[4] so heuristic search and optimization methods are used in combination with tree-scoring functions to identify a reasonably good tree that fits the data.
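To make the distance-matrix approach concrete, here is a minimal Python sketch of neighbor-joining. It is an illustration under stated assumptions rather than a reference implementation: the function names `p_distance` and `neighbor_joining`, the Newick output format, and the five-taxon distance matrix are all choices made for this example. `p_distance` shows one simple way to turn two aligned sequences into a distance (the proportion of differing sites); real analyses usually apply a substitution model instead.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def neighbor_joining(names, matrix):
    """Build an unrooted tree (as a Newick string) from a symmetric distance matrix.

    names  -- list of at least two leaf labels
    matrix -- matrix[i][j] is the distance between names[i] and names[j]
    """
    # Working distances keyed by node label; merged nodes are labeled
    # with the Newick fragment of the subtree they represent.
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names) if j != i}
         for i, a in enumerate(names)}
    while len(d) > 2:
        n = len(d)
        r = {a: sum(row.values()) for a, row in d.items()}  # row sums
        # Join the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from a and b to their new parent node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:.3f},{b}:{len_b:.3f})"
        # Distances from the new node to every remaining node.
        new_row = {c: 0.5 * (d[a][c] + d[b][c] - d[a][b])
                   for c in d if c not in (a, b)}
        for c, dist in new_row.items():
            del d[c][a], d[c][b]
            d[c][new] = dist
        del d[a], d[b]
        d[new] = new_row
    # Two nodes remain; connect them with the final branch.
    a, b = d
    return f"({a}:{d[a][b]:.3f},{b});"


# Hypothetical distances between five taxa.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, matrix))
```

On this matrix the first join pairs `a` with `b` (branch lengths 2 and 3), and `d` ends up paired with `e` (branch lengths 2 and 1). The Newick string is written with a top-level pair only for convenience; the result should be read as an unrooted tree. Neighbor-joining is itself a greedy heuristic: it builds one tree directly from the distances rather than searching over all possible trees.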

Tree-building methods can be assessed on the basis of several criteria:[5]

• efficiency (how long does it take to compute the answer, how much memory does it need?)
• power (does it make good use of the data, or is information being wasted?)
• consistency (will it converge on the same answer when given different data sets for the same model problem?)
• robustness (does it cope well with violations of the assumptions of the underlying model?)
• falsifiability (does it alert us when it is not good to use, i.e. when assumptions are violated?)

Tree-building techniques have also gained the attention of mathematicians; for example, trees can be built using T-theory.[6]

## Limitations of phylogenetic trees

Although phylogenetic trees produced on the basis of sequenced genes or genomic data in different species can provide evolutionary insight, they have important limitations. Phylogenetic trees do not necessarily (and likely do not) represent actual evolutionary history. The data on which they are based are noisy; horizontal gene transfer[7], hybridisation between species that were not nearest neighbors on the tree before the hybridisation took place, convergent evolution, and conserved sequences can all confound the analysis. To avoid these limitations, one method of analysis, implemented in the program PhyloCode, does not assume a tree structure.

Furthermore, basing the analysis on a single gene or protein taken from a group of species can be problematic, because trees constructed from another, unrelated gene or protein sequence often differ from the first; great care is therefore needed in inferring phylogenetic relationships among species. This is especially true of genetic material that is subject to lateral gene transfer and recombination, where different haplotype blocks can have different histories.
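One simple way to see whether trees built from two different genes disagree is to compare the groups of leaves (clades) that each tree supports. The Python sketch below assumes rooted trees written as nested tuples; the helper names `leaves` and `clades` and the two gene trees are hypothetical and serve only as an illustration.

```python
def leaves(tree):
    """Set of leaf labels below a node; a leaf is any non-tuple value."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Set of leaf groups, one for each internal node of a rooted tree."""
    if not isinstance(tree, tuple):
        return set()
    result = {leaves(tree)}
    for child in tree:
        result |= clades(child)
    return result


# Hypothetical trees inferred from two unrelated genes in the same three species.
gene_tree_1 = (("human", "chimp"), "gorilla")
gene_tree_2 = (("human", "gorilla"), "chimp")

shared = clades(gene_tree_1) & clades(gene_tree_2)
conflicting = clades(gene_tree_1) ^ clades(gene_tree_2)
print(len(shared), "shared clade(s),", len(conflicting), "conflicting clade(s)")
```

Here both trees agree only on the group containing all three species; each supports a different pairing within it, so neither gene alone settles the relationships among the species.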

When extinct species are included in a tree, they should always be terminal nodes, as it is unlikely that they are direct ancestors of any extant species. Scepticism must apply when extinct species are included in trees that are wholly or partly based on DNA sequence data, due to evidence that "ancient DNA" is not preserved intact for longer than 100,000 years.