Calculation of evolutionary trees from sequence data.

Abstract
Evolutionary trees are usually calculated from comparisons of protein or nucleic acid sequences from present-day organisms by algorithms that use only the difference matrix, which is constructed from the sequence differences between pairs of sequences from the organisms. The difference matrix alone cannot uniquely define the correct position of the ancestor of the present-day organisms (the root of the tree). Methods using the difference matrix alone often fail to give the correct pattern of tree branching (topology) when the different sequences evolve at different rates. Only for equal rates of evolution can the difference matrix (when used with the so-called matrix method) yield exactly the correct topology and root. A method is presented for calculating evolutionary trees from sequence data that uses, along with the difference matrix, the rate of evolution of the various sequences from their common ancestor. In theory, this method uniquely determines both the correct tree topology and the root for unequal rates of sequence evolution. Estimation of an ancestral sequence to be used in the method is discussed, in particular for the 5S RNA sequences from prokaryotes and eukaryotes and for ferredoxin sequences.
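
As an illustration of the two inputs named above, the following minimal sketch builds a difference matrix from pre-aligned sequences and counts each sequence's differences from an estimated ancestral sequence, a simple proxy for its amount of evolution since the common ancestor. The function names, the toy sequences, and the ancestral sequence are hypothetical and chosen only for illustration; differences are counted as simple site-by-site mismatches, and the sketch stops short of the tree construction itself.

```python
from itertools import combinations


def count_differences(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def difference_matrix(seqs):
    """Return a symmetric nested dict of pairwise difference counts."""
    names = list(seqs)
    matrix = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = count_differences(seqs[a], seqs[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


def differences_from_ancestor(seqs, ancestor):
    """Count differences between each sequence and an assumed ancestor."""
    return {name: count_differences(seq, ancestor) for name, seq in seqs.items()}


# Toy, made-up aligned sequences (not real 5S RNA or ferredoxin data).
sequences = {
    "A": "GATTACA",
    "B": "GATTAGA",
    "C": "GCTTAGT",
}
# Hypothetical ancestral sequence; in practice it has to be estimated.
ancestor = "GATTACA"

matrix = difference_matrix(sequences)
from_ancestor = differences_from_ancestor(sequences, ancestor)

for name in sequences:
    print(name, [matrix[name][other] for other in sequences], from_ancestor[name])
```

In this toy data the three sequences differ from the assumed ancestor by different amounts, which is the unequal-rate situation in which the difference matrix alone is said to be insufficient.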