Multidimensional Vector Space Representation for Convergent Evolution and Molecular Phylogeny
Open Access
- 17 November 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 22 (3), 704-715
- https://doi.org/10.1093/molbev/msi051
Abstract
With growing amounts of genome data and constant improvement of models of molecular evolution, phylogenetic reconstruction has become more reliable. However, our knowledge of the real process of molecular evolution is still limited. When sufficiently large data sets are analyzed, even subtle biases in statistical models can lend significant support to incorrect topologies because of the high signal-to-noise ratio. We propose a procedure to locate sequences in a multidimensional vector space (MVS), in which the geometry of the space is uniquely determined in such a way that the vectors of sequence evolution are orthogonal among different branches. In this paper, the MVS approach is developed to detect and remove biases in models of molecular evolution caused by unrecognized convergent evolution among lineages or unexpected patterns of substitutions. Biases in the estimated pairwise distances are identified as deviations (outliers) of sequence spatial vectors from the expected orthogonality. Modifications to the estimated distances are made by minimizing an index that quantifies the deviations. In this way, it becomes possible to reconstruct the phylogenetic tree while taking into account possible biases in the model of molecular evolution. The efficacy of the modification procedure was verified by simulating evolution on various topologies with rate heterogeneity and convergent change. The phylogeny of placental mammals in previous analyses of large data sets has varied according to the genes being analyzed. Systematic deviations caused by convergent evolution were detected by our procedure in all representative data sets and were found to strongly affect the tree structure. However, the bias correction yielded a consistent topology among data sets. The existence of strong biases was validated by examining the sites of convergent evolution between the hedgehog and other species in the mitochondrial data set. This convergent evolution explains why it has been difficult to determine the phylogenetic placement of the hedgehog in previous studies.
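The orthogonality idea in the abstract can be illustrated with a small sketch. It assumes, as an illustration only and not as something the abstract states, that pairwise evolutionary distances are read as squared Euclidean distances between sequence points. Under that assumption the inner product of two difference vectors follows directly from the distances, and for a tree-additive four-taxon matrix with topology ((A,B),(C,D)) the vectors A-B and C-D come out orthogonal. The helper names `symmetric_matrix` and `branch_inner_product` and all branch lengths are hypothetical. The quantity computed here is not the deviation index used in the paper.

```python
def symmetric_matrix(n, pair_dist):
    """Build an n x n symmetric matrix from {(i, j): distance} entries."""
    m = [[0.0] * n for _ in range(n)]
    for (i, j), value in pair_dist.items():
        m[i][j] = m[j][i] = value
    return m


def branch_inner_product(dist, a, b, c, d):
    """Inner product of (x_a - x_b) and (x_c - x_d).

    Assumes dist[i][j] is a squared Euclidean distance |x_i - x_j|^2,
    so that (x_a - x_b).(x_c - x_d) = (D_ad + D_bc - D_ac - D_bd) / 2.
    """
    return 0.5 * (dist[a][d] + dist[b][c] - dist[a][c] - dist[b][d])


# Hypothetical tree ((A,B),(C,D)) with taxa indexed A=0, B=1, C=2, D=3.
# Terminal branches: A=0.10, B=0.20, C=0.30, D=0.15; internal branch: 0.05.
additive = symmetric_matrix(4, {
    (0, 1): 0.30,  # A-B: 0.10 + 0.20
    (0, 2): 0.45,  # A-C: 0.10 + 0.05 + 0.30
    (0, 3): 0.30,  # A-D: 0.10 + 0.05 + 0.15
    (1, 2): 0.55,  # B-C: 0.20 + 0.05 + 0.30
    (1, 3): 0.40,  # B-D: 0.20 + 0.05 + 0.15
    (2, 3): 0.45,  # C-D: 0.30 + 0.15
})

# Same matrix, but the A-C distance is underestimated by 0.10 to mimic
# convergent change between the two lineages (an illustrative perturbation).
biased = [row[:] for row in additive]
biased[0][2] = biased[2][0] = 0.35

ortho_ok = branch_inner_product(additive, 0, 1, 2, 3)
ortho_biased = branch_inner_product(biased, 0, 1, 2, 3)
print("additive tree:", ortho_ok)
print("with convergence bias:", ortho_biased)
```

The additive matrix gives an inner product of zero up to floating-point rounding, while the perturbed matrix gives a clearly nonzero value. This is the kind of departure from orthogonality that the abstract describes as the signature of a biased distance estimate.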