Spatial and Temporal Heterogeneity in Nucleotide Sequence Evolution
Open Access
- 23 April 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 25 (8), 1683-1694
- https://doi.org/10.1093/molbev/msn119
Abstract
Models of nucleotide substitution make many simplifying assumptions about the evolutionary process, including that the same process acts on all sites in an alignment and on all branches on the phylogenetic tree. Many studies have shown that in reality the substitution process is heterogeneous and that this variability can introduce systematic errors into many forms of phylogenetic analyses. I propose a new rigorous approach for describing heterogeneity called a temporal hidden Markov model (THMM), which can distinguish between among site (spatial) heterogeneity and among lineage (temporal) heterogeneity. Several versions of the THMM are applied to 16 sets of aligned sequences to quantitatively assess the different forms of heterogeneity acting within them. The most general THMM provides the best fit in all the data sets examined, providing strong evidence of pervasive heterogeneity during evolution. Investigating individual forms of heterogeneity provides further insights. In agreement with previous studies, spatial rate heterogeneity (rates across sites [RAS]) is inferred to be the single most prevalent form of heterogeneity. Interestingly, RAS appears so dominant that failure to independently include it in the THMM masks other forms of heterogeneity, particularly temporal heterogeneity. Incorporating RAS into the THMM reveals substantial temporal and spatial heterogeneity in nucleotide composition and bias toward transition substitution in all alignments examined, although the relative importance of different forms of heterogeneity varies between data sets. Furthermore, the improvements in model fit observed by adding complexity to the model suggest that the THMMs used in this study do not capture all the evolutionary heterogeneity occurring in the data. These observations all indicate that current tests may consistently underestimate the degree of temporal heterogeneity occurring in data. 
Finally, there is a weak link between the amount of heterogeneity detected and the level of divergence between the sequences, suggesting that variability in the evolutionary process will be a particular problem for deep phylogeny.