Assessment of Substitution Model Adequacy Using Frequentist and Bayesian Methods
Open Access
- 8 July 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 27 (12), 2790-2803
- https://doi.org/10.1093/molbev/msq168
Abstract
In order to have confidence in model-based phylogenetic methods, such as maximum likelihood (ML) and Bayesian analyses, one must use an appropriate model of molecular evolution identified using statistically rigorous criteria. Although model selection methods such as the likelihood ratio test and Akaike information criterion are widely used in the phylogenetic literature, model selection methods lack the ability to reject all models if they provide an inadequate fit to the data. There are two methods, however, that assess absolute model adequacy, the frequentist Goldman–Cox (GC) test and Bayesian posterior predictive simulations (PPSs), which are commonly used in conjunction with the multinomial log likelihood test statistic. In this study, we use empirical and simulated data to evaluate the adequacy of common substitution models using both frequentist and Bayesian methods and compare the results with those obtained with model selection methods. In addition, we investigate the relationship between model adequacy and performance in ML and Bayesian analyses in terms of topology, branch lengths, and bipartition support. We show that tests of model adequacy based on the multinomial likelihood often fail to reject simple substitution models, especially when the models incorporate among-site rate variation (ASRV), and normally fail to reject less complex models than those chosen by model selection methods. In addition, we find that PPSs often fail to reject simpler models than the GC test. Use of the simplest substitution models not rejected based on fit normally results in similar but divergent estimates of tree topology and branch lengths. In addition, use of the simplest adequate substitution models can affect estimates of bipartition support, although these differences are often small with the largest differences confined to poorly supported nodes. We also find that alternative assumptions about ASRV can affect tree topology, tree length, and bipartition support. 
Our results suggest that using the simplest substitution models not rejected based on fit may be a valid alternative to implementing more complex models identified by model selection methods. However, all common substitution models may fail to recover the correct topology and assign appropriate bipartition support if the true tree shape is difficult to estimate regardless of model adequacy.
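As an illustration of the multinomial log likelihood test statistic mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It computes the unconstrained (multinomial) log likelihood of an alignment, sum over unique site patterns of n_i * ln(n_i / N), where n_i is the count of pattern i and N is the number of sites. It also shows how an observed value could be compared against values from simulated data sets (parametric bootstrap replicates or posterior predictive replicates). The function names, the toy alignment, and the choice to report both tail proportions are assumptions made for illustration; simulating the replicate alignments under a fitted substitution model is outside the scope of the sketch.

```python
import math
from collections import Counter


def multinomial_log_likelihood(alignment):
    """Unconstrained log likelihood of an alignment.

    `alignment` is a list of equal-length sequences (strings).
    Each column is one site pattern; the statistic is
    sum_i n_i * ln(n_i / N) over the unique patterns.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    n_sites = len(alignment[0])
    if n_sites == 0 or any(len(seq) != n_sites for seq in alignment):
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*alignment) yields one tuple of character states per column.
    pattern_counts = Counter(zip(*alignment))
    return sum(n * math.log(n / n_sites) for n in pattern_counts.values())


def tail_proportions(observed, replicates):
    """Fractions of simulated statistics <= and >= the observed value.

    Which tail is used as the P value depends on how the test is set up,
    so both are returned (an illustrative choice, not taken from the paper).
    """
    if not replicates:
        raise ValueError("need at least one replicate")
    total = len(replicates)
    lower = sum(1 for r in replicates if r <= observed) / total
    upper = sum(1 for r in replicates if r >= observed) / total
    return lower, upper


# Hypothetical toy alignment: three taxa, four sites.
toy = ["AACG", "AACG", "AATG"]
observed_stat = multinomial_log_likelihood(toy)

# Hypothetical replicate statistics standing in for simulated data sets.
simulated_stats = [-5.0, -4.0, -3.0, -2.0]
lower_tail, upper_tail = tail_proportions(-4.0, simulated_stats)
```

In practice, each replicate statistic would come from an alignment simulated under the substitution model being assessed, and a model would be judged inadequate when the observed statistic falls in an extreme tail of the simulated distribution.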