Assessment of Substitution Model Adequacy Using Frequentist and Bayesian Methods

Open Access

8 July 2010

journal article
research article
Published by Oxford University Press (OUP) in Molecular Biology and Evolution

Vol. 27 (12), 2790-2803
https://doi.org/10.1093/molbev/msq168

Abstract

To have confidence in model-based phylogenetic methods, such as maximum likelihood (ML) and Bayesian analyses, one must use an appropriate model of molecular evolution identified using statistically rigorous criteria. Although model selection methods such as the likelihood ratio test and the Akaike information criterion are widely used in the phylogenetic literature, they rank candidate models against one another and so cannot reject all of them when every candidate fits the data inadequately. Two methods, however, assess absolute model adequacy: the frequentist Goldman–Cox (GC) test and Bayesian posterior predictive simulations (PPSs), which are commonly used in conjunction with the multinomial log likelihood test statistic. In this study, we use empirical and simulated data to evaluate the adequacy of common substitution models using both frequentist and Bayesian methods and compare the results with those obtained with model selection methods. We also investigate the relationship between model adequacy and performance in ML and Bayesian analyses in terms of topology, branch lengths, and bipartition support. We show that tests of model adequacy based on the multinomial likelihood often fail to reject simple substitution models, especially when the models incorporate among-site rate variation (ASRV), and normally fail to reject less complex models than those chosen by model selection methods. We further find that PPSs often fail to reject simpler models than the GC test. Use of the simplest substitution models not rejected based on fit normally results in estimates of tree topology and branch lengths that are similar, but not identical, to those obtained under the more complex models. Use of the simplest adequate substitution models can also affect estimates of bipartition support, although these differences are often small, with the largest differences confined to poorly supported nodes. We also find that alternative assumptions about ASRV can affect tree topology, tree length, and bipartition support.
Our results suggest that using the simplest substitution models not rejected based on fit may be a valid alternative to implementing more complex models identified by model selection methods. However, all common substitution models may fail to recover the correct topology and assign appropriate bipartition support if the true tree shape is difficult to estimate regardless of model adequacy.
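
The multinomial log likelihood test statistic mentioned above depends only on how often each distinct site pattern (alignment column) occurs, so it can be illustrated in a few lines. The sketch below is not the authors' implementation: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings with no special handling of gaps or ambiguity codes, and the tail-probability helper only shows how an observed value might be compared with values computed from simulated data sets.

```python
from collections import Counter
from math import log


def multinomial_log_likelihood(alignment):
    """Unconstrained (multinomial) log likelihood of an alignment.

    T(X) = sum_i n_i * ln(n_i) - N * ln(N), where n_i is the count of the
    i-th distinct site pattern and N is the total number of sites.
    Assumption: `alignment` is a list of equal-length strings.
    """
    if not alignment or len(alignment[0]) == 0:
        raise ValueError("alignment must contain at least one site")
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same length")
    # Each column of the alignment is one site pattern.
    pattern_counts = Counter(zip(*alignment))
    return sum(n * log(n) for n in pattern_counts.values()) - n_sites * log(n_sites)


def lower_tail_probability(observed, simulated):
    """Fraction of simulated statistics less than or equal to the observed one.

    Hypothetical helper: `simulated` would hold the statistic computed on
    data sets simulated under the model being assessed.
    """
    if not simulated:
        raise ValueError("need at least one simulated value")
    return sum(1 for value in simulated if value <= observed) / len(simulated)
```

In a posterior predictive setting, `simulated` would be filled by applying `multinomial_log_likelihood` to data sets simulated from draws of the posterior distribution; an observed value lying far in either tail of that distribution suggests the model does not describe the data adequately.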
