Bayesian analysis of amino acid substitution models
- 7 October 2008
- journal article
- Published by The Royal Society in Philosophical Transactions of the Royal Society B: Biological Sciences
- Vol. 363 (1512), 3941-3953
- https://doi.org/10.1098/rstb.2008.0175
Abstract
Models of amino acid substitution present challenges beyond those often faced with the analysis of DNA sequences. The alignments of amino acid sequences are often small, whereas the number of parameters to be estimated is potentially large when compared with the number of free parameters for nucleotide substitution models. Most approaches to the analysis of amino acid alignments have focused on the use of fixed amino acid models, in which all of the potentially free parameters are fixed to values estimated from a large number of sequences. Often, these fixed amino acid models are specific to a gene or taxonomic group (e.g. the Mtmam model, whose parameters are specific to mammalian mitochondrial gene sequences). Although fixed amino acid models succeed in reducing the number of free parameters to be estimated (indeed, they reduce it from approximately 200 to 0), it is possible that none of the currently available fixed amino acid models is appropriate for a specific alignment. Here, we present four approaches to the analysis of amino acid sequences. First, we explore the use of a general time reversible model of amino acid substitution with a Dirichlet prior probability distribution on the 190 exchangeability parameters. Second, we explore the behaviour of prior probability distributions that are ‘centred’ on the rates specified by a fixed amino acid model. Third, we consider a mixture of fixed amino acid models. Finally, we treat constraints on the exchangeability parameters as partitions, similar to how nucleotide substitution models are specified, and place a Dirichlet process prior model on all possible partitioning schemes.
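The abstract names several priors without stating their exact form. The Python sketch below (standard library only) illustrates one plausible reading of three of them, and is not the authors' implementation. It assumes: a GTR rate matrix built as q_ij = r_ij * pi_j with the mean rate scaled to 1; a flat Dirichlet(1, ..., 1) prior on the 190 exchangeabilities; a "centred" prior taken to be a Dirichlet whose mean equals a fixed model's normalised rates, with a hypothetical concentration value; and the Dirichlet process prior on partitions represented through the Chinese restaurant process it induces. The helper names, the concentration values and the stand-in "fixed" rates are all hypothetical; no published matrix such as Mtmam is reproduced.

```python
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # one-letter codes, 20 states
N = len(AMINO_ACIDS)
N_EXCH = N * (N - 1) // 2  # 190 exchangeabilities r_ij with i < j


def sample_dirichlet(alphas, rng):
    """One draw from Dirichlet(alphas) via normalised Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def centred_alphas(fixed_rates, concentration):
    """Hypothetical centred prior: Dirichlet mean equals the normalised fixed rates.

    A larger concentration keeps draws closer to the fixed model.
    """
    total = sum(fixed_rates)
    return [concentration * r / total for r in fixed_rates]


def gtr_rate_matrix(exch, freqs):
    """Time-reversible Q with q_ij = r_ij * pi_j (i != j), scaled to mean rate 1.

    `exch` holds the 190 r_ij for i < j in row-major order.
    """
    q = [[0.0] * N for _ in range(N)]
    k = 0
    for i in range(N):
        for j in range(i + 1, N):
            q[i][j] = exch[k] * freqs[j]
            q[j][i] = exch[k] * freqs[i]
            k += 1
    for i in range(N):
        q[i][i] = -sum(q[i][j] for j in range(N) if j != i)
    scale = -sum(freqs[i] * q[i][i] for i in range(N))
    return [[x / scale for x in row] for row in q]


def crp_partition(n_items, alpha, rng):
    """Block labels drawn from the Chinese restaurant process.

    This is the partition distribution induced by a Dirichlet process prior
    with concentration alpha: an item joins an existing block with probability
    proportional to the block's size, or opens a new block in proportion to alpha.
    """
    labels, counts = [], []
    for m in range(n_items):
        u = rng.random() * (m + alpha)
        acc, block = 0.0, len(counts)  # default: open a new block
        for b, c in enumerate(counts):
            acc += c
            if u < acc:
                block = b
                break
        if block == len(counts):
            counts.append(1)
        else:
            counts[block] += 1
        labels.append(block)
    return labels


rng = random.Random(2008)
freqs = [1.0 / N] * N  # equal amino acid frequencies, for illustration only

# (1) Flat Dirichlet prior on the 190 exchangeabilities.
flat_exch = sample_dirichlet([1.0] * N_EXCH, rng)
q_flat = gtr_rate_matrix(flat_exch, freqs)

# (2) Prior centred on a fixed model. These rates are a HYPOTHETICAL stand-in,
#     not the values of any published matrix.
fixed_rates = [rng.uniform(0.1, 5.0) for _ in range(N_EXCH)]
centred_exch = sample_dirichlet(centred_alphas(fixed_rates, 500.0), rng)

# (3) Dirichlet process prior on partitions: exchangeabilities in the same block
#     share one rate, as HKY85 shares one rate among all transitions.
labels = crp_partition(N_EXCH, alpha=2.0, rng=rng)
n_blocks = max(labels) + 1
block_rates = sample_dirichlet([1.0] * n_blocks, rng)
part_exch = [block_rates[b] for b in labels]
```

Setting `q_ij = r_ij * pi_j` makes `pi_i * q_ij` symmetric in i and j, which is the detailed-balance condition that defines time reversibility. Under the partition view, a fixed model with no free exchangeabilities and the full 190-parameter GTR model sit at opposite ends of the same space of partitioning schemes.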