On Reduced Amino Acid Alphabets for Phylogenetic Inference

23 May 2007

journal article
research article
Published by Oxford University Press (OUP) in Molecular Biology and Evolution

Vol. 24 (9), 2139-2150
https://doi.org/10.1093/molbev/msm144

Abstract

We investigate the use of Markov models of evolution for reduced amino acid alphabets, that is, bins of amino acids. Reduced alphabets can ameliorate the effects of model misspecification and saturation. We present algorithms for two ways of automating the construction of bins: minimizing criteria based on properties of rate matrices, and minimizing criteria based on properties of alignments. Simulations show that, in the absence of model misspecification, the loss of information due to binning is small, and Markov models at the binned level are almost as effective as the more appropriate missing-data approach. Applying these approaches to real data sets in which compositional heterogeneity and/or saturation appear to bias tree estimation, we find that binning can improve topological estimation in practice.
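The abstract does not spell out the bin-construction criteria or the binned models. As an illustration only, the following Python sketch shows two ingredients that any binned analysis needs: recoding an alignment into bin states, and collapsing a 20-state rate matrix into a rate matrix over bins. The Dayhoff six-group binning and the frequency-weighted aggregation formula are assumptions chosen for illustration, not the criteria or models derived in the article. The names `DAYHOFF6`, `make_bin_map`, `recode` and `bin_rate_matrix` are hypothetical.

```python
# Illustrative sketch only: the Dayhoff six-group binning and the
# frequency-weighted aggregation below are assumptions, not the
# bin-construction criteria or binned models of the article.

# Dayhoff six-group recoding of the 20 amino acids (example binning).
DAYHOFF6 = ["AGPST", "DENQ", "HKR", "FWY", "ILMV", "C"]

# One output character per bin; supports up to 20 bins.
SYMBOLS = "0123456789ABCDEFGHIJ"


def make_bin_map(bins):
    """Map each amino acid letter to the index of the bin containing it."""
    if len(bins) > len(SYMBOLS):
        raise ValueError("too many bins for the output symbol set")
    mapping = {}
    for idx, group in enumerate(bins):
        for aa in group:
            if aa in mapping:
                raise ValueError(f"{aa} is assigned to more than one bin")
            mapping[aa] = idx
    return mapping


def recode(seq, mapping, gap="-"):
    """Recode a protein sequence into bin symbols; gaps and unknown residues become `gap`."""
    return "".join(SYMBOLS[mapping[c]] if c in mapping else gap for c in seq.upper())


def bin_rate_matrix(Q, pi, alphabet, bins):
    """Collapse a rate matrix Q over `alphabet`, with stationary frequencies pi,
    into a rate matrix R over `bins` using
        R[I][J] = sum_{i in I, j in J} pi[i] * Q[i][j] / pi_I   (I != J),
    with diagonal entries chosen so that each row of R sums to zero."""
    if sorted(a for g in bins for a in g) != sorted(alphabet):
        raise ValueError("bins must partition the alphabet")
    index = {a: k for k, a in enumerate(alphabet)}
    members = [[index[a] for a in group] for group in bins]
    pi_bin = [sum(pi[i] for i in m) for m in members]
    n = len(bins)
    R = [[0.0] * n for _ in range(n)]
    for I in range(n):
        for J in range(n):
            if I != J:
                # Stationary probability flow from bin I to bin J, per unit time.
                flow = sum(pi[i] * Q[i][j] for i in members[I] for j in members[J])
                R[I][J] = flow / pi_bin[I]
        R[I][I] = -sum(R[I][J] for J in range(n) if J != I)
    return R, pi_bin


if __name__ == "__main__":
    print(recode("AGDC-W", make_bin_map(DAYHOFF6)))  # prints 0015-3
```

If Q is reversible, the aggregated matrix is reversible too, since pi_I R[I][J] and pi_J R[J][I] both equal the same stationary flow between the two bins. The binned process is exactly Markov only when Q is lumpable with respect to the bins, so in general this aggregation is an approximation.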
