Probabilistic Phylogenetic Inference with Insertions and Deletions
Open Access
- 19 September 2008
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 4 (9), e1000172
- https://doi.org/10.1371/journal.pcbi.1000172
Abstract
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new “concordance test” benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm. We describe a computationally efficient method to use insertion and deletion events, in addition to substitutions, in phylogenetic inference. To date, many evolutionary models in probabilistic phylogenetic inference methods have only accounted for substitution events, not for insertions and deletions. As a result, not only do tree inference methods use less sequence information than they could, but also it has remained difficult to integrate phylogenetic modeling into sequence alignment methods (such as profiles and profile-hidden Markov models) that inherently require a model of insertion and deletion events. Therefore an important goal in the field has been to develop tractable evolutionary models of insertion/deletion events over time of sufficient accuracy to increase the resolution of phylogenetic inference methods and to increase the power of profile-based sequence homology searches. Our model offers a partial answer to this problem. We show that our model generally improves inference power in both simulated and real data and that it is easily implemented in the framework of standard inference packages with little effect on computational efficiency (we extended dnaml, in Felsenstein's popular phylip package).Keywords
This publication has 72 references indexed in Scilit:
- Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomesGenome Research, 2005
- Empirical Analysis of Protein Insertions and Deletions Determining Parameters for the Correct Placement of Gaps in Protein Sequence AlignmentsJournal of Molecular Biology, 2004
- Evolutionary rate variation and RNA secondary structure predictionComputational Biology and Chemistry, 2004
- A new criterion and method for amino acid classificationJournal of Theoretical Biology, 2004
- MCALIGN: Stochastic Alignment of Noncoding DNA Sequences Based on an Evolutionary Model of Sequence EvolutionGenome Research, 2004
- Empirical Models for Substitution in Ribosomal RNAMolecular Biology and Evolution, 2003
- Using Evolutionary Trees in Protein Secondary Structure Prediction and Other Comparative Sequence AnalysesJournal of Molecular Biology, 1996
- Maximum likelihood alignment of DNA sequencesJournal of Molecular Biology, 1986
- Evolutionary trees from DNA sequences: A maximum likelihood approachJournal of Molecular Evolution, 1981
- A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequencesJournal of Molecular Evolution, 1980