Probabilistic Phylogenetic Inference with Insertions and Deletions

Open Access

19 September 2008

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 4 (9), e1000172
https://doi.org/10.1371/journal.pcbi.1000172

Abstract

A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new “concordance test” benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.

We describe a computationally efficient method to use insertion and deletion events, in addition to substitutions, in phylogenetic inference. To date, many evolutionary models in probabilistic phylogenetic inference methods have only accounted for substitution events, not for insertions and deletions. As a result, not only do tree inference methods use less sequence information than they could, but also it has remained difficult to integrate phylogenetic modeling into sequence alignment methods (such as profiles and profile-hidden Markov models) that inherently require a model of insertion and deletion events. Therefore an important goal in the field has been to develop tractable evolutionary models of insertion/deletion events over time of sufficient accuracy to increase the resolution of phylogenetic inference methods and to increase the power of profile-based sequence homology searches. Our model offers a partial answer to this problem. We show that our model generally improves inference power in both simulated and real data and that it is easily implemented in the framework of standard inference packages with little effect on computational efficiency (we extended dnaml, in Felsenstein's popular phylip package).
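To illustrate how a gap-augmented alphabet fits into the Felsenstein peeling algorithm, the following is a minimal sketch and not the dnamlε implementation. It assumes a deliberately simplified model: the gap character is treated as a fifth state and all off-diagonal rates equal a hypothetical constant `mu`, which gives closed-form transition probabilities. That toy model is reversible, unlike the non-reversible birth–death model described above. The tree encoding, function names, and example data are all illustrative assumptions.

```python
import math

# Assumption: the gap character "-" is simply a fifth state next to A, C, G, T.
STATES = "ACGT-"


def transition_prob(i, j, t, mu=1.0):
    """P(state j at child | state i at parent) over a branch of length t.

    Toy equal-rates model (every off-diagonal rate is mu), chosen only
    because it has a closed form; it is NOT the paper's indel model.
    """
    k = len(STATES)
    decay = math.exp(-k * mu * t)
    if i == j:
        return 1.0 / k + (k - 1.0) / k * decay
    return 1.0 / k - decay / k


def conditional_likelihoods(node, column):
    """Peeling step: likelihood of the leaves below `node` for each state at `node`.

    A leaf is a taxon name (str); an internal node is a list of
    (child, branch_length) pairs. `column` maps taxon name -> character.
    """
    if isinstance(node, str):
        observed = column[node]
        return [1.0 if s == observed else 0.0 for s in STATES]
    result = [1.0] * len(STATES)
    for child, branch_length in node:
        below = conditional_likelihoods(child, column)
        for i in range(len(STATES)):
            result[i] *= sum(
                transition_prob(i, j, branch_length) * below[j]
                for j in range(len(STATES))
            )
    return result


def column_likelihood(tree, column):
    """Likelihood of one alignment column, with a uniform prior at the root."""
    prior = 1.0 / len(STATES)
    return sum(prior * x for x in conditional_likelihoods(tree, column))


def alignment_log_likelihood(tree, alignment):
    """Sum of per-column log-likelihoods, treating columns as independent."""
    length = len(next(iter(alignment.values())))
    return sum(
        math.log(column_likelihood(tree, {n: s[c] for n, s in alignment.items()}))
        for c in range(length)
    )


# Hypothetical three-taxon tree and alignment, for illustration only.
example_tree = [([("human", 0.1), ("chimp", 0.1)], 0.2), ("mouse", 0.5)]
example_alignment = {"human": "AC-GT", "chimp": "AC-GT", "mouse": "ACTGA"}
example_log_likelihood = alignment_log_likelihood(example_tree, example_alignment)
```

Because gaps are just another state here, the per-column cost is the same recursion as the substitution-only case with a 5×5 rather than a 4×4 transition table, which is the sense in which the peeling algorithm's efficiency is retained.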

