Exact and Heuristic Algorithms for the Indel Maximum Likelihood Problem
- 1 May 2007
- journal article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 14 (4), 446-461
- https://doi.org/10.1089/cmb.2007.a006
Abstract
Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing the most likely scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, that we called the Indel Maximum Likelihood Problem (IMLP), is an important step toward the reconstruction of ancestral genomics\ud sequences, and is important for studying evolutionary processes, genome function, adaptation and convergence. We solve the IMLP using a new type of tree hidden Markov model whose states correspond to single-base evolutionary scenarios and where transitions model dependencies between neighboring columns. The standard Viterbi and Forward-backward algorithms are optimized to produce the most likely ancestral reconstruction and to compute the level of confidence associated to specific regions of the reconstruction. A heuristic\ud is presented to make the method practical for large data sets, while retaining an extremely high degree of accuracy. The methods are illustrated on a 1Mb alignment of the CFTR regions from 12 mammalsKeywords
This publication has 24 references indexed in Scilit:
- ON THE INFERENCE OF PARSIMONIOUS INDEL EVOLUTIONARY SCENARIOSJournal of Bioinformatics and Computational Biology, 2006
- Reconstructing large regions of an ancestral mammalian genome in silicoGenome Research, 2004
- The ENCODE (ENCyclopedia Of DNA Elements) ProjectScience, 2004
- MAVID: Constrained Ancestral Alignment of Multiple SequencesGenome Research, 2004
- Aligning Multiple Genomic Sequences With the Threaded Blockset AlignerGenome Research, 2004
- LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNAGenome Research, 2003
- The past as the key to the present: Resurrection of ancient proteins from eosinophilsProceedings of the National Academy of Sciences, 2002
- A Hidden Markov Model approach to variation among sites in rate of evolutionMolecular Biology and Evolution, 1996
- Evolutionary trees from DNA sequences: A maximum likelihood approachJournal of Molecular Evolution, 1981
- Toward Defining the Course of Evolution: Minimum Change for a Specific Tree TopologySystematic Zoology, 1971