Genome-wide nucleotide-level mammalian ancestor reconstruction
- 10 October 2008
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 18 (11), 1829-1843
- https://doi.org/10.1101/gr.076521.108
Abstract
Recently, attention has turned to the problem of reconstructing complete ancestral sequences from large multiple alignments. Successful generation of these genome-wide reconstructions will facilitate a greater knowledge of the events that have driven evolution. We present a new evolutionary alignment modeler, called “Ortheus,” for inferring the evolutionary history of a multiple alignment, in terms of both substitutions and, importantly, insertions and deletions. Based on a multiple sequence probabilistic transducer model of the type proposed by Holmes, Ortheus uses efficient stochastic graph-based dynamic programming methods. Unlike other methods, Ortheus does not rely on a single fixed alignment from which to work. Ortheus is also more scalable than previous methods while being fast, stable, and open source. Large-scale simulations show that Ortheus performs close to optimally on a deep mammalian phylogeny. Simulations also indicate that significant proportions of errors due to insertions and deletions can be avoided by not assuming a fixed alignment. We additionally use a challenging hold-out cross-validation procedure to test the method; using the reconstructions to predict extant sequence bases, we demonstrate significant improvements over using closest extant neighbor sequences. Accompanying this paper, a new, public, and genome-wide set of Ortheus ancestor alignments provides an intriguing new resource for evolutionary studies in mammals. As a first piece of analysis, we attempt to recover “fossilized” ancestral pseudogenes. We confidently find 31 cases in which the ancestral sequence had a more complete sequence than any of the extant sequences.