Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs
- 10 October 2008
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 18 (11), 1814-1828
- https://doi.org/10.1101/gr.076554.108
Abstract
Pairwise whole-genome alignment involves the creation of a homology map, capable of performing a near complete transformation of one genome into another. For multiple genomes this problem is generalized to finding a set of consistent homology maps for converting each genome in the set of aligned genomes into any of the others. The problem can be divided into two principal stages. First, the partitioning of the input genomes into a set of colinear segments, a process which essentially deals with the complex processes of rearrangement. Second, the generation of a base pair level alignment map for each colinear segment. We have developed a new genome-wide segmentation program, Enredo, which produces colinear segments from extant genomes handling rearrangements, including duplications. We have then applied the new alignment program Pecan, which makes the consistency alignment methodology practical at a large scale, to create a new set of genome-wide mammalian alignments. We test both Enredo and Pecan using novel and existing assessment analyses that incorporate both real biological data and simulations, and show that both independently and in combination they outperform existing programs. Alignments from our pipeline are publicly available within the Ensembl genome browser.
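The notion of a homology map that transforms positions in one genome into another can be illustrated with a toy sketch. This is not the Enredo or Pecan implementation: the `ColinearBlock` structure, the `project` function, and all coordinates below are hypothetical, chosen only to show how a set of colinear segments (including an inverted segment and a duplicated one) might carry a position from genome A to genome B.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ColinearBlock:
    """One gap-free colinear segment pair (hypothetical toy structure)."""
    a_start: int           # 0-based start in genome A
    b_start: int           # 0-based start in genome B (forward-strand coordinate)
    length: int            # number of aligned bases
    reverse: bool = False  # True when the B copy lies on the opposite strand


def project(blocks: List[ColinearBlock], pos_a: int) -> List[int]:
    """Return every genome-B position homologous to pos_a.

    A list is returned because a duplicated region of A may be covered
    by more than one block; an empty list means pos_a is unaligned.
    """
    hits = []
    for blk in blocks:
        offset = pos_a - blk.a_start
        if 0 <= offset < blk.length:
            if blk.reverse:
                # An inverted block is read from its far end backwards.
                hits.append(blk.b_start + blk.length - 1 - offset)
            else:
                hits.append(blk.b_start + offset)
    return hits


# Made-up example map: one forward block, one inverted block,
# and a second copy of the first region (a duplication in B).
homology_map = [
    ColinearBlock(a_start=0, b_start=100, length=50),
    ColinearBlock(a_start=50, b_start=300, length=20, reverse=True),
    ColinearBlock(a_start=0, b_start=500, length=50),
]

print(project(homology_map, 10))  # forward block plus its duplicate
print(project(homology_map, 50))  # first base of the inverted block
print(project(homology_map, 70))  # outside every block
```

In this simplification each block is ungapped; a real base pair level alignment would also record insertions and deletions within each colinear segment.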