Multiple whole-genome alignments without a reference organism

28 January 2009

journal article
Published by Cold Spring Harbor Laboratory in Genome Research

Vol. 19 (4), 682-689
https://doi.org/10.1101/gr.081778.108

Abstract

Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes either rely on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.
