Adaptive seeds tame genomic sequence comparison
Open Access
- 5 January 2011
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 21 (3), 487-493
- https://doi.org/10.1101/gr.113985.110
Abstract
The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
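The idea of choosing seeds by rareness rather than by fixed length can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not LAST's actual implementation: the function names (`build_suffix_array`, `adaptive_seeds`), the `max_hits` parameter, and the naive suffix-array construction are all hypothetical choices made for this example. For each start position in the query, the match is lengthened one base at a time until it occurs at most `max_hits` times in the reference.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Naive suffix array (assumption: fine for a toy, far too slow for genomes)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def adaptive_seeds(query, reference, max_hits=1):
    """Return (query_start, seed_length, reference_positions) tuples.

    For each query start, the seed is the shortest match that occurs
    at most `max_hits` times in `reference`. Starts whose match never
    becomes rare enough, or never matches at all, yield no seed.
    """
    sa = build_suffix_array(reference)
    seeds = []
    for start in range(len(query)):
        lo, hi = 0, len(sa)  # suffix-array interval of current matches
        length = 0
        # Lengthen the match while it is still too frequent.
        while hi - lo > max_hits and start + length < len(query):
            c = query[start + length]
            # Suffixes in sa[lo:hi] share their first `length` characters,
            # so their next characters appear in sorted order.
            keys = [reference[p + length:p + length + 1] for p in sa[lo:hi]]
            lo, hi = lo + bisect_left(keys, c), lo + bisect_right(keys, c)
            length += 1
        if length > 0 and 0 < hi - lo <= max_hits:
            seeds.append((start, length, sorted(sa[lo:hi])))
    return seeds


if __name__ == "__main__":
    # "G" and "GT" are too common in the reference, so the seed grows to "GTT".
    print(adaptive_seeds("GTTT", "ACGTACGTTTGACA", max_hits=1))
    # prints [(0, 3, [6]), (1, 3, [7])]
```

Because every reported seed has at most `max_hits` reference positions, the total number of matches is bounded by `max_hits` times the query length, which is the linear growth the abstract describes. Common oligonucleotides simply produce longer seeds instead of more hits.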
This publication has 29 references indexed in Scilit:
- A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics, 2010
- Parameters for accurate genome alignment. BMC Bioinformatics, 2010
- Incorporating sequence quality data into alignment improves DNA read mapping. Nucleic Acids Research, 2010
- Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature, 2010
- How to map billions of short reads onto genomes. Nature Biotechnology, 2009
- Database indexing for production MegaBLAST searches. Bioinformatics, 2008
- A taxonomy of suffix array construction algorithms. ACM Computing Surveys, 2007
- Alu repeats and human genomic diversity. Nature Reviews Genetics, 2002
- BLAT—The BLAST-Like Alignment Tool. Genome Research, 2002
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997