Adaptive seeds tame genomic sequence comparison

Open Access

5 January 2011

journal article
Published by Cold Spring Harbor Laboratory in Genome Research

Vol. 21 (3), 487-493
https://doi.org/10.1101/gr.113985.110

Abstract

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches chosen for their rareness, rather than matches of a fixed length. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
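
To illustrate the idea of choosing matches by rareness, here is a minimal, deliberately naive sketch. It is not LAST's implementation: the function names, the `max_hits` parameter, and the brute-force substring counting are all assumptions made for illustration. For each start position in the query, the match is lengthened until it occurs at most `max_hits` times in the reference.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def adaptive_seeds(query, reference, max_hits=1):
    """Return (query_start, seed_length, hit_count) tuples.

    Hypothetical helper: for each query start, take the shortest
    exact match that occurs at most max_hits times in reference.
    """
    seeds = []
    for i in range(len(query)):
        for j in range(i + 1, len(query) + 1):
            hits = count_occurrences(reference, query[i:j])
            if hits == 0:
                break  # nothing in the reference matches from this start
            if hits <= max_hits:
                seeds.append((i, j - i, hits))  # rare enough: stop lengthening
                break
    return seeds


if __name__ == "__main__":
    # "AC" is common in the reference, so the seed at position 0 grows to "ACG".
    print(adaptive_seeds("ACGT", "ACACACGT", max_hits=1))
```

Because each query position contributes at most `max_hits` matches, the total number of matches in this sketch is bounded by `max_hits` times the query length, which mirrors the linear growth described above. A practical tool would replace the brute-force counting with an index over the reference.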
