BLAT—The BLAST-Like Alignment Tool
Open Access
- 20 March 2002
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 12 (4), 656-664
- https://doi.org/10.1101/gr.229202
Abstract
Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
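The first stage described in the abstract (index the genome's nonoverlapping K-mers once, then look up the query against that index) can be illustrated with a minimal Python sketch. This is not BLAT's implementation: the function names, the grouping of hits by diagonal, and the `min_hits` threshold are assumptions made for illustration, standing in for the "number of required index matches" the abstract mentions. The sketch handles exact DNA matches only and skips the alignment, stitching, and splice-site stages.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Index the nonoverlapping k-mers of the genome (starts 0, k, 2k, ...)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(index: Dict[str, List[int]], query: str, k: int) -> List[Tuple[int, int]]:
    """Look up every overlapping k-mer of the query.

    Returns (query_pos, genome_pos) pairs for exact k-mer matches.
    """
    hits: List[Tuple[int, int]] = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for gpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, gpos))
    return hits


def candidate_diagonals(hits: List[Tuple[int, int]], min_hits: int = 2) -> Dict[int, int]:
    """Keep diagonals (genome_pos - query_pos) supported by at least min_hits hits.

    Assumption: counting hits per diagonal is a simplified stand-in for
    requiring multiple index matches before a region is considered.
    """
    counts = Counter(gpos - qpos for qpos, gpos in hits)
    return {diag: n for diag, n in counts.items() if n >= min_hits}
```

For example, with `genome = "TTTTTTTT" + "ACGTGCATGGCATCAG" + "AAAAAAAA"`, `k = 4`, and the 16-base middle segment as the query, `find_hits` returns five hits. Four of them lie on diagonal 8 (the true location) and one lies on diagonal 3 (a chance match of `GCAT`), so `candidate_diagonals(hits, min_hits=2)` keeps only diagonal 8. Because the genome is sampled every `k` bases while the query is scanned at every offset, the index is roughly `k` times smaller than an index of all overlapping K-mers, which is in line with the abstract's point that the index fits in RAM.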