Identification of protein coding regions by database similarity search
- 1 March 1993
- journal article
- research article
- Published by Springer Nature in Nature Genetics
- Vol. 3 (3), 266-272
- https://doi.org/10.1038/ng0393-266
Abstract
Sequence similarity between a translated nucleotide sequence and a known biological protein can provide strong evidence for the presence of a homologous coding region, even between distantly related genes. The computer program BLASTX performed conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step. We characterized the sensitivity of BLASTX recognition to the presence of substitution, insertion and deletion errors in the query sequence and to sequence divergence. Reading frames were reliably identified in the presence of 1% query errors, a rate that is typical for primary sequence data. BLASTX is appropriate for use in moderate and large scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.
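The conceptual translation step described in the abstract can be illustrated with a minimal sketch. This is not the BLASTX implementation. It only shows, assuming the standard genetic code, how a nucleotide query yields six candidate protein sequences (three reading frames on each strand). The function names `reverse_complement`, `translate` and `six_frame_translations` are hypothetical and chosen for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict[str, str]:
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

In a BLASTX-style search, each of the six translations would then be compared against a protein database. That search step is not shown here. Stop codons are kept as `*` so that a frame interrupted by a sequencing error remains visible in the output.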