Protein sequence similarity searches using patterns as seeds
- 1 September 1998
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 26 (17), 3986-3990
- https://doi.org/10.1093/nar/26.17.3986
Abstract
Protein families often are characterized by conserved sequence patterns or motifs. A researcher frequently wishes to evaluate the significance of a specific pattern within a protein, or to exploit knowledge of known motifs to aid the recognition of greatly diverged but homologous family members. To assist in these efforts, the pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains. PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence. The random distribution of PHI-BLAST alignment scores is studied analytically and empirically. In many instances, the program is able to detect statistically significant similarity between homologous proteins that are not recognizably related using traditional single-pass database search methods. PHI-BLAST is applied to the analysis of CED4-like cell death regulators, HS90-type ATPase domains, archaeal tRNA nucleotidyltransferases and archaeal homologs of DnaG-type DNA primases.
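The seeding step described in the abstract (find other instances of the input pattern in a database, then use them as starting points for alignment) can be illustrated with a minimal sketch. This is not the PHI-BLAST implementation: the PROSITE-style pattern syntax, the function names `prosite_to_regex` and `find_seeds`, and the toy sequences are all assumptions made for illustration, and the alignment extension and score statistics are omitted entirely.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (assumed syntax) into a regex string.

    Supported elements, separated by '-': a residue letter, 'x' (any residue),
    '[ABC]' (any of), '{ABC}' (none of), each optionally followed by a
    repeat count '(n)' or '(n,m)'.
    """
    element_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(element_re, element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_seeds(pattern, query, database):
    """Return (name, start, end) for every pattern occurrence in the database.

    The query must contain the pattern, mirroring the requirement that the
    input pattern occurs in the input sequence. `database` is a hypothetical
    dict mapping sequence names to sequences.
    """
    # A lookahead lets finditer report overlapping occurrences as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    if regex.search(query) is None:
        raise ValueError("the query sequence does not contain the pattern")
    seeds = []
    for name, sequence in database.items():
        for m in regex.finditer(sequence):
            seeds.append((name, m.start(1), m.end(1)))
    return seeds


if __name__ == "__main__":
    toy_db = {"p1": "AAGAGKTCC", "p2": "MMMMM"}  # made-up sequences
    print(find_seeds("G-x-G-K-[ST]", "MKTAGSGKSTLLA", toy_db))
```

Each returned tuple marks where a local alignment to the query would be anchored; a real search would then extend the alignment on both sides of the seed and assess the significance of the resulting score.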