Protein structural similarity search by Ramachandran codes
Open Access
- 23 August 2007
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 8 (1), 307
- https://doi.org/10.1186/1471-2105-8-307
Abstract
Background: Protein structural data have increased exponentially, so fast and accurate tools are needed for structural similarity search. To improve search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods. However, accuracy is usually sacrificed, and the speed still cannot match that of sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and to develop efficient search tools that can rapidly retrieve structural homologs from large protein databases.

Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering, and it uses a regenerative approach to produce substitution matrices. Classical sequence similarity search methods can then be applied to structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and as a stand-alone Java program that runs on many different platforms.

Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated, high-throughput functional annotation or prediction for the ever-increasing number of published protein structures in this post-genomic era.
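The Results part of the abstract describes a pipeline with three stages. Backbone geometry is reduced to Ramachandran (phi, psi) angles. Each angle pair is turned into a letter via a clustered Ramachandran map. The resulting strings are then compared with a classical sequence alignment. The Python sketch below illustrates that idea only. It rests on several assumptions, none taken from the paper:

- The four-letter alphabet in `CENTROIDS` and its centroid coordinates are hypothetical stand-ins for SARST's clustered map.
- The flat match/mismatch/gap scores replace SARST's regenerated substitution matrices.
- No expectation values are computed.

All function names here are illustrative and are not part of the published SARST program.

```python
import math

# HYPOTHETICAL Ramachandran alphabet: (phi, psi) centroids in degrees.
# SARST builds its own code set by nearest-neighbor clustering of the map;
# these four letters are placeholders for illustration only.
CENTROIDS = {
    "H": (-63.0, -43.0),   # right-handed alpha-helix region
    "E": (-120.0, 130.0),  # extended beta-strand region
    "P": (-75.0, 145.0),   # polyproline II region
    "L": (60.0, 45.0),     # left-handed helix region
}


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle (degrees, -180..180) defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def backbone_angles(residues):
    """(phi, psi) for each interior residue; residues are (N, CA, C) triples.

    Terminal residues are skipped because each lacks one neighbor.
    """
    angles = []
    for i in range(1, len(residues) - 1):
        n, ca, c = residues[i]
        phi = dihedral(residues[i - 1][2], n, ca, c)   # C(i-1)-N-CA-C
        psi = dihedral(n, ca, c, residues[i + 1][0])   # N-CA-C-N(i+1)
        angles.append((phi, psi))
    return angles


def angular_diff(a, b):
    """Smallest difference between two angles, respecting 360-degree wrap."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode_angles(angles):
    """Map each (phi, psi) pair to the letter of its nearest centroid."""
    def dist2(phi, psi, code):
        cphi, cpsi = CENTROIDS[code]
        return angular_diff(phi, cphi) ** 2 + angular_diff(psi, cpsi) ** 2
    return "".join(min(CENTROIDS, key=lambda c: dist2(phi, psi, c))
                   for phi, psi in angles)


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with linear gap penalties.

    The flat match/mismatch scores stand in for a real substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `encode_angles([(-60, -45), (-65, -40), (-120, 130)])` yields `"HHE"`. Two encoded structures can then be scored with `local_alignment_score`. Because phi and psi wrap around at ±180°, the nearest-centroid lookup measures distance on the torus (`angular_diff`) rather than in plain Euclidean space. Without that, angles of -179° and 179° would wrongly be treated as far apart.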