Protein structural similarity search by Ramachandran codes

Open Access

23 August 2007

journal article
research article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 8 (1), 307
https://doi.org/10.1186/1471-2105-8-307

Abstract

Background: The amount of protein structural data has increased exponentially, so fast and accurate tools are needed for structural similarity search. To improve search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods. However, these methods usually sacrifice accuracy, and their speed still cannot match that of sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases.

Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering, and it uses a regenerative approach to produce substitution matrices. Classical sequence similarity search methods can then be applied to structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 s on a 3.2-GHz CPU. SARST provides statistically meaningful expectation values for assessing the retrieved information. It has been implemented as a web service and as a stand-alone Java program that runs on many different platforms.

Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated, high-throughput functional annotation or prediction for the ever-increasing number of published protein structures in this post-genomic era.
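The general idea of linear encoding can be illustrated with a minimal sketch. It computes backbone (phi, psi) dihedral angles, maps each residue to the letter of the nearest cluster centre on the Ramachandran map using a periodic angular distance, and then scores two resulting strings with a classical local sequence alignment. This is not SARST's implementation. The three letters in the hypothetical `CENTERS`, the +2/-1/-2 scoring scheme, and all function names are illustrative assumptions. SARST derives its own alphabet by nearest-neighbor clustering and its own substitution matrices by a regenerative approach, and neither is reproduced here. Smith-Waterman is used only as a generic stand-in for "classical sequence similarity search methods".

```python
import math

# Hypothetical Ramachandran "letters": cluster centres (phi, psi) in degrees.
# Placeholders only; SARST builds its alphabet from clustered real data.
CENTERS = {"H": (-63.0, -43.0), "E": (-120.0, 130.0), "L": (60.0, 45.0)}


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (IUPAC sign convention) for four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """backbone: list of (N, CA, C) coordinate triples, one per residue.
    Returns (phi, psi) per residue; undefined terminal angles are None."""
    angles = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = dihedral(n, ca, c, backbone[i + 1][0]) if i + 1 < len(backbone) else None
        angles.append((phi, psi))
    return angles


def _angle_diff(a, b):
    """Smallest absolute difference between two angles on the 360-degree circle."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode(angles, centers=CENTERS):
    """Map each (phi, psi) to the nearest centre's letter (periodic distance).
    Residues with an undefined angle are skipped in this sketch."""
    letters = []
    for phi, psi in angles:
        if phi is None or psi is None:
            continue
        best = min(centers, key=lambda k: math.hypot(_angle_diff(phi, centers[k][0]),
                                                     _angle_diff(psi, centers[k][1])))
        letters.append(best)
    return "".join(letters)


def local_score(s, t, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.
    The match/mismatch values stand in for a structural substitution matrix."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        cur = [0]
        for j, b in enumerate(t, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `encode([(-60.0, -45.0), (170.0, 130.0)])` yields `"HE"`. The second residue is assigned to `E` only because the distance wraps around at ±180°, which a plain Euclidean distance on raw angles would miss. Once structures are reduced to strings, any string-alignment machinery applies. This is the source of the speed advantage the abstract describes.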
