TOPS++FATCAT: Fast flexible structural alignment using constraints derived from TOPS+ Strings Model
Open Access
- 31 August 2008
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 9 (1), 1-12
- https://doi.org/10.1186/1471-2105-9-358
Abstract
Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make the results of such analyses easier to understand.

We developed the TOPS++FATCAT algorithm, which uses an intuitive description of protein structures, as captured in the popular TOPS diagrams, to limit the search space of aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT algorithm. TOPS++FATCAT is faster than FATCAT by more than an order of magnitude, with a minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings model contains important information about the parallel and anti-parallel hydrogen-bond patterns between beta-strand secondary structure elements (SSEs). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications in the TOPS diagrams and can be corrected by developing more precise secondary structure element definitions. The benchmark analysis results and a compressed archive of the TOPS++FATCAT program for the Linux platform can be downloaded from the following web site: http://fatcat.burnham.org/TOPS/

TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, opening up the possibility of interactive protein structure similarity searches.
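The idea of limiting the AFP search space with secondary-structure constraints can be illustrated with a minimal Python sketch. It is not the TOPS++FATCAT implementation: the abstract does not specify how constraints are derived from TOPS+ strings, so the sketch assumes that a prior SSE-level comparison has already produced a set of allowed SSE index pairs. The function names, the midpoint rule for assigning a fragment to an SSE, and the fragment length of 8 are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Set, Tuple

Segment = Tuple[int, int]  # inclusive residue range of one SSE (assumed representation)


def sse_index(segments: List[Segment], pos: int) -> Optional[int]:
    """Return the index of the SSE containing residue `pos`, or None."""
    for k, (start, end) in enumerate(segments):
        if start <= pos <= end:
            return k
    return None


def candidate_afps(len_a: int, len_b: int, frag_len: int) -> List[Tuple[int, int]]:
    """Enumerate every (start_a, start_b) fragment pair: the unconstrained search space."""
    return [
        (i, j)
        for i in range(len_a - frag_len + 1)
        for j in range(len_b - frag_len + 1)
    ]


def constrained_afps(
    afps: List[Tuple[int, int]],
    sses_a: List[Segment],
    sses_b: List[Segment],
    allowed_pairs: Set[Tuple[int, int]],
    frag_len: int,
) -> List[Tuple[int, int]]:
    """Keep only fragment pairs whose midpoints lie in SSEs matched at the SSE level.

    The midpoint rule is an illustrative assumption, not the published criterion.
    """
    kept = []
    for i, j in afps:
        a = sse_index(sses_a, i + frag_len // 2)
        b = sse_index(sses_b, j + frag_len // 2)
        if a is not None and b is not None and (a, b) in allowed_pairs:
            kept.append((i, j))
    return kept


# Toy input: two 20-residue chains with two SSEs each (made-up coordinates).
sses_a = [(0, 7), (10, 19)]
sses_b = [(0, 8), (11, 19)]
allowed = {(0, 0), (1, 1)}  # hypothetical SSE-level matches
frag_len = 8  # illustrative fragment length

all_afps = candidate_afps(20, 20, frag_len)
kept = constrained_afps(all_afps, sses_a, sses_b, allowed, frag_len)
print(f"{len(kept)} of {len(all_afps)} candidate fragment pairs retained")
```

Even on this toy input the filter discards most candidate pairs before any expensive geometric comparison would run, which is the kind of pruning the abstract credits for the speed-up. The actual size of the reduction depends on the structures and on how the constraints are derived.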