The protein structure prediction problem could be solved using the current PDB library
- 14 January 2005
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 102 (4), 1029-1034
- https://doi.org/10.1073/pnas.0407152101
Abstract
For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find similar folds to native with an average rms deviation (RMSD) from native of 2.5 Å with approximately 82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The TASSER algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guide of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results are highly suggestive that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments.
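The two quantities the abstract reports for each template, RMSD over the aligned region and alignment coverage, can be illustrated with a minimal sketch. It is not the paper's code. The function names, the `(native_index, template_index)` alignment format, and the toy coordinates are all hypothetical. The sketch also assumes the two structures have already been optimally superposed, so the superposition step itself is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumption: the structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def aligned_rmsd_and_coverage(native, template, alignment):
    """Return (RMSD over aligned residues, fraction of native residues aligned).

    `alignment` is a list of (native_index, template_index) pairs
    (a hypothetical format chosen for this illustration).
    """
    native_pts = [native[i] for i, _ in alignment]
    template_pts = [template[j] for _, j in alignment]
    coverage = len(alignment) / len(native)
    return rmsd(native_pts, template_pts), coverage


# Toy example: a 5-residue "native" trace and a 4-residue "template"
# whose points are shifted by 2 units along z.
native = [(float(i), 0.0, 0.0) for i in range(5)]
template = [(float(i), 0.0, 2.0) for i in range(4)]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]

example_rmsd, example_coverage = aligned_rmsd_and_coverage(native, template, alignment)
```

In this toy case four of the five native residues are aligned, so the coverage is 0.8, and every aligned pair is 2 units apart, so the RMSD is 2. A full-length model, by contrast, would be scored over all residues rather than only the aligned ones.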