Statistical potentials for fold assessment

Open Access

1 February 2002

journal article
research article
Published by Wiley in Protein Science

Vol. 11 (2), 430-448
https://doi.org/10.1002/pro.110430

Abstract

A protein structure model generally needs to be evaluated to assess whether or not it has the correct fold. To improve fold assessment, four types of residue‐level statistical potential were optimized: distance‐dependent, contact, φ/ψ dihedral angle, and accessible surface statistical potentials. Approximately 10,000 test models with the correct and incorrect folds were built by automated comparative modeling of protein sequences of known structure. The criterion used to discriminate between the correct and incorrect models was the Z‐score of the model energy. The performance of the Z‐score was determined as a function of many variables in the derivation and use of the corresponding statistical potential. The performance was measured by the fractions of the correctly and incorrectly assessed test models. The most discriminating combination of the four tested potentials is the sum of the normalized distance‐dependent and accessible surface potentials. The distance‐dependent potential that is optimal for assessing models of all sizes uses both C_α and C_β atoms as interaction centers, distinguishes between all 20 standard residue types, has a distance range of 30 Å, and is derived and used by taking into account the sequence separation of the interacting atom pairs. The terms for the sequentially local interactions are significantly less informative than those for the sequentially nonlocal interactions. The accessible surface potential that is optimal for assessing models of all sizes uses C_β atoms as interaction centers and distinguishes between all 20 standard residue types. The performance of the tested statistical potentials is not likely to improve significantly with an increase in the number of known protein structures used in their derivation.
The parameters of fold assessment whose optimal values vary significantly with model size include the size of the known protein structures used to derive the potential and the distance range of the accessible surface potential. Fold assessment by statistical potentials is most difficult for very small models. This difficulty presents a challenge to fold assessment in large‐scale comparative modeling, which produces many small and incomplete models. The results described in this study provide a basis for the optimal use of statistical potentials in fold assessment.
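
The scoring scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several points are assumptions made only for illustration: that the Z-score is taken against a supplied set of reference energies (the abstract does not say how the reference distribution is generated), that "normalized" means each potential is converted to its own Z-score before summing, that a lower Z-score indicates a more native-like model, and that a single cutoff separates correct from incorrect folds. All function names and the cutoff value are hypothetical.

```python
import statistics


def z_score(model_energy, reference_energies):
    """Z-score of a model energy against a reference energy distribution.

    Assumption: the reference energies are supplied by the caller.
    """
    mean = statistics.fmean(reference_energies)
    sd = statistics.pstdev(reference_energies)
    if sd == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mean) / sd


def combined_score(z_distance, z_surface):
    """Sum of the two normalized terms (distance-dependent + accessible surface).

    Assumption: 'normalized' is read here as 'expressed as a Z-score'.
    """
    return z_distance + z_surface


def error_fractions(scores, is_correct, cutoff):
    """Fractions of incorrectly assessed models at a given cutoff.

    A model is predicted correct when its score is <= cutoff
    (assumed convention: lower is better).
    Returns (false_positive_fraction, false_negative_fraction).
    """
    n_correct = sum(is_correct)
    n_incorrect = len(is_correct) - n_correct
    false_pos = sum(1 for s, ok in zip(scores, is_correct) if not ok and s <= cutoff)
    false_neg = sum(1 for s, ok in zip(scores, is_correct) if ok and s > cutoff)
    fp_fraction = false_pos / n_incorrect if n_incorrect else 0.0
    fn_fraction = false_neg / n_correct if n_correct else 0.0
    return fp_fraction, fn_fraction


if __name__ == "__main__":
    # Toy numbers, not data from the study.
    z_dist = z_score(-6.0, [-2.0, 2.0])
    z_surf = z_score(-1.0, [-1.0, 1.0])
    print(combined_score(z_dist, z_surf))
    print(error_fractions([-4.0, -1.0, -3.5, 0.5], [True, True, False, False], cutoff=-3.0))
```

Sweeping the cutoff and recording both error fractions is one way to compare potentials in the sense the abstract describes, where performance is measured by the fractions of correctly and incorrectly assessed test models.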
