Ultra-Fast Evaluation of Protein Energies Directly from Sequence
Open Access
- 16 June 2006
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 2 (6), e63
- https://doi.org/10.1371/journal.pcbi.0020063
Abstract
The structure, function, stability, and many other properties of a protein in a fixed environment are fully specified by its sequence, but in a manner that is difficult to discern. We present a general approach for rapidly mapping sequences directly to their energies on a pre-specified rigid backbone, an important sub-problem in computational protein design and in some methods for protein structure prediction. The cluster expansion (CE) method that we employ can, in principle, be extended to model any computable or measurable protein property directly as a function of sequence. Here we show how CE can be applied to the problem of computational protein design, and use it to derive excellent approximations of physical potentials. The approach provides several attractive advantages. First, following a one-time derivation of the cluster expansion, the time needed to evaluate the energy of a sequence adopting a specified backbone conformation is reduced by a factor of 10⁷ compared to standard full-atom methods for the same task. Second, the agreement between two full-atom methods that we tested and their CE sequence-based expressions is very high (root mean square deviation 1.1–4.7 kcal/mol, R² = 0.7–1.0). Third, the functional form of the CE energy expression is such that individual terms of the expansion have clear physical interpretations. We derived expressions for the energies of three classic protein design targets (a coiled coil, a zinc finger, and a WW domain) as functions of sequence, and examined the most significant terms. Single-residue and residue-pair interactions are sufficient to accurately capture the energetics of the dimeric coiled coil, whereas higher-order contributions are important for the two more globular folds. For the task of designing novel zinc-finger sequences, a CE-derived energy function provides significantly better solutions than a standard design protocol, in comparable computation time. Given these advantages, CE is likely to find many uses in computational structural modeling.

Many applications in computational structural biology involve evaluating the energy of a protein adopting a specific structure. A variety of functions are used for this purpose. Statistical potentials are fast to evaluate but do not have a clear biophysical basis, whereas physics-based functions consist of well-defined terms that can be costly to compute. This paper describes how the theory of cluster expansion, originally developed to describe the energies of alloys, can be applied to generate a physical potential for proteins that is extremely fast to evaluate. Cluster expansion is a way of representing a property of a system as a discrete function of its degrees of freedom. In this paper, it is used for the problem of protein design, where the energy is determined by the identities and conformations of amino acids at different sites on a fixed protein backbone. Application of cluster expansion to three small protein folds (the α-helical coiled coil, the zinc finger, and the WW domain) shows that protein sequence can be mapped directly to energy using a surprisingly simple function that maintains high accuracy. Promising results on these small systems suggest that the theory may have utility for macromolecular modeling more generally.
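To make the idea of a truncated cluster expansion concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a hypothetical 4-site backbone (`N_SITES`), a made-up 3-letter residue alphabet (`ALPHABET`), and a random stand-in for a full-atom energy (`toy_energy`). It also models residue identities only, not conformations. Each sequence is mapped to a vector of cluster functions: a constant, point terms per site, and optionally pair terms. The coefficients are fitted once by ordinary least squares, after which evaluating any sequence is a dot product. Using `ALPHABET[0]` as a reference residue is one common way to keep the basis non-redundant. The exhaustive training set and the hand-written normal-equation solver are illustrative choices only.

```python
import itertools
import math
import random

# Hypothetical toy setup (not from the paper): 4 backbone sites, 3 residue types.
ALPHABET = "AVL"
N_SITES = 4
# ALPHABET[0] serves as the reference residue: its point and pair terms are
# absorbed into lower-order coefficients, which keeps the basis non-redundant.


def features(seq, order):
    """Cluster functions of a sequence: constant, point, and (order >= 2) pair terms."""
    f = [1.0]  # empty cluster -> constant coefficient J0
    for i in range(N_SITES):
        for a in ALPHABET[1:]:
            f.append(1.0 if seq[i] == a else 0.0)
    if order >= 2:
        for i, j in itertools.combinations(range(N_SITES), 2):
            for a in ALPHABET[1:]:
                for b in ALPHABET[1:]:
                    f.append(1.0 if seq[i] == a and seq[j] == b else 0.0)
    return f


def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ce(seqs, energies, order):
    """One-time least-squares fit of the expansion coefficients (normal equations)."""
    X = [features(s, order) for s in seqs]
    m = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(m)] for a in range(m)]
    Xty = [sum(row[a] * e for row, e in zip(X, energies)) for a in range(m)]
    return solve(XtX, Xty)


def ce_energy(J, seq, order):
    """Evaluate the truncated expansion: coefficients dotted with cluster functions."""
    return sum(j * f for j, f in zip(J, features(seq, order)))


# Hypothetical stand-in for a full-atom energy: random point and pair terms,
# plus an optional three-residue coupling that a pair-level CE cannot capture.
_rng = random.Random(0)
H1 = {(i, a): _rng.uniform(-1, 1) for i in range(N_SITES) for a in ALPHABET}
H2 = {(i, j, a, b): _rng.uniform(-0.5, 0.5)
      for i, j in itertools.combinations(range(N_SITES), 2)
      for a in ALPHABET for b in ALPHABET}


def toy_energy(seq, three_body=0.0):
    e = sum(H1[(i, seq[i])] for i in range(N_SITES))
    e += sum(H2[(i, j, seq[i], seq[j])]
             for i, j in itertools.combinations(range(N_SITES), 2))
    if seq[0] == seq[1] == seq[2]:  # illustrative higher-order (3-body) term
        e += three_body
    return e


def rmse(J, seqs, energies, order):
    sq = [(ce_energy(J, s, order) - e) ** 2 for s, e in zip(seqs, energies)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    seqs = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_SITES)]
    for three_body in (0.0, 2.0):
        E = [toy_energy(s, three_body) for s in seqs]
        for order in (1, 2):
            J = fit_ce(seqs, E, order)
            print(f"three_body={three_body} order={order} "
                  f"RMSE={rmse(J, seqs, E, order):.4f}")
```

Running it prints the training RMSE for point-only (`order=1`) and point-plus-pair (`order=2`) fits, with and without the extra three-residue coupling. When the toy energy contains only point and pair terms, the pair-level fit reproduces it to within round-off. Adding the three-body term leaves a residual that no pair-level expansion can remove. This loosely mirrors, in miniature, the contrast the abstract draws between the coiled coil and the more globular folds. A realistic fit would use held-out test sequences and far more sites and residue types.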