Predicting surface exposure of amino acids from protein sequence

Abstract
The amino acid residues on a protein surface play a key role in interaction with other molecules, determine many physical properties, and constrain the structure of the folded protein. A database of monomeric protein crystal structures was used to teach computer-simulated neural networks rules for predicting surface exposure from local sequence. These trained networks are able to correctly predict surface exposure for 72% of residues in a testing set using a binary model (buried/exposed) and for 54% of residues using a ternary model (buried/intermediate/exposed). In the ternary model, only 11% of the exposed residues are predicted as buried and only 5% of the buried residues are predicted as exposed. Also, since the networks are able to predict exposure with a quantitative confidence estimate, it is possible to assign exposure for over half of the residues in a binary model with >80% accuracy. Even more accurate predictions are obtained by making a consensus prediction of exposure for a homologous family. The effect of the local environment of an amino acid on its accessibility, though smaller than expected, is significant and accounts for the higher success rate of prediction than obtained with previously used criteria. In the absence of a three-dimensional structure, the ability to predict surface accessibility of amino acids directly from the sequence is a valuable tool in choosing sites of chemical modification or specific mutations and in studies of molecular interaction.