Prediction of protein folding class from amino acid composition

Abstract
An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4α-helical bundles, (2) parallel (α/β)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually over represented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class.