Prediction of the Amount of Secondary Structure in a Globular Protein from Its Aminoacid Composition

Abstract
Multiple regression is used to obtain relations for predicting the amount of secondary structure in a protein molecule from its amino acid composition. These relations were tested on 18 proteins of known structure, in each case omitting the protein to be predicted from the data used to derive the relations. Independent predictions were made for the two subchains of hemoglobin and of insulin. The average errors for these 20 chains or subchains are: helix ±7.1%, beta-sheet ±6.9%, turn ±4.2%, and coil ±5.7%. A second set of relations, yielding somewhat inferior predictions, is given for the case in which Asp and Asn, and Glu and Gln, are not differentiated. Predictions are also listed for 15 proteins for which the amino acid sequence or tertiary structure is unknown.
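
The testing procedure described in the abstract, in which each protein is predicted from relations fitted without it, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names `leave_one_out_predictions` and `mean_absolute_error`, the choice of four composition variables, and all numerical data are assumptions made for illustration, and the data are synthetic rather than taken from any protein. It assumes an ordinary least-squares fit with an intercept and reports the mean absolute error of the held-out predictions.

```python
import numpy as np


def leave_one_out_predictions(X, y):
    """Predict each row's target from a linear fit to all the other rows.

    X: (n_proteins, n_variables) composition variables (hypothetical here).
    y: (n_proteins,) percentage of one structure type, e.g. helix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Prepend a column of ones so the fit includes an intercept term.
    A = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop the protein being predicted
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    return preds


def mean_absolute_error(y, preds):
    """Average absolute difference between observed and predicted values."""
    return float(np.mean(np.abs(np.asarray(y, dtype=float) - preds)))


# Synthetic stand-in data (NOT real protein compositions): 18 "proteins",
# 4 composition variables in mole percent, and a noisy linear response.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 15.0, size=(18, 4))
assumed_coef = np.array([2.0, -1.5, 0.5, 1.0])  # arbitrary illustrative values
y = 20.0 + X @ assumed_coef + rng.normal(0.0, 3.0, size=18)

preds = leave_one_out_predictions(X, y)
print("leave-one-out mean absolute error:", mean_absolute_error(y, preds))
```

Because each prediction uses coefficients fitted without the protein in question, the resulting error estimates how well the relations might transfer to a protein outside the fitting set, which is the purpose of the omission described above.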