Abstract
The number of amino acid residues in contact with a given residue in a globular protein is a simple and useful measure of whether that residue lies on the surface or in the interior of the protein. The contact number is estimated as the number of Cα atoms within a sphere of radius r (8 Å) centered at the Cα atom of the given residue. Predicting the contact-number diagram (the plot of the contact number against the residue number) from a given amino acid sequence may be a meaningful alternative to the secondary-structure prediction currently performed. Parameter values were determined empirically using the observed contact numbers calculated from the known structures of 39 proteins. To assess the real efficiency of the method, the prediction was performed in the following way: the proteins were divided into two groups; one group was used to derive parameter sets and the other served to test the prediction accuracy. The test reveals that the empirically determined parameter sets are significantly biased towards the database from which they were derived, and that the extent of this bias is roughly proportional to the number of parameter terms included. Adequate smoothing of a parameter set is the best way to reduce this bias and to give the best prediction for unknown proteins. The prediction accuracy finally obtained is about 0.4 (or roughly 70%), on average, measured by the correlation coefficient between the predicted and observed diagrams. This value is of the same order as the accuracy of current secondary-structure predictions.
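As an illustration of the two quantities defined above, the following is a minimal sketch rather than the procedure used in this work. It counts Cα atoms within 8 Å of each residue's Cα atom and compares two diagrams by the Pearson correlation coefficient. The function names `contact_numbers` and `diagram_correlation` are hypothetical, the toy coordinates are not real protein data, and two choices are assumptions not stated in the abstract: the residue itself is excluded from its own count, and atoms lying exactly on the sphere boundary are counted.

```python
import math


def contact_numbers(ca_coords, radius=8.0):
    """Count, for each residue, the other C-alpha atoms within `radius` angstroms.

    Assumptions (not stated in the abstract): the residue itself is excluded,
    and the boundary of the sphere is inclusive (distance <= radius).
    """
    counts = []
    for i, a in enumerate(ca_coords):
        n = 0
        for j, b in enumerate(ca_coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts


def diagram_correlation(predicted, observed):
    """Pearson correlation coefficient between two contact-number diagrams."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Toy input: five C-alpha atoms on a straight line, 3.8 angstroms apart.
toy_chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
toy_diagram = contact_numbers(toy_chain)
print(toy_diagram)  # [2, 3, 4, 3, 2]
```

In this toy chain the central residue has the largest contact number and the terminal residues the smallest, which mirrors the interior-versus-surface reading of the diagram. A predicted diagram would be scored against the observed one with `diagram_correlation(predicted, observed)`.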