Statistical analysis and prediction of protein–protein interfaces

Abstract
Predicting protein–protein interfaces from a three‐dimensional structure is a key task of computational structural proteomics. In contrast to geometrically distinct small-molecule binding sites, protein–protein interfaces are notoriously difficult to predict. We generated a large nonredundant data set of 1494 true protein–protein interfaces, using biological symmetry annotation where necessary. We analyzed the data set and trained a Support Vector Machine on a new, robust evolutionary conservation signal combined with local surface properties to predict protein–protein interfaces. Fivefold cross-validation confirms the high sensitivity and selectivity of the model. As many as 97% of the predicted patches overlapped with the true interface patch, while an average predicted patch included only 22% of the surface residues. The model allowed the identification of potential new interfaces and the correction of mislabeled oligomeric states. Proteins 2005.
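The two summary figures quoted above (the share of predicted patches that overlap the true interface, and the share of surface residues in an average predicted patch) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy residue numbers, and the choice to count a patch as overlapping when it shares at least one residue with the true interface are all assumptions made for illustration.

```python
def patch_overlaps(predicted, true_interface):
    """True if the predicted patch shares at least one residue with the
    true interface (assumed overlap criterion, not taken from the paper)."""
    return bool(set(predicted) & set(true_interface))


def surface_fraction(predicted, surface):
    """Fraction of the surface residues that fall inside the predicted patch."""
    surface = set(surface)
    if not surface:
        raise ValueError("surface must contain at least one residue")
    return len(set(predicted) & surface) / len(surface)


def summarize(proteins):
    """Return (overlap rate, mean surface fraction) over a list of proteins.

    Each protein is a dict with the hypothetical keys 'predicted',
    'interface' and 'surface', each holding a set of residue numbers.
    """
    n = len(proteins)
    if n == 0:
        raise ValueError("need at least one protein")
    overlap_rate = sum(
        patch_overlaps(p["predicted"], p["interface"]) for p in proteins
    ) / n
    mean_fraction = sum(
        surface_fraction(p["predicted"], p["surface"]) for p in proteins
    ) / n
    return overlap_rate, mean_fraction


# Toy data, invented purely to exercise the functions.
toy = [
    {"predicted": {1, 2, 3}, "interface": {3, 4}, "surface": set(range(1, 11))},
    {"predicted": {5, 6}, "interface": {8, 9}, "surface": set(range(1, 11))},
]
overlap_rate, mean_fraction = summarize(toy)
```

In this toy input, one of the two predicted patches touches its true interface, and the patches cover three and two of ten surface residues respectively. The reported 97% and 22% would correspond to the first and second returned values under the assumed definitions.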