An algorithm for predicting protein–protein interaction sites: Abnormally exposed amino acid residues and secondary structure elements

Abstract
Multiprotein systems mediate most regulatory processes in living organisms. Although the structures of the individual proteins are often defined, less is known about the structures of multiprotein systems. Computational methods for predicting interfaces, using evolutionary conservation and/or physicochemical data, have been developed. Here we consider the use of solvent accessibility, residue propensity, and hydrophobicity, in conjunction with secondary structure data, as prediction parameters. We analyze the influence of residue type and secondary structure on solvent accessibility and define a measure of “relative exposedness.” Clustering abnormally high‐scoring residues provides a basis for predicting interaction sites. The analysis is extended to investigate abnormally exposed secondary structure elements, particularly β‐sheet strands. We show that surface‐exposed β‐strands lacking protective features are more likely to be found at protein–protein interfaces, allowing us to create an algorithm with ∼68% and ∼75% accuracy in differentiating between interacting and edge strands in isolated β‐strands and β‐sheet strands, respectively. These methods of identifying abnormally exposed surface regions are combined in an algorithm that, on a data set of 77 unbound and disjoint (single chains extracted from complexes) structures, correctly predicts 79% of the protein–protein interfaces. If enzyme–inhibitor complexes in which the inhibitor mimics a nonprotein substrate are excluded, the accuracy increases to 85%.
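The abstract does not give the formula for “relative exposedness,” so the following is only an illustrative sketch under an assumption: that a residue's score can be approximated as a z‐score of its solvent accessibility against other residues of the same type and secondary structure class. The function names, the input tuple layout, and the threshold of 1.0 are hypothetical and are not taken from the paper; the clustering step is not shown.

```python
from collections import defaultdict
from statistics import mean, pstdev


def relative_exposedness(residues):
    """Score each residue against its (residue type, secondary structure) group.

    residues: list of (residue_type, secondary_structure, accessibility) tuples.
    Returns one z-score per residue. This z-score definition is an assumption
    made for illustration, not the measure defined in the paper.
    """
    # Collect accessibility values for each (type, secondary structure) group.
    groups = defaultdict(list)
    for res_type, ss, acc in residues:
        groups[(res_type, ss)].append(acc)

    # Mean and population standard deviation per group.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    scores = []
    for res_type, ss, acc in residues:
        mu, sigma = stats[(res_type, ss)]
        # A group with no spread (e.g. a single member) gives a neutral score.
        scores.append((acc - mu) / sigma if sigma > 0 else 0.0)
    return scores


def abnormally_exposed(scores, threshold=1.0):
    """Return indices of residues whose score exceeds a hypothetical threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]
```

In a fuller pipeline, the indices returned by `abnormally_exposed` would be the input to a spatial clustering step, and the group statistics would more plausibly come from a reference set of structures than from the protein being scored.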