Statistical analysis and prediction of protein–protein interfaces

19 May 2005

journal article
research article
Published by Wiley in Proteins: Structure, Function, and Bioinformatics

Vol. 60 (3), 353-366
https://doi.org/10.1002/prot.20433

Abstract

Predicting protein–protein interfaces from a three‐dimensional structure is a key task of computational structural proteomics. In contrast to geometrically distinct small molecule binding sites, protein–protein interfaces are notoriously difficult to predict. We generated a large nonredundant data set of 1494 true protein–protein interfaces, using biological symmetry annotation where necessary. The data set was carefully analyzed, and a Support Vector Machine was trained on a combination of a new robust evolutionary conservation signal and local surface properties to predict protein–protein interfaces. Fivefold cross validation verifies the high sensitivity and selectivity of the model. As many as 97% of the predicted patches overlapped the true interface patch, while only 22% of the surface residues were included in an average predicted patch. The model allowed the identification of potential new interfaces and the correction of mislabeled oligomeric states. Proteins 2005.
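
The two figures reported in the abstract (the share of predicted patches that overlap the true interface, and the share of surface residues in an average predicted patch) can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the paper, so the functions below assume the simplest reading: a patch "overlaps" if it shares at least one residue with the true interface, and patch size is measured as a fraction of all surface residues. The names `patch_metrics` and `summarize` and the toy residue sets are hypothetical and do not come from the paper.

```python
def patch_metrics(predicted, interface, surface):
    """Return (overlaps, surface_fraction) for one protein.

    Assumed definitions (not taken from the paper):
      overlaps         -- predicted patch shares >= 1 residue with the true interface
      surface_fraction -- share of surface residues that fall in the predicted patch
    """
    predicted, interface, surface = set(predicted), set(interface), set(surface)
    if not surface:
        raise ValueError("surface residue set must not be empty")
    overlaps = bool(predicted & interface)
    surface_fraction = len(predicted & surface) / len(surface)
    return overlaps, surface_fraction


def summarize(proteins):
    """Average both metrics over an iterable of (predicted, interface, surface)."""
    results = [patch_metrics(*p) for p in proteins]
    n = len(results)
    overlap_rate = sum(o for o, _ in results) / n
    mean_fraction = sum(f for _, f in results) / n
    return overlap_rate, mean_fraction


# Toy data: residue numbers are made up for illustration only.
toy = [
    ({1, 2, 3}, {3, 4, 5}, set(range(1, 11))),  # overlaps, covers 3 of 10 surface residues
    ({6, 7}, {1, 2}, set(range(1, 11))),        # no overlap, covers 2 of 10 surface residues
]
overlap_rate, mean_fraction = summarize(toy)
print(overlap_rate, mean_fraction)
```

Under these assumed definitions, a good predictor has a high overlap rate together with a low mean surface fraction, since a patch covering the whole surface would overlap every interface trivially.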
