Predicting small ligand binding sites in proteins using backbone structure
Open Access
- 21 October 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (24), 2865-2871
- https://doi.org/10.1093/bioinformatics/btn543
Abstract
Motivation: Specific non-covalent binding of metal ions and ligands, such as nucleotides and cofactors, is essential for the function of many proteins. Computational methods are useful for predicting the location of such binding sites when experimental information is lacking. Methods that use structural information, when available, are particularly promising since they can potentially identify non-contiguous binding motifs that cannot be found using only the amino acid sequence. Furthermore, a prediction method that can utilize low-resolution models is advantageous because high-resolution structures are available for only a relatively small fraction of proteins.
Results: SitePredict is a machine learning-based method for predicting binding sites in protein structures for specific metal ions or small molecules. The method uses Random Forest classifiers trained on diverse residue-based site properties including spatial clustering of residue types and evolutionary conservation. SitePredict was tested by cross-validation on a set of known binding sites for six different metal ions and five different small molecules in a non-redundant set of protein–ligand complex structures. The prediction performance was good for all ligands considered, as reflected by AUC values of at least 0.8. Furthermore, a more realistic test on unbound structures showed only a slight decrease in the accuracy. The properties that contribute the most to the prediction accuracy of each ligand were also examined. Finally, examples of predicted binding sites in homology models and uncharacterized proteins are discussed.
Availability: Binding site prediction results for all PDB protein structures and human protein homology models are available at http://sitepredict.org/.
Contact: bordner.andrew@mayo.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
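As a reading aid for the AUC figure quoted in the Results, the sketch below computes the area under the ROC curve from scratch using the rank-sum identity: AUC is the fraction of (binding, non-binding) pairs in which the binding example receives the higher score. The function name `auc` and the scores and labels are hypothetical illustrations, not data or code from SitePredict.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive example scores
    higher than a randomly chosen negative one, with ties counted as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for eight candidate sites (assumed values,
# not taken from the paper); label 1 marks a true binding site.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

# 13 of the 15 positive/negative pairs are ranked correctly.
example_auc = auc(scores, labels)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of at least 0.8 means that a true binding site outranks a non-binding one in at least 80% of such pairs.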