Prediction of RNA binding sites in proteins from amino acid sequence

21 June 2006

journal article
research article
Published by Cold Spring Harbor Laboratory in RNA

Vol. 12 (8), 1450-1462
https://doi.org/10.1261/rna.2197306

Abstract

RNA–protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA–protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA–protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA–protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, which implements a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity, and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate the design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA–protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.)
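
To make the approach in the abstract concrete, the following is a minimal sketch of a Naive Bayes classifier that labels each residue from a sequence window and exposes a threshold for trading specificity against sensitivity. It is not the RNABindR implementation: the class name `WindowNaiveBayes`, the half-width `w`, the Laplace smoothing constant `alpha`, the padding symbol, and the threshold `theta` are all assumptions chosen for illustration, and the training data is a toy example.

```python
import math
from collections import defaultdict

# 20 amino acids plus a padding symbol for window positions past the termini.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def windows(seq, w, pad="-"):
    """Yield a window of 2*w + 1 residues centred on each position of seq."""
    padded = pad * w + seq + pad * w
    for i in range(len(seq)):
        yield padded[i:i + 2 * w + 1]


class WindowNaiveBayes:
    """Hypothetical illustration; parameter choices are assumptions."""

    def __init__(self, w=2, alpha=1.0):
        self.w = w          # window half-width (assumed value)
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, seqs, labels):
        """labels are strings of '1' (interface) / '0' (non-interface)."""
        n = 2 * self.w + 1
        self.class_counts = {0: 0, 1: 0}
        self.counts = {c: [defaultdict(int) for _ in range(n)] for c in (0, 1)}
        for seq, lab in zip(seqs, labels):
            for win, y in zip(windows(seq, self.w), lab):
                c = int(y)
                self.class_counts[c] += 1
                for pos, aa in enumerate(win):
                    self.counts[c][pos][aa] += 1
        return self

    def log_odds(self, win):
        """Log of P(interface | window) / P(non-interface | window)."""
        a, k = self.alpha, len(ALPHABET)
        n1, n0 = self.class_counts[1], self.class_counts[0]
        score = math.log((n1 + a) / (n0 + a))  # smoothed class prior
        for pos, aa in enumerate(win):
            p1 = (self.counts[1][pos].get(aa, 0) + a) / (n1 + a * k)
            p0 = (self.counts[0][pos].get(aa, 0) + a) / (n0 + a * k)
            score += math.log(p1 / p0)
        return score

    def predict(self, seq, theta=0.0):
        """Label a residue '1' when its log-odds score exceeds theta."""
        return "".join(
            "1" if self.log_odds(win) > theta else "0"
            for win in windows(seq, self.w)
        )


# Toy data, invented for illustration only.
model = WindowNaiveBayes(w=0).fit(["KRKAAA"], ["111000"])
print(model.predict("KGRA"))              # default threshold
print(model.predict("KGRA", theta=2.0))   # stricter: fewer predicted interface residues
print(model.predict("KGRA", theta=-2.0))  # looser: more predicted interface residues
```

Raising `theta` makes the sketch more conservative (fewer residues labelled as interface), while lowering it labels more residues, which is one simple way a user-facing calibration between specificity and sensitivity could work.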

Cited by 146 articles