High accuracy operon prediction method based on STRING database scores

Open Access

12 April 2010

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 38 (12), e130
https://doi.org/10.1093/nar/gkq254

Abstract

We present a simple and highly accurate computational method for operon prediction, based on intergenic distances and functional relationships between the protein products of contiguous genes, as defined by the STRING database (Jensen,L.J., Kuhn,M., Stark,M., Chaffron,S., Creevey,C., Muller,J., Doerks,T., Julien,P., Roth,A., Simonovic,M. et al. (2009) STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res., 37, D412–D416). These two parameters were used to train a neural network on a subset of experimentally characterized Escherichia coli and Bacillus subtilis operons. Our predictive model was successfully tested on the set of experimentally defined operons in E. coli and B. subtilis, with accuracies of 94.6% and 93.3%, respectively. To our knowledge, these are the highest accuracies obtained to date for predicting bacterial operons. Furthermore, to evaluate the predictive accuracy of our model when one organism's data set is used for training and a different organism's data set for testing, we repeated the E. coli operon prediction analysis using a neural network trained with B. subtilis data, and the B. subtilis analysis using a neural network trained with E. coli data. Even in these cases, the accuracies reached with our method were high: 91.5% and 93%, respectively. These results show the potential of our method for accurately predicting the operons of any other organism. Our operon predictions for fully sequenced genomes are available at http://operons.ibt.unam.mx/OperonPredictor/.
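The two-feature setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the network architecture, the feature scaling, or the training procedure, so the single hidden layer, the clipping range, the distance convention, and every name below (`intergenic_distance`, `features`, `TinyNet`, `TOY_PAIRS`) are assumptions made for illustration. The toy gene pairs are invented and are not taken from the paper's training data.

```python
import math
import random


def intergenic_distance(upstream_end, downstream_start):
    """Bases between two adjacent genes (1-based inclusive coordinates).

    Negative values mean the genes overlap. This convention is an assumption.
    """
    return downstream_start - upstream_end - 1


def features(distance_bp, string_score):
    """Scale the two inputs to comparable ranges (hypothetical scaling).

    string_score is assumed to be a confidence value between 0 and 1.
    """
    clipped = max(-50, min(distance_bp, 500))
    return (clipped / 500.0, string_score)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Feed-forward network: 2 inputs, one sigmoid hidden layer, 1 output."""

    def __init__(self, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b)
             for w, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x):
        """Estimated probability that the gene pair shares an operon."""
        return self.forward(x)[1]

    def train(self, data, epochs=2000, lr=0.5):
        """Plain stochastic gradient descent on the cross-entropy loss."""
        for _ in range(epochs):
            for x, target in data:
                h, y = self.forward(x)
                d_out = y - target  # loss gradient at the output pre-activation
                for j, hj in enumerate(h):
                    # Hidden-unit gradient, computed before w2[j] is updated.
                    d_h = d_out * self.w2[j] * hj * (1.0 - hj)
                    self.w2[j] -= lr * d_out * hj
                    self.w1[j][0] -= lr * d_h * x[0]
                    self.w1[j][1] -= lr * d_h * x[1]
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out


def accuracy(net, data):
    """Fraction of pairs classified correctly at a 0.5 threshold."""
    correct = sum((net.predict(x) >= 0.5) == (target == 1)
                  for x, target in data)
    return correct / len(data)


# Invented toy examples: (distance in bp, STRING-like score) -> 1 if same operon.
TOY_PAIRS = [
    (features(-4, 0.95), 1), (features(10, 0.90), 1),
    (features(35, 0.80), 1), (features(60, 0.70), 1),
    (features(250, 0.10), 0), (features(400, 0.20), 0),
    (features(180, 0.05), 0), (features(320, 0.30), 0),
]

net = TinyNet()
net.train(TOY_PAIRS)
print("training accuracy:", accuracy(net, TOY_PAIRS))
print("close pair, high score:", round(net.predict(features(5, 0.9)), 3))
print("distant pair, low score:", round(net.predict(features(350, 0.1)), 3))
```

The cross-organism test mentioned in the abstract corresponds, in this sketch, to calling `train` on pairs from one organism and `accuracy` on pairs from another.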

This publication has 32 references indexed in Scilit:

STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Research, 2008
RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Research, 2007
Operon prediction using both genome-specific and general genomic information. Nucleic Acids Research, 2006
Operon prediction in Pyrococcus furiosus. Nucleic Acids Research, 2006
Operon prediction based on SVM. Computational Biology and Chemistry, 2006
Detection of operons. Proteins-Structure Function and Bioinformatics, 2006
Operon prediction without a training set. Bioinformatics, 2004
Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology, 2001
Fast learning method for back-propagation neural network by evolutionary adaptation of learning rates. Neurocomputing, 1996
Basic local alignment search tool. Journal of Molecular Biology, 1990

Cited by 63 articles