PREDICTING THE OPERON STRUCTURE OF BACILLUS SUBTILIS USING OPERON LENGTH, INTERGENE DISTANCE, AND GENE EXPRESSION INFORMATION

Abstract
We predict the operon structure of the Bacillus subtilis genome using four sources of information: the average operon length, the distance between adjacent genes in base pairs, and the similarity in gene expression measured in time-course experiments and in gene disruptant experiments. Because the operon prediction of each method is expressed as a Bayesian probability, the four prediction methods can be combined into a single Bayesian classifier in a statistically rigorous manner. The discriminant value of the Bayesian classifier can be chosen by weighing the cost of misclassifying an operon gene pair against the cost of misclassifying a non-operon gene pair. For equal costs, a leave-one-out analysis gave an overall accuracy of 88.7% for the joint Bayesian classifier, whereas the individual information sources (operon length, intergene distance, time-course expression, and disruptant expression) yielded accuracies of 58.1%, 83.1%, 77.3%, and 71.8%, respectively. The predicted operon structure based on the joint Bayesian classifier is available from the DBTBS database (http://dbtbs.hgc.jp).
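The combination step and the cost-based discriminant can be illustrated with a minimal sketch. It assumes that each information source supplies a posterior probability P(operon | x_i) computed with the same prior, and that the sources are conditionally independent given whether a gene pair lies in an operon (a naive Bayes assumption; the exact formulation used in the paper may differ). Under that assumption, each source contributes a log-likelihood ratio equal to its posterior log-odds minus the prior log-odds. Minimizing the expected misclassification cost then gives the threshold c_FP / (c_FP + c_FN) on the joint posterior. The function names `combine_posteriors`, `discriminant`, and `classify` are hypothetical.

```python
import math


def combine_posteriors(posteriors, prior):
    """Combine per-source posteriors P(operon | x_i) into one posterior.

    Hypothetical helper. Assumes the sources are conditionally independent
    given the class and that every posterior was computed with `prior`.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_log_odds = math.log(prior / (1.0 - prior))
    log_odds = prior_log_odds
    for p in posteriors:
        if not 0.0 < p < 1.0:
            raise ValueError("posteriors must lie strictly between 0 and 1")
        # Source log-likelihood ratio = posterior log-odds - prior log-odds.
        log_odds += math.log(p / (1.0 - p)) - prior_log_odds
    # Convert the joint log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


def discriminant(cost_false_operon, cost_missed_operon):
    """Posterior threshold that minimizes expected misclassification cost.

    Calling a pair an operon costs (1 - p) * cost_false_operon in expectation;
    calling it a non-operon costs p * cost_missed_operon.
    """
    return cost_false_operon / (cost_false_operon + cost_missed_operon)


def classify(posteriors, prior, cost_false_operon=1.0, cost_missed_operon=1.0):
    """Return True if the gene pair is predicted to lie in the same operon."""
    threshold = discriminant(cost_false_operon, cost_missed_operon)
    return combine_posteriors(posteriors, prior) > threshold


if __name__ == "__main__":
    # Illustrative numbers only, not values from the paper.
    print(combine_posteriors([0.6, 0.9, 0.8, 0.7], prior=0.5))
    print(classify([0.4, 0.3], prior=0.5))
    print(classify([0.4, 0.3], prior=0.5, cost_missed_operon=9.0))
```

With equal costs the threshold is 0.5. Raising the cost of missing an operon pair lowers the threshold, so more gene pairs are predicted to be operon pairs.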