Genome-Scale Analysis of the Uses of the Escherichia coli Genome: Model-Driven Analysis of Heterogeneous Data Sets

Abstract
The recent availability of heterogeneous high-throughput data types has increased the need for scalable in silico methods with which to integrate data related to the processes of regulation, protein synthesis, and metabolism. A sequence-based framework for modeling transcription and translation in prokaryotes has been established and has been extended to study the expression state of the entire Escherichia coli genome. The resulting in silico analysis of the expression state highlighted three facets of gene expression in E. coli: (i) the metabolic resources required for genome expression and protein synthesis were found to be relatively invariant under the conditions tested; (ii) effective promoter strengths were estimated at the genome scale by using global mRNA abundance and half-life data, revealing genes subject to regulation under the experimental conditions tested; and (iii) large-scale genome location-dependent expression patterns with approximately 600-kb periodicity were detected in the E. coli genome based on the 49 expression data sets analyzed. These results support the notion that a structured model-driven analysis of expression data yields additional information that can be subjected to commonly used statistical analyses. The integration of heterogeneous genome-scale data (i.e., sequence, expression data, and mRNA half-life data) is readily achieved in the context of an in silico model.