Expression-Guided In Silico Evaluation of Candidate Cis Regulatory Codes for Drosophila Muscle Founder Cells

Abstract
While combinatorial models of transcriptional regulation can be inferred for metazoan systems from a priori biological knowledge, validation requires extensive and time-consuming experimental work. Thus, there is a need for computational methods that can evaluate hypothesized cis regulatory codes before the difficult task of experimental verification is undertaken. We have developed a novel computational framework (termed “CodeFinder”) that integrates transcription factor binding site and gene expression information to evaluate whether a hypothesized transcriptional regulatory model (TRM; i.e., a set of co-regulating transcription factors) is likely to target a given set of co-expressed genes. Our basic approach is to simultaneously predict cis regulatory modules (CRMs) associated with a given gene set and quantify the enrichment for combinatorial subsets of transcription factor binding site motifs comprising the hypothesized TRM within these predicted CRMs. As a model system, we have examined a TRM experimentally demonstrated to drive the expression of two genes in a sub-population of cells in the developing Drosophila mesoderm, the somatic muscle founder cells. This TRM was previously hypothesized to be a general mode of regulation for genes expressed in this cell population. In contrast, the present analyses suggest that a modified form of this cis regulatory code applies to only a subset of founder cell genes, those whose gene expression responds to specific genetic perturbations in a similar manner to the gene on which the original model was based. We have confirmed this hypothesis by experimentally discovering six (out of 12 tested) new CRMs driving expression in the embryonic mesoderm, four of which drive expression in founder cells.
Although genome sequences and much gene expression data are readily available, the determination of sets of transcription factors regulating particular gene expression patterns remains a problem of fundamental importance. Tissue-specific gene expression in developing animals is regulated through the combinatorial interactions of transcription factors with DNA regulatory elements termed cis regulatory modules (CRMs). Although genetic and biochemical experiments allow the identification of transcription factors and CRMs, those experiments are laborious and time-consuming. Philippakis et al. introduce a new approach (termed “CodeFinder”) for quantifying the enrichment for particular combinations of transcription factor binding site motifs within predicted CRMs associated with a given gene set of interest, identified from gene expression data. The authors' analyses allowed them to discover a specific combination of transcription factor binding site motifs that constitute a core cis regulatory code for expression of a particular subset of genes in muscle founder cells, an embryonic cell population in the developing fruit fly (Drosophila melanogaster) mesoderm, and also led them to the discovery and subsequent experimental validation of novel, tissue-specific CRMs. Importantly, the CodeFinder approach is generally applicable, and thus could be used to support, refute, or refine a known or hypothesized cis regulatory code for any biological system or genome of interest.
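To make the idea of quantifying enrichment for combinations of binding site motifs concrete, the following is a minimal sketch and not the CodeFinder implementation. It assumes that candidate sequence windows have already been scanned for motif matches and that each window is summarized as the set of motifs it contains; the real framework also predicts the CRMs themselves, which is omitted here. The hypergeometric test, the function names (`hypergeom_sf`, `motif_combo_enrichment`), and the motif labels "A", "B", and "C" are illustrative assumptions rather than details taken from the text above.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    M: total windows, n: foreground windows,
    N: windows containing the motif combination,
    k: foreground windows containing the combination.
    """
    upper = min(n, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / comb(M, N)


def motif_combo_enrichment(fg_windows, bg_windows, motifs):
    """Score every non-empty subset of `motifs` (hypothetical helper).

    fg_windows / bg_windows: lists of sets of motif names found in windows
    near the gene set of interest and near background genes, respectively.
    Returns {combo: (fg_hits, total_hits, p_value)}.
    """
    labelled = [(w, True) for w in fg_windows] + [(w, False) for w in bg_windows]
    M, n = len(labelled), len(fg_windows)
    results = {}
    for size in range(1, len(motifs) + 1):
        for combo in combinations(sorted(motifs), size):
            required = set(combo)
            # A window counts as a hit only if it contains every motif in the combo.
            hits = [is_fg for w, is_fg in labelled if required <= w]
            N, k = len(hits), sum(hits)
            p = hypergeom_sf(k, M, n, N) if N else 1.0
            results[combo] = (k, N, p)
    return results


# Toy data: motif content of windows near co-expressed genes vs. background genes.
fg = [{"A", "B"}, {"A", "B"}, {"A"}]
bg = [{"A"}, {"C"}, {"B"}, set()]
results = motif_combo_enrichment(fg, bg, ["A", "B"])
for combo, (k, N, p) in results.items():
    print(combo, k, N, round(p, 4))
```

In this toy setting, a combination whose windows fall mostly in the foreground set receives a small p-value. A real analysis would additionally need a sequence-based CRM prediction step and a correction for testing many motif subsets, neither of which is shown.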