Discovering structural cis‐regulatory elements by modeling the behaviors of mRNAs

Abstract
Gene expression is regulated at each step from chromatin remodeling through translation and degradation. Several known RNA‐binding regulatory proteins interact with specific RNA secondary structures in addition to specific nucleotides. To provide a more comprehensive understanding of the regulation of gene expression, we developed an integrative computational approach that leverages functional genomics data and nucleotide sequences to discover RNA secondary structure‐defined cis‐regulatory elements (SCREs). We applied our structural cis‐regulatory element detector (StructRED) to microarray and mRNA sequence data from Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. We recovered the known specificities of Vts1p in yeast and Smaug in flies. In addition, we discovered six putative SCREs in flies and three in humans. We characterized the SCREs based on their condition‐specific regulatory influences, the annotation of the transcripts that contain them, and their locations within transcripts. Overall, we show that modeling functional genomics data in terms of combined RNA structure and sequence motifs is an effective method for discovering the specificities and regulatory roles of RNA‐binding proteins.

### Synopsis

Gene expression is regulated at each step from chromatin remodeling through translation and degradation. Yet, most efforts to understand the regulation of gene expression have been focused on transcription and DNA‐binding regulatory proteins. Although regulatory RNAs have received appreciable attention ([Bushati and Cohen, 2007][1]; [Coppins et al, 2007][2]), regulatory elements within mRNAs that are recognized by nucleic acid‐binding proteins have been largely ignored until recently ([Keene, 2007][3]). This state exists despite observations that suggest changes in mRNA stability may account for half of the changes in mRNA expression in some cells and conditions ([Fan et al, 2002][4]; [Cheadle et al, 2005][5]).
Moreover, it is a mathematical certainty that mRNAs of average stability can only be rapidly downregulated by altering the mRNA decay rate (see [Pérez‐Ortín et al, 2007][6] for derivation). Thus, one way to execute rapid, large‐scale gene expression responses to unpredictable environmental stimuli is through decay‐regulating RNA‐binding proteins (RBPs), whose activity can be rapidly modulated post‐transcriptionally. Early metazoan embryogenesis also requires mRNA stability and translation regulation to orchestrate the activities of maternally deposited transcripts (for review see [Vardy and Orr‐Weaver, 2007][7]).

Despite the potential importance of RNA secondary structures as binding sites for regulatory RBPs, computational methods for their discovery have failed to keep pace with current functional genomics technology (e.g. microarrays). Now, well into the era of functional genomics, RNA structure‐finding algorithms are still sequence‐only methods, having so far failed to use the data‐integrative approaches that are becoming increasingly common for the discovery of DNA‐binding protein specificities ([Bussemaker et al, 2001][8], [2007][9]; [Foat et al, 2005][10], [2006][11]).

In this work, we present a novel, alignment‐free method that discovers secondary structure‐defined cis‐regulatory elements (SCREs) in mRNAs by modeling the effects that their occurrences exert on quantitative measurements of mRNA behavior in the form of microarray data. This process is embodied in a regression‐based algorithm called structural cis‐regulatory element detector (StructRED). The method defines an RNA structure search space, restricted to small stem–loop structures in this version, and then exhaustively scores all short nucleotide sequences within that structural context for how well their occurrences explain the observed microarray measurements.
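The scoring idea described above can be illustrated with a minimal sketch. This is not the StructRED implementation: it assumes a single fixed stem length and loop length, allows G-U wobble pairs, counts loop k‐mers instead of fitting weight matrices, and scores each k‐mer with an ordinary univariate least‐squares fit. The function names (`hairpin_loops`, `motif_counts`, `fit`, `rank_motifs`), the parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Assumption: Watson-Crick pairs plus G-U wobble count as stem base pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def hairpin_loops(seq, stem_len=3, loop_len=4):
    """Yield the loop sequence of every window that can fold into a
    stem_len-bp stem closing a loop_len-nt loop."""
    span = 2 * stem_len + loop_len
    for i in range(len(seq) - span + 1):
        five = seq[i:i + stem_len]
        loop = seq[i + stem_len:i + stem_len + loop_len]
        three = seq[i + stem_len + loop_len:i + span]
        # The 5' arm must pair with the 3' arm read in reverse.
        if all((a, b) in PAIRS for a, b in zip(five, reversed(three))):
            yield loop


def motif_counts(seqs, k, stem_len=3, loop_len=4):
    """Count, per transcript, each k-mer that occurs inside a hairpin loop."""
    counts = defaultdict(lambda: [0] * len(seqs))
    for idx, seq in enumerate(seqs):
        for loop in hairpin_loops(seq, stem_len, loop_len):
            for j in range(len(loop) - k + 1):
                counts[loop[j:j + k]][idx] += 1
    return counts


def fit(x, y):
    """Univariate least squares of y on x; returns (slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy * sxy / (sxx * syy)


def rank_motifs(seqs, expr, k, stem_len=3, loop_len=4):
    """Rank structure-constrained k-mers by how well their per-transcript
    counts explain the expression values (best R^2 first)."""
    scored = []
    for motif, x in motif_counts(seqs, k, stem_len, loop_len).items():
        slope, r2 = fit(x, expr)
        scored.append((r2, slope, motif))
    return sorted(scored, reverse=True)


# Toy data (invented): two transcripts carry a CUGG loop on different stems,
# one has the same nucleotides without a stem, one has an unrelated hairpin.
toy_seqs = [
    "AAAGCCCUGGGGCAAA",
    "AAAGGCCUGGGCCAAA",
    "AAAGCCCUGGAAAAAA",
    "AAAGCCAAAAGGCAAA",
]
toy_expr = [-2.0, -2.0, 0.0, 0.0]  # e.g. log-ratios from one array condition

for r2, slope, motif in rank_motifs(toy_seqs, toy_expr, k=4):
    print(motif, round(slope, 2), round(r2, 2))
```

In this sketch a negative slope means that transcripts carrying the motif in a hairpin loop tend to have lower measured values in that condition. The third toy transcript contains the same nucleotides as the first but cannot form the stem, so it contributes no motif count; that is the sense in which the search is structure‐defined and not sequence‐only.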
Through an iterative process, a multivariate model consisting of SCRE‐derived mRNA sequence scores is developed to explain the input microarray data. The output of the method is a list of putative SCRE weight matrices and the inferred post‐translational regulatory activities of the unknown trans‐factors across all of the input microarray conditions (trans‐factor activity profiles, TFAPs).

We accurately recover the known stem–loop binding specificities of the RNA‐binding proteins Smaug in Drosophila and Vts1p in S. cerevisiae using mRNA sequences and microarray data. When we inspected the computationally inferred behavior of the Smaug protein across several microarray experiments profiling mRNA levels and translation in developing Drosophila embryos ([Pilot et al, 2006][12]; [Qin et al, 2007][13]; [Tadros et al, 2007][14]; GEO accessions GSE8910, GSE3955, GSE5430), we found that Smaug represses translation of its target mRNAs during the first 2 h of embryogenesis and promotes the degradation of its target transcripts starting at about 2 h of development. Our genome‐wide inferences are consistent with the detailed observations of Smaug destabilizing Hsp83 mRNAs ([Semotok et al, 2005][15]) and translationally repressing nanos mRNAs ([Dahanukar et al, 1999][16]; [Smibert et al, 1999][17]).

In addition to the Smaug SCREs, we discovered six other putative SCREs in Drosophila, which we have labeled Dm1 through Dm6, that have coherent supporting TFAPs and annotation ([Figure 4][18]). First, Dm1 and Dm2 were discovered from an mRNA expression microarray time course for Drosophila embryogenesis ([Tadros et al, 2007][14]). Transcripts that contain high‐affinity instances of Dm1 and Dm2 are expressed at decreasing levels as development proceeds, suggesting that these elements are involved in destabilizing the transcripts at specific developmental stages. The Dm3 and Dm4 specificities were detected using microarray data that compared expression...