Integrative analysis of genome‐wide experiments in the context of a large high‐throughput data compendium
Open Access
- 1 January 2005
- journal article
- research article
- Published by European Molecular Biology Organization in Molecular Systems Biology
- Vol. 1 (1), 2005.0002
- https://doi.org/10.1038/msb4100005
Abstract
Biological systems are orchestrated by heterogeneous regulatory programs that control complex processes and adapt to a dynamic environment. Recent advances in high‐throughput experimental methods provide genome‐wide perspectives on such regulatory programs. A considerable amount of data on the behavior of model systems in a variety of conditions is rapidly accumulating. Still, the dominant paradigm is to analyze new genome‐wide experiments separately from any other extant data, for example, by clustering the new data alone. Here we introduce a new methodology for analyzing the results of a new functional genomic study vis‐à‐vis a large compendium of previously published results from heterogeneous experimental techniques. We demonstrate our methodology on Saccharomyces cerevisiae, using a compendium of some 2000 experiments from 60 different publications. Most importantly, we show how the integrated analysis reveals unexpected connections among biological processes, and differentiates between novel and known effects in the analyzed experiments. Such characterization is impossible when new data sets are studied in isolation. Our results exemplify the power of the integrative approach in the analysis of genomic scale data sets and call for a paradigm shift in their study.

### Synopsis

Some of the greatest “success stories” in modern biology can be attributed to coordinated community efforts that tackled an overwhelmingly large problem using a web of semi‐independent efforts. In the most prominent and recent of these efforts, the emergence of genomics was facilitated by the ability to share and compare sequence data, by the availability of these data to extensive search, and by the aggregation of data into one body of knowledge. A major challenge of today's biology is the functional characterization of biological systems.
This problem (in any of several alternative forms) is probably one of the largest ever attempted by biologists, and is thus a natural candidate for being tackled by such a community‐based scheme. With this long‐term goal in mind, the present study proposes a methodology that can help to exploit large‐scale compendia of functional genomics data as part of the routine analysis of high‐throughput experiments. Using a large collection of different types of data obtained for the baker's yeast S. cerevisiae, we demonstrate how fruitful such a combined approach may be in characterizing responses to specific conditions from a system‐level perspective.

We focus on a relatively simple building block of biological systems: the functional module. Following the pioneering studies of gene expression profiles ([Eisen et al., 1998][1]), researchers have extensively used clusters of co‐expressed genes to gain insights into the organization of regulatory processes. Clustering, in its simple form, partitions the genome into disjoint gene sets (possibly obeying hierarchical organization), such that each set manifests a different characteristic expression pattern across all the experimental conditions.

A natural generalization of a co‐expressed gene cluster is a transcriptional module ([Ihmels et al., 2002][2]): a set of genes that are co‐expressed in some (but not necessarily all) experimental conditions. Transcriptional modules are a more flexible and realistic building block for biological systems. A certain gene may belong to more than a single transcriptional module, as it can be expressed (or may exhibit different genetic and physical interactions) under different conditions. Transcriptional modules can be detected using bicluster analysis of gene expression datasets.
In bicluster analysis, the output is not a set of disjoint clusters, but a collection of (possibly overlapping) transcriptional modules that can represent phenomena like pleiotropy or context‐dependent regulation. Finally, a functional module (FM) generalizes a transcriptional module by taking into account other heterogeneous sources of biological information in addition to gene expression (e.g., protein interactions and synthetic lethality). A functional module is thus a set of genes that are correlated with each other across a set of biological properties. In previous work ([Tanay et al., 2004][3]) we introduced the SAMBA algorithm for detecting FMs in very large‐scale and highly heterogeneous datasets. Biological properties can represent any source of information on genes and their products, including gene expression, phenotype and protein interactions.

What can be gained from dissection of biological systems using FMs? FMs simplify the understanding of biological systems by representing cellular processes in terms of the activity of a modest number of modules instead of thousands of genes. As we show here, a comprehensive set of FMs for a model system, built by integrating data from many different studies and sources, may form a valuable foundation when analyzing the results of a new experiment. For example, O'Rourke and Herskowitz ([O'Rourke and Herskowitz, 2004][4]) studied the response of several key S. cerevisiae mutant strains to variable levels of hyper‐osmotic stress. By analyzing the resulting gene expression dataset using standard clustering and extensive expert analysis, the process of hyper‐osmotic adaptation was dissected into several clusters containing hundreds of genes each. These clusters represent groups of genes that exhibit typical response patterns in the osmotic shock treatments.
On the other hand, by adding the O'Rourke‐Herskowitz gene expression profiles to the vast compendium of available yeast functional properties accumulated so far (including almost 2000 different conditions from 60 different studies) and analyzing the combined dataset using the methods described here, we can characterize the osmo‐adaptation process in terms of the activity of a small...
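
The distinction the synopsis draws between disjoint clusters and overlapping modules can be illustrated with a toy sketch. The Python below is not the SAMBA algorithm and uses no real data: the gene names, property labels, candidate modules, and the all-or-nothing membership rule are assumptions made purely for illustration. It treats a module as a set of genes paired with a set of properties that every one of those genes shows, which is enough to demonstrate that one gene can belong to several modules.

```python
# Toy data (hypothetical): each gene maps to the set of properties it shows,
# e.g. "induced under a condition" or "member of a protein complex".
GENE_PROPERTIES = {
    "geneA": {"expr:osmotic", "expr:heat", "ppi:complex1"},
    "geneB": {"expr:osmotic", "expr:heat", "ppi:complex1"},
    "geneC": {"expr:osmotic", "expr:starvation"},
    "geneD": {"expr:starvation", "ppi:complex1"},
}


def shared_properties(genes):
    """Return the properties common to every gene in `genes`."""
    sets = [GENE_PROPERTIES[g] for g in genes]
    return set.intersection(*sets) if sets else set()


def is_module(genes, properties):
    """True if every gene in `genes` shows every property in `properties`.

    This strict rule is an assumption for illustration only; a real method
    would have to tolerate noise and score statistical significance.
    """
    return set(properties) <= shared_properties(genes)


# Hypothetical candidate modules: (gene set, property set) pairs.
CANDIDATES = {
    "M1": ({"geneA", "geneB", "geneC"}, {"expr:osmotic"}),
    "M2": ({"geneA", "geneB", "geneD"}, {"ppi:complex1"}),
    "M3": ({"geneC", "geneD"}, {"expr:starvation"}),
}


def modules_containing(gene, candidates):
    """List the names of valid candidate modules that include `gene`."""
    return [
        name
        for name, (genes, props) in candidates.items()
        if gene in genes and is_module(genes, props)
    ]


if __name__ == "__main__":
    # geneA, geneB and geneC agree on one property but not on all of them.
    print(shared_properties({"geneA", "geneB", "geneC"}))
    # Unlike a disjoint clustering, a gene may appear in several modules.
    for gene in sorted(GENE_PROPERTIES):
        print(gene, modules_containing(gene, CANDIDATES))
```

In this toy setting geneA falls into both M1 and M2, which a partition into disjoint clusters could not express. Mixing expression labels with an interaction label in the same property sets mirrors, in a very reduced form, how a functional module draws on more than one data type.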