Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters

Abstract
Genome sequencing has led to the discovery of tens of thousands of potential new genes. Six years after the sequencing of the well-studied yeast Saccharomyces cerevisiae and the discovery that its genome encodes ∼6,000 predicted proteins, more than 2,000 have not yet been characterized experimentally, and determining their functions seems far from a trivial task. One crucial constraint is the generation of useful hypotheses about protein function. Using a new approach to interpret microarray data, we assign likely cellular functions with confidence values to these new yeast proteins. We perform extensive genome-wide validations of our predictions and offer visualization methods for exploration of the large numbers of functional predictions. We identify potential new members of many existing functional categories including 285 candidate proteins involved in transcription, processing and transport of non-coding RNA molecules. We present experimental validation confirming the involvement of several of these proteins in ribosomal RNA processing. Our methodology can be applied to a variety of genomics data types and organisms.