Associating transcription factor-binding site motifs with target GO terms and target genes

Abstract
The roles and target genes of many transcription factors (TFs) are still unknown. To predict the roles of TFs, we present a computational method for associating Gene Ontology (GO) terms with TF-binding motifs. The method ranks all genes as potential targets of the TF and reports GO terms that are significantly associated with highly ranked genes. We also present an approach whereby these predicted GO terms can be used to improve predictions of TF target genes. This approach uses a novel gene-scoring function reflecting the insight that genes annotated with GO terms predicted to be associated with the TF are more likely to be its targets. We construct validation sets of GO terms highly associated with known targets of various yeast and human TFs. On the yeast reference sets, our prediction method identifies at least one correct GO term for 73% of the TFs; 49% of the correct GO terms are predicted, and almost one-third of the predicted GO terms are correct. Results on human reference sets are similarly encouraging. Validation of our target gene prediction method shows that its accuracy exceeds that of simple motif scanning.
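
The idea of ranking genes by motif score and then reporting GO terms associated with the highly ranked genes can be illustrated with a minimal sketch. The abstract does not state which statistical test is used, so the one-sided hypergeometric test, the fixed top-n cutoff and the Bonferroni correction below are all assumptions made for illustration, as are the function names and input formats (hypergeom_tail, enriched_terms, a best-first gene list and a term-to-genes dictionary).

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enriched_terms(ranked_genes, annotations, top_n, alpha=0.05):
    """Return (term, adjusted p-value) pairs enriched among the top-ranked genes.

    ranked_genes: genes sorted from best to worst motif score (assumed input).
    annotations:  dict mapping a GO term to the set of genes annotated with it.
    top_n:        number of top-ranked genes treated as putative targets
                  (a fixed cutoff is an assumption of this sketch).
    """
    universe = set(ranked_genes)
    top = set(ranked_genes[:top_n])
    results = []
    for term, genes in annotations.items():
        genes = genes & universe          # ignore genes that were not ranked
        k = len(genes & top)              # annotated genes among the top set
        if k == 0:
            continue
        p = hypergeom_tail(k, len(genes), len(top), len(universe))
        # Bonferroni correction over all tested terms (an assumption).
        p_adj = min(1.0, p * len(annotations))
        if p_adj <= alpha:
            results.append((term, p_adj))
    return sorted(results, key=lambda pair: pair[1])
```

Called with a best-first gene list and a GO annotation dictionary, enriched_terms returns the terms over-represented among the first top_n genes, most significant first. A fixed cutoff is the simplest choice; a method that scans over several cutoffs or uses the full ranking would need a different test.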