Detection of functional DNA motifs via statistical over-representation

23 February 2004

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 32 (4), 1372-1381
https://doi.org/10.1093/nar/gkh299

Abstract

The interaction of proteins with DNA recognition motifs regulates a number of fundamental biological processes, including transcription. To understand these processes, we need to know which motifs are present in a sequence and which factors bind to them. We describe a method to screen a set of DNA sequences against a precompiled library of motifs, and assess which, if any, of the motifs are statistically over- or under-represented in the sequences. Over-represented motifs are good candidates for playing a functional role in the sequences, while under-representation hints that if the motif were present, it would have a harmful dysregulatory effect. We apply our method (implemented as a computer program called Clover) to dopamine-responsive promoters, sequences flanking binding sites for the transcription factor LSF, sequences that direct transcription in muscle and liver, and Drosophila segmentation enhancers. In each case, Clover successfully detects motifs known to function in the sequences, and intriguing and testable hypotheses are made concerning additional motifs. Clover compares favorably with an ab initio motif discovery algorithm based on sequence alignment, when the motif library includes only a homolog of the factor that actually regulates the sequences. It also demonstrates superior performance over two contingency-table-based over-representation methods. In conclusion, Clover has the potential to greatly accelerate characterization of signals that regulate transcription.
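The general idea of testing a motif for over- or under-representation can be illustrated with a small sketch. This is not Clover's implementation, and the abstract does not specify its scoring or null model. The sketch assumes a motif given as a position frequency matrix, a uniform background, a score equal to the summed log of the average likelihood ratio over all windows of each sequence, and a null distribution built by shuffling the letters of each sequence. All function names, the toy motif, and the toy sequences are hypothetical.

```python
import math
import random

BASES = "ACGT"

def consensus_pfm(consensus, match=0.85):
    """Toy position frequency matrix: `match` for the consensus base, rest shared."""
    other = (1.0 - match) / 3
    return [{b: (match if b == c else other) for b in BASES} for c in consensus]

def window_ratio(pfm, window, background=0.25):
    """Likelihood ratio of one window: motif model vs. uniform background (assumed)."""
    ratio = 1.0
    for column, base in zip(pfm, window):
        ratio *= column[base] / background
    return ratio

def sequence_score(pfm, seq):
    """Log of the average likelihood ratio over every window in one sequence."""
    width = len(pfm)
    ratios = [window_ratio(pfm, seq[i:i + width])
              for i in range(len(seq) - width + 1)]
    return math.log(sum(ratios) / len(ratios))

def set_score(pfm, seqs):
    """Score for a whole sequence set: sum of per-sequence scores."""
    return sum(sequence_score(pfm, s) for s in seqs)

def shuffled(seq, rng):
    """Random permutation of a sequence, preserving its base composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

def empirical_p_values(pfm, seqs, trials=200, seed=0):
    """Return (observed score, over-representation p, under-representation p)."""
    rng = random.Random(seed)
    observed = set_score(pfm, seqs)
    null = [set_score(pfm, [shuffled(s, rng) for s in seqs])
            for _ in range(trials)]
    # Add-one correction keeps the empirical p-values strictly above zero.
    p_over = (1 + sum(n >= observed for n in null)) / (trials + 1)
    p_under = (1 + sum(n <= observed for n in null)) / (trials + 1)
    return observed, p_over, p_under

# Toy data: three GC-rich sequences that each contain the site TATAAA.
motif = consensus_pfm("TATAAA")
promoters = [
    "GCGCGGCTATAAAGGCGCCGCG",
    "CCGGCGTATAAAGCGCGGCCGC",
    "GGCCGCGCGTATAAACGCGGCC",
]
observed, p_over, p_under = empirical_p_values(motif, promoters)
print(f"score={observed:.2f}  p_over={p_over:.3f}  p_under={p_under:.3f}")
```

A small over-representation p-value suggests the motif occurs more strongly than expected from base composition alone, and a small under-representation p-value suggests the opposite. Screening a motif library would repeat this test for each motif, which then calls for a multiple-testing correction.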

This publication has 38 references indexed in Scilit.

Cited by 412 articles