Detection of functional DNA motifs via statistical over-representation
- 23 February 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 32 (4), 1372-1381
- https://doi.org/10.1093/nar/gkh299
Abstract
The interaction of proteins with DNA recognition motifs regulates a number of fundamental biological processes, including transcription. To understand these processes, we need to know which motifs are present in a sequence and which factors bind to them. We describe a method to screen a set of DNA sequences against a precompiled library of motifs, and assess which, if any, of the motifs are statistically over- or under-represented in the sequences. Over-represented motifs are good candidates for playing a functional role in the sequences, while under-representation hints that if the motif were present, it would have a harmful dysregulatory effect. We apply our method (implemented as a computer program called Clover) to dopamine-responsive promoters, sequences flanking binding sites for the transcription factor LSF, sequences that direct transcription in muscle and liver, and Drosophila segmentation enhancers. In each case Clover successfully detects motifs known to function in the sequences, and intriguing and testable hypotheses are made concerning additional motifs. Clover compares favorably with an ab initio motif discovery algorithm based on sequence alignment, when the motif library includes only a homolog of the factor that actually regulates the sequences. It also demonstrates superior performance over two contingency-table-based over-representation methods. In conclusion, Clover has the potential to greatly accelerate characterization of signals that regulate transcription.
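The abstract does not spell out Clover's scoring statistic, so the following is only a minimal sketch of the general idea of testing one library motif for over- or under-representation in a set of sequences. It rests on several assumptions that are not taken from the paper: a uniform background (0.25 per base), a per-sequence score equal to the mean likelihood ratio over every window on both strands, a set score equal to the mean log of those sequence scores, and a null distribution built from mononucleotide shuffles of the input sequences. It also assumes uppercase A/C/G/T sequences only. The function names and the toy GATA-like matrix are hypothetical, and none of this should be read as Clover's actual implementation.

```python
import math
import random

# Assumption: uniform background base probabilities (not from the paper).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_pwm(consensus: str, match_prob: float = 0.85) -> list[dict[str, float]]:
    """Hypothetical toy motif: one probability column per consensus base,
    with the remaining mass split evenly so no probability is zero."""
    other = (1.0 - match_prob) / 3
    return [{b: (match_prob if b == c else other) for b in "ACGT"} for c in consensus]


def site_likelihood_ratio(site: str, pwm: list[dict[str, float]]) -> float:
    """P(site | motif) / P(site | background) for one window."""
    ratio = 1.0
    for base, column in zip(site, pwm):
        ratio *= column[base] / BACKGROUND[base]
    return ratio


def sequence_score(seq: str, pwm: list[dict[str, float]]) -> float:
    """Assumed per-sequence score: mean likelihood ratio over all windows, both strands."""
    width = len(pwm)
    ratios = []
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - width + 1):
            ratios.append(site_likelihood_ratio(strand[i:i + width], pwm))
    return sum(ratios) / len(ratios) if ratios else 1.0


def set_score(seqs: list[str], pwm: list[dict[str, float]]) -> float:
    """Assumed set score: average log sequence score across the set."""
    return sum(math.log(sequence_score(s, pwm)) for s in seqs) / len(seqs)


def shuffled(seq: str, rng: random.Random) -> str:
    """Mononucleotide shuffle: keeps base composition, destroys motif order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def representation_pvalues(seqs, pwm, n_shuffles=200, seed=0):
    """Empirical p-values for over- and under-representation versus shuffled sets."""
    rng = random.Random(seed)
    observed = set_score(seqs, pwm)
    null = [set_score([shuffled(s, rng) for s in seqs], pwm) for _ in range(n_shuffles)]
    p_over = (1 + sum(x >= observed for x in null)) / (n_shuffles + 1)
    p_under = (1 + sum(x <= observed for x in null)) / (n_shuffles + 1)
    return observed, p_over, p_under


if __name__ == "__main__":
    gata = consensus_pwm("GATA")
    seqs = ["GATAGATAGATAGATA", "TTGATAAGATAAGATA", "GATATTGATACGATAA"]
    score, p_over, p_under = representation_pvalues(seqs, gata)
    print(f"score={score:.2f} p_over={p_over:.3f} p_under={p_under:.3f}")
```

Screening a whole library would simply repeat this test for each matrix; a small p_over flags a candidate functional motif and a small p_under flags possible avoidance. Shuffling preserves base composition, so composition alone cannot make a motif look enriched, although it ignores dinucleotide structure and real genomic background.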