Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo
Open Access
- 24 October 2002
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 3 (1), 30
- https://doi.org/10.1186/1471-2105-3-30
Abstract
Regulation of gene transcription is crucial for the function and development of all organisms. While gene prediction programs that identify protein-coding sequence are used with remarkable success in the annotation of genomes, the development of computational methods to analyze noncoding regions and to delineate transcriptional control elements is still in its infancy. Here we present novel algorithms to detect cis-regulatory modules through genome-wide scans for clusters of transcription factor binding sites, using three levels of prior information. When binding sites for the factors are known, our statistical segmentation algorithm, Ahab, yields about 150 putative gap gene regulated modules, with no adjustable parameters other than a window size. If one or more related modules are known, but no binding sites, repeated motifs can be found by a customized Gibbs sampler and input to Ahab to predict genes with similar regulation. Finally, using only the genome, we developed a third algorithm, Argos, that counts and scores clusters of overrepresented motifs in a window of sequence. Argos recovers many of the known modules upstream of the segmentation genes, with no training data. We have demonstrated, in the case of body patterning in the Drosophila embryo, that our algorithms allow the genome-wide identification of regulatory modules. We believe that Ahab overcomes many problems of recent approaches, and we estimate its false positive rate to be about 50%. Argos is the first successful attempt to predict regulatory modules using only the genome without training data. Complete results and module predictions across the Drosophila genome are available at http://uqbar.rockefeller.edu/~siggia/.
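The general idea of scanning a sequence for windows that contain a cluster of binding sites can be illustrated with a minimal sketch. This is not Ahab or Argos: it assumes exact-match consensus strings instead of weight matrices, uses a plain count threshold instead of a statistical segmentation or motif score, and the function names (`site_starts`, `cluster_windows`) and the `min_sites` parameter are hypothetical, introduced only for illustration.

```python
from typing import List, Tuple


def site_starts(seq: str, motifs: List[str]) -> List[int]:
    """Return every position where at least one motif matches exactly.

    Assumption: sites are exact consensus strings on the forward strand only.
    """
    return [i for i in range(len(seq))
            if any(seq.startswith(m, i) for m in motifs)]


def cluster_windows(seq: str, motifs: List[str],
                    window: int, min_sites: int) -> List[Tuple[int, int]]:
    """Slide a fixed-size window along seq and report (start, site_count)
    for every window holding at least min_sites site start positions.

    A site is counted if it starts inside the window, even if it
    extends past the window's right edge.
    """
    hits = site_starts(seq, motifs)
    last_start = max(0, len(seq) - window)
    results = []
    for start in range(last_start + 1):
        n = sum(1 for h in hits if start <= h < start + window)
        if n >= min_sites:
            results.append((start, n))
    return results


# Toy sequence with two TTAA sites close together, then filler.
toy = "GGTTAAGGTTAAGG" + "C" * 20
print(cluster_windows(toy, ["TTAA"], window=12, min_sites=2))
```

On the toy sequence, only the windows that cover both site starts pass the threshold. A realistic scan would replace the exact-match test with a scoring model for each factor and would assess each window against a background model, which is the part that the algorithms in the abstract address.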