Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo

Abstract
Metazoan genomes contain vast tracts of cis-regulatory DNA that have typically been identified through tedious functional assays. As a result, it has not been possible to uncover a cis-regulatory code that links primary DNA sequences to gene expression patterns. In an initial effort to determine whether coordinately regulated genes share a common "grammar," we have examined the distribution of Dorsal recognition sequences in the Drosophila genome. Dorsal is one of the best-characterized sequence-specific transcription factors in Drosophila. The homeobox gene zerknullt (zen) is repressed directly by Dorsal, and this repression is mediated by a 600-bp silencer, the ventral repression element (VRE), which contains four optimal Dorsal binding sites. The arrangement and sequence of the Dorsal recognition sequences in the VRE were used to develop a computational algorithm to search the Drosophila genome for clusters of optimal Dorsal binding sites. There are 15 regions in the genome that contain three or more optimal sites within a span of 400 bp or less. Three of these regions are associated with known Dorsal target genes: sog, zen, and brinker. The Dorsal binding cluster in sog is shown to mediate lateral stripes of gene expression in response to low levels of the Dorsal gradient. Two of the remaining 12 clusters are shown to be associated with genes that exhibit asymmetric patterns of expression across the dorsoventral axis. These results suggest that bioinformatics can be used to identify novel target genes and associated regulatory DNAs in a gene network.
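
The cluster criterion stated in the abstract (three or more optimal sites within a span of 400 bp or less) can be illustrated with a short sliding-window sketch. This is not the algorithm used in the study, whose details are not given here. The motif in `SITE_PATTERN` is a hypothetical placeholder rather than the definition of an optimal Dorsal site, the function names are invented for illustration, only one strand is scanned, and the span is measured between site start positions, which is an assumption.

```python
import re

# Hypothetical placeholder motif, NOT the study's definition of an optimal
# Dorsal binding site. The lookahead lets overlapping matches be reported.
SITE_PATTERN = re.compile(r"(?=(GGG[AT]{4,5}CC[AC]))")


def find_sites(seq, pattern=SITE_PATTERN):
    """Return start positions of motif matches on the given strand only."""
    return [m.start() for m in pattern.finditer(seq.upper())]


def find_clusters(positions, min_sites=3, max_span=400):
    """Return (start, end) intervals holding at least min_sites site starts
    within max_span bp. Overlapping qualifying windows are merged, so a
    reported interval may be longer than max_span."""
    positions = sorted(positions)
    clusters = []
    right = 0
    for left in range(len(positions)):
        if right < left:
            right = left
        # Extend the window while the next site start is still within max_span.
        while (right + 1 < len(positions)
               and positions[right + 1] - positions[left] <= max_span):
            right += 1
        if right - left + 1 >= min_sites:
            start, end = positions[left], positions[right]
            if clusters and start <= clusters[-1][1]:
                # Merge with the previous overlapping window.
                clusters[-1] = (clusters[-1][0], max(end, clusters[-1][1]))
            else:
                clusters.append((start, end))
    return clusters


if __name__ == "__main__":
    # Toy site positions (bp), made up for illustration only.
    print(find_clusters([100, 250, 450, 5000, 5100]))
```

In this toy input the first three positions fall within 400 bp of one another and form a single cluster, while the last two do not reach the three-site minimum. A genome-scale search would also need to scan the reverse strand and map each cluster to nearby genes, which this sketch leaves out.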