A strategy for identifying transcription factor binding sites reveals two classes of genomic c-Myc target sites

Abstract
Defining the hardwiring of transcription factors to their cognate genomic binding sites is essential for our understanding of biological processes. Using c-Myc as a model system, we applied scanning chromatin immunoprecipitation to identify in vivo binding regions (E boxes) in three target genes. For these genes and for other c-Myc target genes that have been validated by chromatin immunoprecipitation, we used publicly available genomic sequences to determine whether experimentally derived in vivo binding sites might be predictable from nonexonic sequence conservation across species. Our studies revealed two classes of genomic target binding sites. The majority of target genes studied [class I: B23 (NPM1), CAD, CDK4, cyclin D2, ID2, LDH-A, MNT, PTMa, ODC, NM23B, nucleolin, prohibitin, SHMT1, and SHMT2] show significant sequence conservation of the E boxes and their flanking regions, whereas several genes (cyclin B1, JPO1, and PRDX3) belong to a second class (class II) that does not display sequence conservation at and around the site of c-Myc binding. On the basis of this model, we propose a strategy for predicting transcription factor binding sites using phylogenetic sequence comparisons, which will select potential class I target genes among the many emerging candidates from DNA-microarray studies for experimental validation by chromatin immunoprecipitation.
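The phylogenetic comparison proposed above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the canonical E box motif CACGTG (the abstract does not state the motif), a precomputed gap-padded pairwise alignment of nonexonic sequence from two species, and hypothetical values for the flank width and identity threshold. It flags E boxes whose motif and flanking region are both conserved, which is the property that class I sites share.

```python
# Assumption: canonical E box; the abstract does not specify the motif used.
EBOX = "CACGTG"


def find_eboxes(seq, motif=EBOX):
    """Return start positions of every exact motif match in seq."""
    seq = seq.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


def identity(a, b):
    """Fraction of aligned columns that match; gap columns ('-') never match."""
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return matches / len(a)


def conserved_eboxes(ref_aln, other_aln, flank=20, min_identity=0.8):
    """Score each E box in the reference row of a pairwise alignment.

    flank and min_identity are hypothetical parameters chosen for illustration.
    Returns a list of (position, window_identity, is_conserved) tuples.
    Simplification: an E box interrupted by alignment gaps is not detected.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("alignment rows must have equal length")
    results = []
    for pos in find_eboxes(ref_aln):
        start = max(0, pos - flank)
        end = min(len(ref_aln), pos + len(EBOX) + flank)
        window_id = identity(ref_aln[start:end], other_aln[start:end])
        motif_kept = other_aln[pos:pos + len(EBOX)].upper() == EBOX
        results.append((pos, window_id, motif_kept and window_id >= min_identity))
    return results
```

Under these assumptions, a candidate from a microarray study whose E box is flagged as conserved would be prioritized for validation by chromatin immunoprecipitation. Class II sites, by definition, would not be flagged, so a screen of this kind cannot recover them.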