Abstract
Two DNA sequence elements are known to recur frequently upstream of eukaryotic polymerase II-transcribed genes. The TATAAA element, at position -40, specifies the transcription initiation site. The GGCCAATCT element, found around -80, is less frequent. Sequence analysis of upstream regions reveals that the yeast UAS2 consensus sequence, TGATTGGT, is also very frequent at -80 in polymerase II-transcribed sequences of higher animals. The CCAAT box (within GGCCAATCT) and the yeast UAS2 sequence (ATTGG within TGATTGGT) are complementary. Structural analysis suggests some symmetry in their DNA structures. Upstream of the TATAAT-rich region there is an abundance of GC sequences. Analysis of nucleotide tracts indicates that these are preferentially flanked by their complementary nucleotides with a pyrimidine-purine junction, i.e., TTAN, CCGn, CnGG, TnAA. Here, I discuss DNA structural considerations in upstream regions, along with protein readout of the information content of the major and minor grooves. These sequence-structure aspects are placed in the general context of protein (factor)-DNA (element) recognition and regulation.
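The complementarity between the CCAAT box and the yeast UAS2 consensus can be checked directly. The sketch below is illustrative only: it assumes that the cores being compared are CCAAT and ATTGG, and the helper name `reverse_complement` and the variable names are hypothetical, not taken from the paper.

```python
# Minimal sketch: check that the CCAAT core is the reverse complement
# of a stretch inside the yeast UAS2 consensus TGATTGGT.
# Assumption: the compared cores are CCAAT and ATTGG.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


ccaat_core = "CCAAT"          # core of GGCCAATCT
uas2_consensus = "TGATTGGT"   # yeast UAS2 consensus from the abstract

rc = reverse_complement(ccaat_core)
found_at = uas2_consensus.find(rc)  # 0-based offset, -1 if absent

print(f"reverse complement of {ccaat_core}: {rc}")
print(f"offset within {uas2_consensus}: {found_at}")
```

Read on the opposite strand, the CCAAT core therefore appears inside the UAS2 consensus, which is the sense in which the two elements are complementary.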