Abstract
It is proposed that a general function of noncoding DNA and RNA sequences in higher organisms (intergenic and intervening sequences) is to provide multiple binding sites, distributed over long stretches of polynucleotide, for certain types of regulatory proteins. By building up or abolishing higher-order structures, these proteins either sequester sites involved in controlling, e.g., transcription, or make those sites available to local molecular signals. If this is to take place, the existence of a ‘c-value paradox’ becomes a requirement. Multiple binding sites for a given protein may recur in the form of a sequence ‘motif’ that is variable within certain limits. The noncoding sequences of the chicken ovalbumin gene furnish an appropriate example of such a motif, GAAAATT. Neither its improbably high frequency nor its significant periodicity is found in the coding sequences of the same gene or in the noncoding sequences of a differently controlled gene in the same organism, the preproinsulin gene. This distribution of the motif is consistent with the concepts outlined above. Low specificity of protein-binding sequences is likely to be compatible with highly specific conformational changes.
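The notions of "improbably high frequency" and "periodicity" of a motif can be illustrated with a minimal sketch. The abstract does not state which statistical procedure was used, so everything below is an assumption. The sketch counts occurrences of a motif such as GAAAATT, optionally allowing a few substitutions to reflect a motif that is "variable within certain limits". It compares the exact count with the number expected if bases occurred independently at the sequence's own base composition, and it tabulates the spacings between successive occurrences as a crude check for periodicity. The function names (`motif_positions`, `expected_exact_count`, `spacing_counts`) and the toy sequence are hypothetical illustrations, not ovalbumin or preproinsulin data.

```python
from collections import Counter
from math import prod


def motif_positions(seq: str, motif: str, max_mismatches: int = 0) -> list[int]:
    """Start positions where `motif` matches `seq` with at most `max_mismatches` substitutions.

    Scans one strand only; a fuller analysis would also scan the reverse complement.
    """
    m = len(motif)
    hits = []
    for i in range(len(seq) - m + 1):
        mismatches = sum(1 for a, b in zip(seq[i:i + m], motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def expected_exact_count(seq: str, motif: str) -> float:
    """Expected number of exact matches under an assumed model of independent bases
    drawn with the sequence's own base composition."""
    n = len(seq)
    composition = Counter(seq)
    p_motif = prod(composition[base] / n for base in motif)
    return (n - len(motif) + 1) * p_motif


def spacing_counts(positions: list[int]) -> Counter:
    """Histogram of distances between successive occurrences (a crude periodicity check)."""
    return Counter(b - a for a, b in zip(positions, positions[1:]))


# Hypothetical toy sequence: the motif planted every 15 bases.
toy = ("GAAAATT" + "CGCGCGCG") * 5
hits = motif_positions(toy, "GAAAATT")
print(hits)                                  # [0, 15, 30, 45, 60]
print(spacing_counts(hits))                  # Counter({15: 4})
print(expected_exact_count(toy, "GAAAATT"))  # far below the 5 observed
```

In this toy case five exact occurrences are observed where the independent-base model expects far fewer than one, and every spacing is 15 bases. A real comparison between noncoding and coding regions would need a proper background model and a significance test rather than this simple ratio.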