Evolutionary selection for perfect hairpin structures in viral DNAs

Abstract
Several recent discoveries1 have pointed to nucleic acid secondary structure as an additional dimension in gene expression2. Further evidence for the formation of hairpins in RNA is the fact that cruciforms exist in negatively supercoiled DNAs3–5. As potential binding sites for proteins, these structures have been proposed to play a part in the regulation of various crucial reactions, such as replication6,7, transcription8, or RNA processing9. As any random nucleotide sequence can self-anneal with an approximately 50% chance of forming some Watson-Crick-type base pairs10, it is difficult to assess which, if any, of all possible hairpin-like secondary structures may be biologically relevant. We have computed the expected distribution of perfectly base-paired structures as a function of loop size and stem length and compared it with the distribution observed in the complete genomes of eight DNA viruses from animals, plants and bacteria. We report here that hairpins with six or more consecutive base pairs in the stem are not distributed randomly along the genome, occur much more often than chance would predict, and are particularly over-represented in regions that appear to have regulatory significance. The average loop size decreases as stem length increases. These results support our previous hypothesis that these structures are biologically relevant11.
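The comparison of observed and expected hairpin counts can be illustrated with a minimal sketch. This is not the procedure used in the study, whose details are not given here. The function names, the loop-size bounds and the null model (independent, equiprobable bases, so that any two positions pair with probability 1/4) are all assumptions made for illustration only.

```python
# Hypothetical sketch: names, loop bounds and the null model are assumptions,
# not the method of the study.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_perfect_hairpins(seq, min_stem=6, min_loop=3, max_loop=20):
    """Return (start, stem_length, loop_size) for each perfect hairpin.

    start is the 0-based index of the first base of the 5' arm.
    Stems are extended outward from the loop until a mismatch occurs.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for loop in range(min_loop, max_loop + 1):
        # left: last base of the 5' arm; right: first base of the 3' arm.
        for left in range(n - loop - 1):
            right = left + loop + 1
            # Skip if the stem could grow inward: the same hairpin is then
            # reported once, at its smallest admissible loop size.
            if (loop - 2 >= min_loop
                    and COMPLEMENT.get(seq[left + 1]) == seq[right - 1]):
                continue
            stem = 0
            while (left - stem >= 0 and right + stem < n
                   and COMPLEMENT.get(seq[left - stem]) == seq[right + stem]):
                stem += 1
            if stem >= min_stem:
                hits.append((left - stem + 1, stem, loop))
    return hits


def expected_count(n, stem, loop, p_pair=0.25):
    """Expected number of placements with at least `stem` perfect pairs
    around a loop of size `loop` in a random sequence of length n.

    Assumes independent bases with pairing probability p_pair per position.
    """
    placements = max(n - 2 * stem - loop + 1, 0)
    return placements * p_pair ** stem


# Toy sequence: arm GACTGA, loop TTTT, arm TCAGTC, flanked by unpaired Cs.
example = "CCGACTGATTTTTCAGTCCC"
print(find_perfect_hairpins(example))  # [(2, 6, 4)]
```

Under this simple null model the expected count falls by a factor of four with each added base pair, so long perfect stems quickly become rare by chance. A real analysis would need to account for the base composition of each genome, which this sketch ignores.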