Analysis of Conserved Noncoding DNA inDrosophilaReveals Similar Constraints in Intergenic and Intronic Sequences

Abstract
Comparative genomic approaches to gene and cis-regulatory prediction are based on the principle that differential DNA sequence conservation reflects variation in functional constraint. Using this principle, we analyze noncoding sequence conservation inDrosophila for 40 loci with known or suspectedcis-regulatory function encompassing >100 kb of DNA. We estimate the fraction of noncoding DNA conserved in both intergenic and intronic regions and describe the length distribution of ungapped conserved noncoding blocks. On average, 22%–26% of noncoding sequences surveyed are conserved in Drosophila, with median block length ∼19 bp. We show that point substitution in conserved noncoding blocks exhibits transition bias as well as lineage effects in base composition, and occurs more than an order of magnitude more frequently than insertion/deletion (indel) substitution. Overall, patterns of noncoding DNA structure and evolution differ remarkably little between intergenic and intronic conserved blocks, suggesting that the effects of transcription per se contribute minimally to the constraints operating on these sequences. The results of this study have implications for the development of alignment and prediction algorithms specific to noncoding DNA, as well as for models ofcis-regulatory DNA sequence evolution.