A distal enhancer and an ultraconserved exon are derived from a novel retroposon

Abstract
Evidence from vertebrate genome sequences has shown that conserved noncoding regions significantly outnumber coding regions, and that these elements are mostly involved in gene regulation. The origins of these elements are largely unknown, but the availability of part of the genome sequence of the Indonesian coelacanth, a ‘living fossil’ fish, can help track their evolutionary history. One group of these conserved genomic elements has now been identified as originating from a novel short interspersed element (SINE) family of retroposons that was active 410 million years ago in lobe-finned fishes and is still active today in the coelacanth. Some copies have acquired functions in mammals: one acts as an enhancer for expression of the neurodevelopmental gene ISL1, and another as an exon in the mRNA processing gene PCBP2.

Hundreds of highly conserved distal cis-regulatory elements have been characterized so far in vertebrate genomes (ref. 1). Many thousands more are predicted on the basis of comparative genomics (refs 2, 3). However, in stark contrast to the genes that they regulate, virtually none of these regions can be traced in invertebrates by sequence similarity, leaving their evolutionary origins obscure. Here we show that a class of conserved, primarily noncoding regions in tetrapods originated from a previously unknown short interspersed repetitive element (SINE) retroposon family that was active in the Sarcopterygii (lobe-finned fishes and terrestrial vertebrates) in the Silurian period, at least 410 million years ago (ref. 4), and that appears to have been recently active in the ‘living fossil’ Indonesian coelacanth, Latimeria menadoensis.
Using a mouse enhancer assay, we show that one copy, located 0.5 million bases from the neurodevelopmental gene ISL1, is an enhancer that recapitulates multiple aspects of Isl1 expression patterns. Several other copies represent new, possibly regulatory, alternatively spliced exons in the middle of pre-existing Sarcopterygian genes. One of these, an ultraconserved region (ref. 5) of more than 200 base pairs that is 100% identical among mammals and 80% identical to the coelacanth SINE, contains a 31-amino-acid alternatively spliced exon of the messenger RNA processing gene PCBP2 (ref. 6). These findings add to a growing list of examples (ref. 7) in which relics of transposable elements have acquired a function that serves their host, a process termed ‘exaptation’ (ref. 8), and provide an origin for at least some of the many highly conserved vertebrate-specific genomic sequences.