Abstract
Analysis of cloned human genomic loci homologous to the small nuclear RNA U1 established that such sequences are abundant and dispersed in the human genome and that only a fraction represent bona fide genes. The majority of genomic loci bear defective gene copies, or pseudogenes, which contain scattered base mismatches and in some cases lack the sequence corresponding to the 3' end of U1 RNA. Although all of the U1 genes examined to date are flanked by essentially identical sequences and therefore appear to comprise a single multigene family, we present evidence for the existence of at least three structurally distinct classes of U1 pseudogenes. Class I pseudogenes have considerable flanking sequence homology with the U1 gene family and were probably derived from it by a DNA-mediated event such as gene duplication. In contrast, the U1 sequence in class II and III U1 pseudogenes is flanked by single-copy genomic sequences completely unrelated to those flanking the U1 gene family; in addition, short direct repeats flank the class III but not the class II pseudogenes. We therefore propose that both class II and III U1 pseudogenes were generated by an RNA-mediated mechanism involving the insertion of U1 sequence information into a new chromosomal locus. We also note that two other types of repetitive DNA sequences in eucaryotes, the Alu family in vertebrates and the ribosomal DNA insertions in Drosophila, bear a striking structural resemblance to the classes of U1 pseudogenes described here and may have been created by an RNA-mediated insertion event.