Bioinformatic analysis of exon repetition, exon scrambling and trans-splicing in humans

Abstract
Motivation: Using bioinformatic approaches, we aimed to characterize three poorly understood splicing abnormalities: exon scrambling, exon repetition and trans-splicing.

Results: We developed a software package that allows large-scale comparison of all human expressed sequence tag (EST) sequences to the entire set of human gene sequences. Among 5 992 495 EST sequences, 401 cases of exon repetition and 416 cases of exon scrambling were found. The vast majority of the identified ESTs contain fragments of repeated or scrambled exons rather than full-length exons. Their structures suggest that these fragments may have arisen during cDNA cloning and not from splicing abnormalities. Nevertheless, we found 11 cases of full-length exon repetition, showing that this phenomenon is real yet very rare. In searching for examples of trans-splicing, we considered only reproducible events, in which at least two independent ESTs represent the same putative trans-splicing event. We found 15 ESTs representing five types of putative trans-splicing. However, all 15 cases were derived from human malignant tissues and could have resulted from genomic rearrangements. Our results support a very rare but physiological occurrence of exon repetition, but suggest that apparent exon scrambling and trans-splicing result, respectively, from in vitro artifacts and gene-level abnormalities.

Availability: Exon–Intron Database (EID) is available at . Programs are available at . The Laboratory website is available at

Contact: afedorov@meduohio.edu

Supplementary information: Supplementary file is available at
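The three event types named above can be illustrated with a minimal Python sketch. It is not the software package described in the abstract; it assumes a simplified, hypothetical input in which each EST has already been aligned to gene sequences and reduced to an ordered list of (gene identifier, exon index) blocks, read 5' to 3'. The function name `classify_est` and the rule that compares only adjacent blocks are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (gene identifier, 1-based exon index in genomic order)
Block = Tuple[str, int]


def classify_est(blocks: List[Block]) -> List[str]:
    """Label an EST by comparing each aligned block with the next one.

    Simplifying assumptions (not taken from the paper):
      - same gene, same exon twice in a row   -> "exon repetition"
      - same gene, exon index goes backwards  -> "exon scrambling"
      - adjacent blocks from different genes  -> "trans-splicing candidate"
    """
    labels: List[str] = []
    for (gene_a, exon_a), (gene_b, exon_b) in zip(blocks, blocks[1:]):
        if gene_a != gene_b:
            label = "trans-splicing candidate"
        elif exon_b == exon_a:
            label = "exon repetition"
        elif exon_b < exon_a:
            label = "exon scrambling"
        else:
            continue  # exon order agrees with the genomic order
        if label not in labels:
            labels.append(label)
    return labels or ["canonical"]


if __name__ == "__main__":
    print(classify_est([("G1", 1), ("G1", 2), ("G1", 2), ("G1", 3)]))
    print(classify_est([("G1", 1), ("G1", 3), ("G1", 2)]))
    print(classify_est([("G1", 4), ("G2", 2)]))
```

A real pipeline would also need to distinguish full-length exons from exon fragments and to require at least two independent ESTs before accepting a trans-splicing candidate, as the Results describe; both checks are omitted here.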