De novo identification of LTR retrotransposons in eukaryotic genomes
Open Access
- 3 April 2007
- journal article
- research article
- Published by Springer Nature in BMC Genomics
- Vol. 8 (1), 90
- https://doi.org/10.1186/1471-2164-8-90
Abstract
Background: LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Currently, LTR retrotransposons are annotated in eukaryotic genomes mainly through the conventional homology-searching approach, which is therefore limited to annotating known elements.

Results: In this paper, we report a de novo computational method that can identify new LTR retrotransposons without relying on a library of known elements. Specifically, our method identifies intact LTR retrotransposons by using an approximate string matching technique and protein domain analysis. In addition, it identifies partially deleted or solo LTRs using profile Hidden Markov Models (pHMMs). As a result, this method can identify all types of LTR retrotransposons de novo. We tested this method on two pairs of eukaryotic genomes, C. elegans vs. C. briggsae and D. melanogaster vs. D. pseudoobscura. LTR retrotransposons in C. elegans and D. melanogaster have been intensively studied using conventional annotation methods. Compared with previous work, we identified new intact LTR retroelements and new putative families, which may imply that new retroelements remain to be discovered even in well-studied organisms. To assess the sensitivity and accuracy of our method, we compared our results with those of a previously published method, LTR_STRUC, which predominantly identifies full-length LTR retrotransposons. Both methods identified a comparable number of intact LTR retroelements, but our method identified nearly all known elements in C. elegans, whereas LTR_STRUC missed about one third of them. Our method also identified more known LTR retroelements than LTR_STRUC in the D. melanogaster genome. We also identified some LTR retroelements in the other two genomes, C. briggsae and D. pseudoobscura, which have not been completely finished; in contrast, the conventional method failed to identify those elements.
Finally, the phylogenetic and chromosomal distributions of the identified elements are discussed.

Conclusion: We report a novel method for de novo identification of LTR retrotransposons in eukaryotic genomes that performs favorably compared with existing methods.
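The abstract states that intact elements are found by approximate string matching between the two LTRs, followed by protein domain analysis. The Python sketch below illustrates only that first idea, under our own assumptions: exact k-mer seeds are grouped by the offset between two copies, chained into a candidate 5' repeat, and the 5'/3' pair is then verified with an edit-distance (approximate) match. It is not the authors' algorithm. The function names (`edit_distance`, `find_ltr_pairs`, `make_demo_sequence`) and every threshold (`k`, `min_dist`, `max_dist`, `max_gap`, `min_ltr`, `min_identity`) are hypothetical illustrations, not published values.

```python
import random
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with one rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def find_ltr_pairs(seq, k=10, min_dist=100, max_dist=300,
                   max_gap=20, min_ltr=30, min_identity=0.9):
    """Toy search for two similar repeats whose starts are min_dist..max_dist apart.

    Returns (start5, end5, start3, end3, identity) tuples, 0-based and
    end-exclusive. All thresholds are hypothetical, not published values.
    """
    # 1. Index every exact k-mer; these serve as seeds.
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    # 2. Keep seed pairs whose spacing fits a putative element,
    #    grouped by diagonal (the offset d between the two copies).
    diagonals = defaultdict(list)
    for positions in index.values():
        for x, p in enumerate(positions):
            for q in positions[x + 1:]:  # positions are ascending
                if q - p > max_dist:
                    break
                if q - p >= min_dist:
                    diagonals[q - p].append(p)

    # 3. Chain nearby seeds on each diagonal into a candidate 5' repeat,
    #    then verify the 5'/3' pair with an approximate (edit-distance) match.
    hits = []
    for d, starts in diagonals.items():
        starts.sort()
        chain = [starts[0]]
        for p in starts[1:] + [None]:  # None flushes the final chain
            if p is not None and p - chain[-1] <= max_gap:
                chain.append(p)
                continue
            s5, e5 = chain[0], chain[-1] + k
            if e5 - s5 >= min_ltr:
                ed = edit_distance(seq[s5:e5], seq[s5 + d:e5 + d])
                identity = 1 - ed / (e5 - s5)
                if identity >= min_identity:
                    hits.append((s5, e5, s5 + d, e5 + d, identity))
            chain = [p]
    return hits


def make_demo_sequence(seed=7):
    """Hypothetical 400 bp test sequence: a 40 bp repeat at 60 and a
    one-substitution copy at 260, separated by random 'internal' DNA."""
    rng = random.Random(seed)

    def rand_dna(n):
        return "".join(rng.choice("ACGT") for _ in range(n))

    ltr = rand_dna(40)
    ltr3 = ltr[:20] + ("A" if ltr[20] != "A" else "C") + ltr[21:]
    return rand_dna(60) + ltr + rand_dna(160) + ltr3 + rand_dna(100)


if __name__ == "__main__":
    for hit in find_ltr_pairs(make_demo_sequence()):
        print(hit)
```

On the synthetic sequence the sketch should report a single pair spanning roughly positions 60-100 and 260-300; chance matches in the flanking DNA can stretch the boundaries by a base or two. Chaining seeds along one diagonal keeps the example short, but an indel inside an LTR would split its seeds across two diagonals, so a practical tool would need a gapped extension step. The protein domain check and the pHMM search for partially deleted or solo LTRs described in the abstract are not modeled here.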