EFFICIENT ALGORITHMS AND SOFTWARE FOR DETECTION OF FULL-LENGTH LTR RETROTRANSPOSONS

Abstract
LTR retrotransposons constitute one of the most abundant classes of repetitive elements in eukaryotic genomes. In this paper, we present a new algorithm for detecting full-length LTR retrotransposons in genomic sequences. The algorithm identifies regions of a genomic sequence that exhibit the structural characteristics of LTR retrotransposons. Three key components distinguish our algorithm from those of existing software: (i) a novel method that preprocesses the entire genomic sequence in linear time and produces high-quality pairs of LTR candidates in constant run-time per pair; (ii) a thorough alignment-based evaluation of candidate pairs to ensure high-quality predictions; and (iii) a robust parameter set encompassing both structural constraints and quality controls, which gives users a high degree of flexibility. We implemented our algorithm in a software program called LTR_par, which can be run on both serial and parallel computers. Validation of our software on the yeast genome indicates superior results in both quality and performance compared to existing software. Additional validations are presented on rice BACs and the chimpanzee genome.
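To illustrate the general idea behind component (i), the following minimal sketch indexes a sequence once and then emits pairs of repeated positions whose separation satisfies a distance constraint, analogous to the spacing between the two LTRs of an element. This is an assumption-laden stand-in, not the paper's method: the abstract does not describe the preprocessing, so a simple exact k-mer hash index is used here. The names `build_kmer_index`, `candidate_pairs`, `min_dist`, and `max_dist` are hypothetical. A real detector would also extend these seeds and evaluate them by alignment, as in component (ii).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer to its start positions in one pass (linear for a fixed k).

    Hypothetical stand-in for the paper's preprocessing step.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        # Positions are appended in increasing order, so every list is sorted.
        index[seq[i:i + k]].append(i)
    return index


def candidate_pairs(seq: str, k: int, min_dist: int,
                    max_dist: int) -> List[Tuple[int, int]]:
    """Return pairs (i, j), i < j, with identical k-mers at i and j
    and min_dist <= j - i <= max_dist (hypothetical LTR spacing bounds)."""
    pairs: List[Tuple[int, int]] = []
    for positions in build_kmer_index(seq, k).values():
        lo = 0
        for a, i in enumerate(positions):
            # Partners must come after i in the sorted list.
            lo = max(lo, a + 1)
            # Skip partners that are too close; lo only moves forward
            # because i increases along the list.
            while lo < len(positions) and positions[lo] - i < min_dist:
                lo += 1
            # Emit every partner within the upper bound; each step emits one pair.
            b = lo
            while b < len(positions) and positions[b] - i <= max_dist:
                pairs.append((i, positions[b]))
                b += 1
    return pairs
```

For example, `candidate_pairs("ACGTTTTTACGT", 4, 2, 10)` returns `[(0, 8)]`: the repeated `ACGT` at positions 0 and 8 qualifies, while the overlapping `TTTT` occurrences at positions 3 and 4 are closer than `min_dist`. Because the `lo` pointer never moves backwards, the work for each position list is proportional to its length plus the number of pairs emitted, which mirrors the "linear preprocessing, constant time per pair" cost profile described above.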