Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium

Abstract
The DNA content of eukaryotic nuclei (C-value) varies ∼200,000-fold, but there is only a ∼20-fold variation in the number of protein-coding genes. Hence, most C-value variation is ascribed to the repetitive fraction, although little is known about the evolutionary dynamics of the specific components that lead to genome size variation. To understand the modes and mechanisms that underlie variation in genome composition, we generated sequence data from whole genome shotgun (WGS) libraries for three representative diploid (n = 13) members of Gossypium that vary in genome size from 880 to 2460 Mb (1C) and from a phylogenetic outgroup, Gossypioides kirkii, with an estimated genome size of 588 Mb. Copy number estimates including all dispersed repetitive sequences indicate that 40%–65% of each genome is composed of transposable elements. Inspection of individual sequence types revealed differential, lineage-specific expansion of various families of transposable elements among the different plant lineages. Copia-like retrotransposable element sequences have differentially accumulated in the Gossypium species with the smallest genome, G. raimondii, while gypsy-like sequences have proliferated in the lineages with larger genomes. Phylogenetic analyses demonstrated a pattern of lineage-specific amplification of particular subfamilies of retrotransposons within each species studied. One particular group of gypsy-like retrotransposon sequences, Gorge3 (Gossypium retrotransposable gypsy-like element), appears to have undergone a massive proliferation in two plant lineages, accounting for a major fraction of genome-size change. Like maize, Gossypium has undergone a threefold increase in genome size due to the accumulation of LTR retrotransposons over the 5–10 Myr since its origin.
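The genome-size figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the function and constant names are hypothetical, and only the numbers stated in the abstract are used. Because the abstract does not say which transposable-element fraction belongs to which species, applying the 40%–65% range to a genome gives only loose lower and upper bounds.

```python
# Genome sizes (1C, in Mb) as quoted in the abstract.
SMALLEST_GOSSYPIUM_MB = 880
LARGEST_GOSSYPIUM_MB = 2460
OUTGROUP_MB = 588  # Gossypioides kirkii

# Transposable-element (TE) fraction range quoted in the abstract.
# Assumption: the range is applied uniformly, so results are bounds only.
TE_FRACTION_LOW = 0.40
TE_FRACTION_HIGH = 0.65


def fold_change(larger_mb, smaller_mb):
    """Return the ratio of two genome sizes given in Mb."""
    return larger_mb / smaller_mb


def te_content_bounds(genome_mb):
    """Return (low, high) bounds in Mb for TE content of a genome."""
    return genome_mb * TE_FRACTION_LOW, genome_mb * TE_FRACTION_HIGH


if __name__ == "__main__":
    within = fold_change(LARGEST_GOSSYPIUM_MB, SMALLEST_GOSSYPIUM_MB)
    versus_outgroup = fold_change(LARGEST_GOSSYPIUM_MB, OUTGROUP_MB)
    print(f"Largest vs. smallest Gossypium genome: {within:.2f}-fold")
    print(f"Largest Gossypium genome vs. outgroup: {versus_outgroup:.2f}-fold")
    for size in (OUTGROUP_MB, SMALLEST_GOSSYPIUM_MB, LARGEST_GOSSYPIUM_MB):
        low, high = te_content_bounds(size)
        print(f"{size} Mb genome: {low:.0f} to {high:.0f} Mb of TEs (bounds)")
```

The ratio of the largest to the smallest sampled Gossypium genome comes out at roughly 2.8, which is consistent with the "threefold" increase stated in the abstract.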