Frequency, type, distribution and annotation of simple sequence repeats in Rosaceae ESTs

Abstract
Genomic resources for peach, a model species for Rosaceae, are being developed to accelerate gene discovery in other Rosaceae species through comparative mapping. Simple sequence repeats (SSRs) are an important tool for comparative mapping because of their high polymorphism and transferability across species. To accelerate the development of SSR markers, we analyzed publicly available Rosaceae expressed sequence tags (ESTs) for SSRs. A total of 17,284 ESTs from almond, peach and rose were assembled into putatively non-redundant EST sets. For comparison, 179,099 ESTs from Arabidopsis were also analyzed. About 4% of the assembled Rosaceae ESTs contained SSRs, compared with 2.4% in Arabidopsis. About half of the SSRs were found in the putative untranslated region (UTR), where the estimated average distance between SSRs was 5.5 kb in rose, 5.1 kb in almond, 7 kb in peach and 13 kb in Arabidopsis. In the putative coding region, the estimated average distance was two to four times longer than in the UTR. Rosaceae ESTs containing SSRs were functionally annotated using the GenBank nr database and further classified using the Gene Ontology terms associated with the matching sequences in the SwissProt database. The detailed data, including the sequences and annotation results, are available from http://www.genome.clemson.edu/gdr/rosaceaessr/.
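
To illustrate the kind of analysis summarized above, the following is a minimal Python sketch of how SSRs might be located in a sequence and how an average distance between SSRs could be estimated. It is not the pipeline used in this study: the function names (find_ssrs, average_distance_kb) and the minimum repeat counts are assumptions chosen for illustration, and the distance is assumed here to be total sequence length divided by the number of SSRs found.

```python
import re

# ASSUMPTION: minimum repeat counts per motif length (di-, tri-, tetranucleotide).
# These thresholds are illustrative only and are not taken from the study.
MIN_REPEATS = {2: 5, 3: 4, 4: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of fixed length followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so poly-A runs and dinucleotide repeats are not double-counted.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            repeat_count = (m.end() - m.start()) // motif_len
            hits.append((m.start(), m.end(), motif, repeat_count))
    return hits


def average_distance_kb(total_bp, n_ssrs):
    """Estimated average distance between SSRs in kb (None if no SSRs)."""
    if n_ssrs == 0:
        return None
    return total_bp / n_ssrs / 1000


if __name__ == "__main__":
    example = "GGC" + "AT" * 6 + "CCGT"  # toy sequence, not real EST data
    print(find_ssrs(example))
    print(average_distance_kb(55_000, 10))
```

Under these assumptions, a region totaling 55,000 bp that contains 10 SSRs gives an estimated average distance of 5.5 kb. Stricter or looser repeat thresholds would change both the SSR counts and the resulting distances.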