Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions

Abstract
Simple sequence repeats (SSRs) are found in most organisms and occupy about 3% of the human genome. Although it is becoming clear that such repeats are important in genomic organization and function and may be associated with disease conditions, their systematic analysis has not been reported. This is the first report examining the distribution and density of simple sequence repeats with units of 1-6 base pairs (bp) in the entire human genome. The densities of SSRs across the human chromosomes were found to be relatively uniform, although the overall SSR density was high on chromosome 19. Tri- and hexa-nucleotide repeats were more abundant in exonic regions than in intronic and intergenic regions, except on chromosome Y. Comparison of the densities of the various SSRs revealed that, whereas tri- and penta-nucleotide repeats showed a similar pattern (500-1,000 bp/Mb) across the chromosomes, di-, tetra- and hexa-nucleotide repeats showed patterns of higher density (2,000-3,000 bp/Mb). Repeats of a single nucleotide (mononucleotide repeats) were more abundant than other repeat types. Repeats of A, AT, AC, AAT, AAC, AAG, AGC, AAAC, AAAT, AAAG, AAGG and AGAT predominate, whereas repeats of C, CG, ACT, ACG, AACC, AACG, AACT, AAGC, AAGT, ACCC, ACCG, ACCT, CCCG and CCGG are rare. In summary, the overall SSR density was comparable in all chromosomes, whereas the density of the different repeat types showed significant variation. Tri- and hexa-nucleotide repeats are more abundant in exons, whereas other repeats are more abundant in non-coding regions.
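
The densities quoted above are expressed as base pairs of repeat sequence per megabase of genome (bp/Mb). The abstract does not describe how repeats were detected, so the following is only a minimal sketch of one way such a figure could be computed. It assumes perfect (uninterrupted) tandem repeats and a hypothetical minimum tract length of 12 bp; the function names and the threshold are illustrative and are not taken from the study.

```python
import re

# Hypothetical threshold: minimum total tract length in bp (an assumption, not from the source).
MIN_TRACT_BP = 12


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter motif (e.g. 'ATAT' is not primitive)."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, min_tract_bp=MIN_TRACT_BP):
    """Return (start, end, unit) for perfect tandem repeats with 1-6 bp units.

    Simplified: only A/C/G/T are matched, so N or other symbols break a tract,
    and imperfect repeats are ignored.
    """
    seq = seq.upper()
    tracts = []
    for unit_len in range(1, 7):
        # Smallest copy number whose total length reaches the threshold (at least 2 copies).
        min_copies = max(2, -(-min_tract_bp // unit_len))
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units such as 'AA' or 'ACAC'; they are counted under the shorter motif.
            if is_primitive(unit):
                tracts.append((match.start(), match.end(), unit))
    return tracts


def ssr_density_bp_per_mb(seq, min_tract_bp=MIN_TRACT_BP):
    """Total bp covered by SSR tracts, scaled to one megabase of sequence."""
    total_bp = sum(end - start for start, end, _ in find_ssrs(seq, min_tract_bp))
    return total_bp / len(seq) * 1_000_000


# Toy sequence with one mono-, one di- and one tri-nucleotide tract.
example = "GCT" + "A" * 12 + "GTCG" + "AC" * 6 + "TTC" + "CAG" * 4 + "T"
example_tracts = find_ssrs(example)
example_density = ssr_density_bp_per_mb(example)
```

On a real chromosome the same calculation would be applied per repeat class and per region (exon, intron, intergenic) to obtain figures comparable to those above; the toy sequence here is far too short for its density value to be meaningful.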