Simple repetitive DNA sequences from primates: Compilation and analysis

Abstract
Simple repeats composed of tandemly repeated units 1–6 nucleotides (nt) long have been extracted from a selected set of primate genomic DNA sequences. Of the 501 theoretically possible types of repeats, only 67 were present in the analyzed database in at least two different size ranges over 12 nt. These 67 include all simple repeats known to be polymorphic in the primate genome. A list of moderately expanding and nonexpanding oligonucleotide patterns is also included. Furthermore, we have compiled statistical data with emphasis on the overall variability of the 67 most abundant types of repeats. We have demonstrated that the expandability of at least some simple repeats may be affected by the overall base composition and by flanking sequences. In particular, the occurrence of tandemly repeated CAG and GCC triplets in exons correlates positively with the G+C content of the exons. We also noted that tetrameric repeats are more abundant in the vicinity of Alu sequences than in total genomic DNA. This paper can be used as a comprehensive guide to identifying the most abundant and potentially polymorphic simple repeats. It is also of broader significance as a step toward understanding how flanking sequences and overall sequence composition contribute to the variability of simple repeats.
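The figure of 501 possible repeat types can be reproduced under one plausible counting convention, which the abstract does not state and is therefore an assumption here: a repeat type is a primitive unit of 1–6 nt, and all cyclic permutations and reverse complements of a unit count as the same type. The Python sketch below enumerates the types under that convention and also shows a minimal scanner for perfect tandem repeats. The function names, the 13 nt cutoff (our reading of "over 12 nt"), and the restriction to perfect repeats are illustrative choices and not the extraction procedure used in the paper.

```python
import re
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_unit(unit):
    """Return one representative for a unit, its rotations, and their reverse complements."""
    revcomp = unit.translate(COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (unit, revcomp) for i in range(len(s))]
    return min(rotations)


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit (e.g. 'ACAC' is not)."""
    return unit not in (unit + unit)[1:-1]


def count_repeat_types(max_unit=6):
    """Count distinct repeat types per unit length under the assumed equivalence."""
    counts = {}
    for k in range(1, max_unit + 1):
        units = ("".join(p) for p in product("ACGT", repeat=k))
        counts[k] = len({canonical_unit(u) for u in units if is_primitive(u)})
    return counts


# Shortest unit of 1-6 nt followed by one or more exact copies of itself.
TANDEM = re.compile(r"([ACGT]{1,6}?)\1+")


def find_simple_repeats(seq, min_len=13):
    """Yield (start, canonical unit, copy number) for perfect repeats of at least min_len nt."""
    for m in TANDEM.finditer(seq.upper()):
        unit = m.group(1)
        if len(m.group(0)) >= min_len:
            yield m.start(), canonical_unit(unit), len(m.group(0)) // len(unit)


if __name__ == "__main__":
    counts = count_repeat_types()
    print(counts, sum(counts.values()))
    print(list(find_simple_repeats("TTA" + "CAG" * 5 + "TTA")))
```

Under this convention the per-length counts are 2, 4, 10, 33, 102, and 350 for units of 1 to 6 nt, which sum to 501. The scanner reports the (CAG)5 tract in the toy sequence under its canonical name AGC; a real survey would also need to tolerate interrupted repeats, which this sketch does not attempt.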