VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach.

Abstract
Variable numbers of tandem repeats (VNTRs) are a class of highly informative and widely dispersed genetic markers. Despite their wide application in biological science, little is known about their mutational mechanisms or population dynamics. The objective of this work was to investigate four summary measures of VNTR allele frequency distributions (number of alleles, number of modes, range in allele size, and heterozygosity) using computer simulations of the one-step stepwise mutation model (SMM). We estimated these measures and their probability distributions for a wide range of mutation rates and compared the simulation results with predictions from analytical formulations of the one-step SMM. The average heterozygosity from the simulations agreed with the analytical expectation under the SMM. The average number of alleles, however, was larger in the simulations than analytically expected under the SMM. We then compared our simulation expectations with actual data reported in the literature. For each locus, we used the sample size and observed heterozygosity to determine the expected value and the 5th and 95th percentiles of the other three summary measures: allelic size range, number of modes, and number of alleles. The loci analyzed were classified into three groups based on the size of the repeat unit: microsatellites (1-2 base pair [bp] repeat unit), short tandem repeats (STRs; 3-5 bp repeat unit), and minisatellites (15-70 bp repeat unit). In general, STR loci were most similar to the simulation results under the SMM for the three summary measures (number of alleles, number of modes, and range in allele size), followed by the microsatellite loci and then by the minisatellite loci, which showed deviations in the direction of the infinite allele model (IAM). Based on these differences, we hypothesize that these three classes of loci are subject to different mutational forces.
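
The kind of simulation summarized above can be illustrated with a minimal Python sketch. It is not the authors' program, and it rests on several assumptions that the abstract does not state: a haploid Wright-Fisher population of gene copies, symmetric mutations of plus or minus one repeat unit, Nei's sample-size-corrected estimator of heterozygosity, and a mode defined as a local maximum of the allele frequency distribution over contiguous repeat counts. All function names (simulate_smm, count_modes, summarize, expected_het_smm) are hypothetical. The analytical heterozygosity shown is the commonly cited one-step SMM expectation, H = 1 - 1/sqrt(1 + 8*Ne*mu), which is assumed here to be the expectation referred to in the text.

```python
import random
from collections import Counter
from math import sqrt


def simulate_smm(n_genes, mu, generations, seed=None):
    """Evolve n_genes gene copies under drift and one-step stepwise mutation.

    Assumption: haploid Wright-Fisher resampling; each copy mutates with
    probability mu per generation, gaining or losing one repeat with equal
    probability. Allele states are repeat counts relative to the start (0).
    """
    rng = random.Random(seed)
    pop = [0] * n_genes
    for _ in range(generations):
        # Drift: the next generation samples parents with replacement.
        pop = rng.choices(pop, k=n_genes)
        # Mutation: shift the repeat count by exactly one step.
        pop = [a + rng.choice((-1, 1)) if rng.random() < mu else a for a in pop]
    return pop


def count_modes(freqs):
    """Count local maxima in a list of allele counts ordered by allele size.

    Assumption: a plateau of equal counts is treated as a single mode.
    """
    modes = 0
    rising = False
    prev = 0
    for f in list(freqs) + [0]:  # trailing 0 closes a final peak
        if f > prev:
            rising = True
        elif f < prev:
            if rising:
                modes += 1
            rising = False
        prev = f
    return modes


def summarize(sample):
    """Return the four summary measures for a sample of at least two alleles."""
    counts = Counter(sample)
    n = len(sample)
    lo, hi = min(counts), max(counts)
    # Frequencies over every repeat count in the range, 0 for absent alleles.
    freqs = [counts.get(size, 0) for size in range(lo, hi + 1)]
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return {
        "n_alleles": len(counts),
        "n_modes": count_modes(freqs),
        "size_range": hi - lo,
        # Nei's estimator with the n / (n - 1) sample-size correction.
        "heterozygosity": (1.0 - homozygosity) * n / (n - 1),
    }


def expected_het_smm(n_e, mu):
    """One-step SMM expectation for diploid effective size n_e."""
    return 1.0 - 1.0 / sqrt(1.0 + 8.0 * n_e * mu)


if __name__ == "__main__":
    population = simulate_smm(n_genes=200, mu=0.005, generations=2000, seed=1)
    sample = random.Random(2).sample(population, 50)
    print(summarize(sample))
    # 200 gene copies correspond to a diploid size of 100 in this sketch.
    print(expected_het_smm(n_e=100, mu=0.005))
```

A single run gives one realization only. Approximating the expected values and the 5th and 95th percentiles would require many independent replicates, each run long enough (likely several multiples of the population size in generations) to approach mutation-drift equilibrium.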