Sequence analysis of a compound coding-region microsatellite in Candida albicans resolves homoplasies and provides a high-resolution tool for genotyping

Abstract
Sequence diversity at a coding-region microsatellite locus of two diploid Candida species was surveyed. Twenty-one alleles from fourteen strains of Candida albicans and three alleles from two strains of the closely related Candida dubliniensis were sequenced. Results show independent length variation in two contiguous hexanucleotide repeats, one non-contiguous hexanucleotide repeat, and two non-contiguous trinucleotide repeats within a 120 bp coding region. A neighboring, non-repetitive 120 bp region showed no variation. The information density of sequence polymorphisms in this region provides a powerful tool for genotyping microorganisms in epidemiological studies, yielding detailed resolution of closely related strains, and clearly distinguishing the two species studied here. The individual length-variable repeat regions are very short (2–8 repeats), demonstrating that even very short microsatellites can show high levels of length variability when surrounded by similarly repetitive DNA. Extensive homoplasy was discovered among the C. albicans alleles, with the majority of overall length categories consisting of alleles with more than one sequence. Our results show that microsatellite length alone should not be used to assume either sequence identity or identity by descent. Microsatellite length mutations appear to have generated the high degree of both inter- and intraspecific polymorphism seen at the ERK1 locus, and form an island of variability in an otherwise well-conserved gene.
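The point about homoplasy can be illustrated with a minimal sketch: alleles are grouped by overall length, and any length class containing more than one distinct sequence is flagged. The function name, the allele labels, and the toy sequences are all invented for illustration. They are not data from this study, and the repeat motifs used are arbitrary.

```python
from collections import defaultdict


def homoplasious_length_classes(alleles):
    """Return length classes that contain more than one distinct sequence.

    `alleles` maps a (hypothetical) allele label to its nucleotide sequence.
    The result maps each shared length to the sorted distinct sequences.
    """
    by_length = defaultdict(set)
    for seq in alleles.values():
        by_length[len(seq)].add(seq)
    # Keep only lengths where identical size hides different sequences.
    return {
        length: sorted(seqs)
        for length, seqs in by_length.items()
        if len(seqs) > 1
    }


# Invented toy alleles (not from the study): two adjacent trinucleotide
# repeats whose copy numbers vary independently.
toy_alleles = {
    "a1": "CAA" * 3 + "GCT" * 2,  # 15 bp
    "a2": "CAA" * 2 + "GCT" * 3,  # 15 bp, same length, different sequence
    "a3": "CAA" * 2 + "GCT" * 2,  # 12 bp
}

result = homoplasious_length_classes(toy_alleles)
for length, seqs in result.items():
    print(length, seqs)
```

In this toy case, alleles a1 and a2 would be indistinguishable by fragment length alone, although they differ in the copy number of each repeat. Only sequencing separates them.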