A sequence-based variation map of 8.27 million SNPs in inbred mouse strains

Abstract
A major new resource is now available to geneticists working on the many mouse models that are used to study toxicity and human disease. The genomes of four wild-derived and eleven inbred laboratory mouse strains have been resequenced to create a comprehensive resource of DNA variation. About 8.3 million single base-pair differences, known as single nucleotide polymorphisms (SNPs), were identified. The data are publicly available as a mouse 'HapMap' at http://mouse.perlegen.com/. The density and quality of this set of SNP markers are unprecedented for a mammalian genome, and the set will provide a powerful tool for identifying the genetic determinants of phenotypic variation in the mouse.

The genomic resources for laboratory mice are greatly expanded by resequencing the genomes of 15 different strains to find single nucleotide polymorphisms. This creates a 'HapMap' for mice with 8.27 million markers.

A dense map of genetic variation in the laboratory mouse genome will provide insights into the evolutionary history of the species [1] and lead to an improved understanding of the relationship between inter-strain genotypic and phenotypic differences. Here we resequence the genomes of four wild-derived and eleven classical strains. We identify 8.27 million high-quality single nucleotide polymorphisms (SNPs) densely distributed across the genome, and determine the locations of the high (divergent subspecies ancestry) and low (common subspecies ancestry) SNP-rate intervals [2-6] for every pairwise combination of classical strains. Using these data, we generate a genome-wide haplotype map containing 40,898 segments, each with an average of three distinct ancestral haplotypes. For the haplotypes in the classical strains that are unequivocally assigned ancestry, the genetic contributions of the Mus musculus subspecies (M. m. domesticus, M. m. musculus, M. m. castaneus and the hybrid M. m. molossinus) are 68%, 6%, 3% and 10%, respectively; the remaining 13% of haplotypes are of unknown ancestral origin. The considerable regional redundancy of the SNP data will facilitate imputation of the majority of these genotypes in less-densely typed classical inbred strains to provide a complete view of variation in additional strains.
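The idea of high and low SNP-rate intervals between a pair of strains can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the fixed window size, the threshold, the strain labels `strainA` and `strainB`, the toy genotype calls, and the function names are all hypothetical, chosen only to show how a pairwise SNP rate per interval might be computed and labelled.

```python
from collections import Counter

def windowed_pairwise_snp_rate(snps, strain_a, strain_b, window_size):
    """Rate of differing SNPs between two strains, per fixed-size window.

    snps: iterable of (position, {strain_name: allele}) on one chromosome.
    Returns {window_start: differing SNPs per kb}. Windows that contain
    no SNP called in both strains are not reported.
    """
    diffs = Counter()
    for pos, alleles in snps:
        a, b = alleles.get(strain_a), alleles.get(strain_b)
        if a is None or b is None:
            continue  # skip sites lacking a call in either strain
        window_start = (pos // window_size) * window_size
        diffs[window_start] += int(a != b)  # adds 0 when alleles match
    kb_per_window = window_size / 1000
    return {start: n / kb_per_window for start, n in sorted(diffs.items())}

def label_intervals(rates, threshold_per_kb):
    """Label each window 'high' or 'low' using an assumed rate cutoff."""
    return {
        start: "high" if rate >= threshold_per_kb else "low"
        for start, rate in rates.items()
    }

# Toy data (hypothetical): position and allele calls for two strains.
toy_snps = [
    (100, {"strainA": "A", "strainB": "G"}),
    (500, {"strainA": "C", "strainB": "T"}),
    (900, {"strainA": "G", "strainB": "G"}),
    (1500, {"strainA": "T", "strainB": "T"}),
    (2100, {"strainA": "A", "strainB": "C"}),
    (2500, {"strainA": "A"}),  # no call for strainB, so the site is skipped
]

rates = windowed_pairwise_snp_rate(toy_snps, "strainA", "strainB", window_size=1000)
labels = label_intervals(rates, threshold_per_kb=1.5)
```

In this sketch a "high" window would correspond to a region where the two strains are likely to carry haplotypes of divergent subspecies ancestry, and a "low" window to a region of common ancestry; a real analysis would need a statistically justified segmentation rather than fixed windows and a single cutoff.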