Efficient Control of Population Structure in Model Organism Association Mapping


1 March 2008

journal article
research article
Published by Oxford University Press (OUP) in Genetics

Vol. 178 (3), 1709-1723
https://doi.org/10.1534/genetics.107.080101

Abstract

Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.
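The abstract does not spell out which property of the mixed-model optimization EMMA exploits. Below is a minimal NumPy sketch of the general idea: one eigendecomposition reduces the restricted (REML) log-likelihood to a one-dimensional function of the variance ratio delta = sigma_e^2 / sigma_g^2, and each evaluation of that function costs only O(n). The sketch rests on several assumptions. It uses one phenotype value per strain, so there is no multiple-measurement design. The kinship matrix K is positive semidefinite. Delta is re-estimated for every SNP. A golden-section refinement stands in for a derivative-based local search. All function names are hypothetical. This is not the authors' R package.

```python
import numpy as np

# Sketch under assumptions: one measurement per strain, PSD kinship K,
# hypothetical function names (not the EMMA R package API).


def spectral_parts(y, X, K):
    """Return (lam, eta_sq) for the REML likelihood from one eigendecomposition.

    S projects onto the orthogonal complement of the fixed-effect columns X.
    Decomposing S (K + I) S instead of S K S separates the q zero eigenvalues
    (the column space of X) from the n - q eigenvalues >= 1 when K is PSD;
    subtracting 1 recovers the corresponding eigenvalues of S K S.
    """
    n, q = X.shape
    S = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    evals, evecs = np.linalg.eigh(S @ (K + np.eye(n)) @ S)  # ascending order
    lam = evals[q:] - 1.0
    eta = evecs[:, q:].T @ y
    return lam, eta ** 2


def reml_loglik(log_delta, lam, eta_sq):
    """Restricted log-likelihood as a function of log(delta); O(n - q) per call."""
    d = np.exp(log_delta)
    m = lam.size
    r = np.sum(eta_sq / (lam + d))
    return 0.5 * (m * np.log(m / (2.0 * np.pi)) - m - m * np.log(r)
                  - np.sum(np.log(lam + d)))


def estimate_delta_reml(y, X, K, n_grid=100, lo=-10.0, hi=10.0):
    """Grid search over log(delta), then golden-section refinement near the best point."""
    lam, eta_sq = spectral_parts(y, X, K)

    def f(g):
        return reml_loglik(g, lam, eta_sq)

    grid = np.linspace(lo, hi, n_grid + 1)
    vals = np.array([f(g) for g in grid])
    i = int(np.argmax(vals))
    a, b = grid[max(i - 1, 0)], grid[min(i + 1, n_grid)]
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):  # golden-section search for a maximum on [a, b]
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if f(c) >= f(d):
            b = d
        else:
            a = c
    best = 0.5 * (a + b)
    if f(best) < vals[i]:  # never return something worse than the grid optimum
        best = grid[i]
    return float(np.exp(best)), float(f(best))


def emma_snp_test(y, snp, K, covariates=None):
    """Test one SNP under y = X b + u + e with Var(y) = sigma_g^2 (K + delta I).

    Returns (effect estimate, t statistic, delta); t**2 is an F statistic
    with (1, n - q) degrees of freedom.
    """
    n = y.shape[0]
    X0 = np.ones((n, 1)) if covariates is None else covariates
    X = np.column_stack([X0, snp])
    delta, _ = estimate_delta_reml(y, X, K)
    H = K + delta * np.eye(n)
    Hinv_X = np.linalg.solve(H, X)
    XtHinvX = X.T @ Hinv_X
    beta = np.linalg.solve(XtHinvX, Hinv_X.T @ y)  # generalized least squares
    resid = y - X @ beta
    m = n - X.shape[1]
    sigma_g2 = resid @ np.linalg.solve(H, resid) / m  # REML estimate of sigma_g^2
    se = np.sqrt(sigma_g2 * np.linalg.inv(XtHinvX)[-1, -1])
    return float(beta[-1]), float(beta[-1] / se), delta
```

For usage, K could be an identity-by-state matrix built from 0/1 inbred genotypes G, computed as `(G @ G.T + (1 - G) @ (1 - G).T) / G.shape[1]`. Then call `emma_snp_test` once per SNP column. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), n - q)` converts the statistic into a two-sided p-value. The decomposition is recomputed for every SNP because S depends on X. The expensive step is therefore one O(n^3) eigendecomposition per SNP, and the likelihood evaluations on the delta grid are cheap by comparison.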
