Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies

Open Access

25 July 2008

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Genetics

Vol. 4 (7), e1000130
https://doi.org/10.1371/journal.pgen.1000130

Abstract

Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes and covariate adjustment, and it can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based data set, each of which can be analysed in a few hours on a desktop workstation.

Author Summary

Tests of association with disease status are normally conducted one SNP at a time, ignoring the effects of all other genotyped SNPs. We developed a computationally efficient method to simultaneously analyse all SNPs, either in a genome-wide association (GWA) study or in a fine-mapping study based on re-sequencing and/or imputation. The method selects a subset of SNPs that best predicts disease status, while controlling the type-I error of the selected SNPs. This brings many advantages over standard single-SNP approaches, because the signal from a particular SNP can be more clearly assessed when other SNPs associated with disease status are already included in the model. Thus, in comparison with single-SNP analyses, power is increased and the false positive rate is reduced because of reduced residual variation. Localisation is also greatly improved. We demonstrate these advantages over the widely used single-SNP Armitage Trend Test using GWA simulation studies, a real GWA data set, and a sequence-based fine-mapping simulation study.
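
The idea of taking posterior modes under a prior with a sharp mode at zero, and reading non-zero coefficients as selected SNPs, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a Laplace (double-exponential) prior as a stand-in, because the posterior mode under that prior coincides with an L1-penalised logistic regression fit; the normal-exponential-gamma prior named in the abstract is not implemented here. The optimiser (proximal gradient descent), all function names, and the simulation settings (3000 subjects, 50 independent SNPs, two causal SNPs) are hypothetical choices made for illustration only. A single-SNP Armitage trend statistic, computed as n times the squared correlation between genotype score and case status, is included for comparison.

```python
import numpy as np


def simulate_case_control(n=3000, p=50, causal=(3, 17), log_or=1.0, maf=0.3, seed=0):
    """Toy data (illustrative settings): independent SNPs coded 0/1/2,
    with additive effects on the log-odds at the `causal` columns."""
    rng = np.random.default_rng(seed)
    genotypes = rng.binomial(2, maf, size=(n, p)).astype(float)
    eta = log_or * (genotypes[:, list(causal)] - 2 * maf).sum(axis=1)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
    # Standardise columns so a single penalty is comparable across SNPs.
    X = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return X, y


def trend_test_chi2(x, y):
    """Single-SNP Armitage trend statistic, computed as n * r^2 (1 d.f.)."""
    r = np.corrcoef(x, y)[0, 1]
    return len(y) * r ** 2


def soft_threshold(v, t):
    """Shrink each entry of v towards zero by t; entries within t become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lambda_max(X, y):
    """Smallest penalty at which every SNP coefficient is exactly zero."""
    return np.max(np.abs(X.T @ (y - y.mean()))) / len(y)


def fit_l1_logistic(X, y, lam, n_iter=500):
    """Posterior mode under a Laplace prior (assumed stand-in prior):
    minimise mean logistic loss + lam * sum(|beta_j|) by proximal gradient.
    The intercept is left unpenalised."""
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])
    # 1 / Lipschitz constant of the gradient of the mean logistic loss.
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    beta = np.zeros(p + 1)
    ybar = y.mean()
    beta[0] = np.log(ybar / (1.0 - ybar))  # start at the null model
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xa @ beta)))
        grad = Xa.T @ (prob - y) / n
        beta = beta - step * grad
        beta[1:] = soft_threshold(beta[1:], step * lam)
    return beta[0], beta[1:]


if __name__ == "__main__":
    X, y = simulate_case_control()
    _, beta = fit_l1_logistic(X, y, lam=0.5 * lambda_max(X, y))
    print("selected SNP indices:", np.flatnonzero(beta).tolist())
    print("trend chi2 at SNP 3:", round(trend_test_chi2(X[:, 3], y), 1))
```

In this sketch the penalty `lam` plays the role that the prior's sharpness plays in the abstract: larger values force more coefficients to exactly zero. The sketch picks `lam` as a fraction of `lambda_max` purely for convenience; it does not reproduce the explicit type-I error approximation described in the paper.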

Cited by 285 articles