Data quality control in genetic case-control association studies

26 August 2010

journal article
research article
Published by Springer Nature in Nature Protocols

Vol. 5 (9), 1564-1573
https://doi.org/10.1038/nprot.2010.116

Abstract

This protocol details the steps for data quality assessment and control that are typically carried out during case-control association studies. The steps described involve the identification and removal of DNA samples and markers that introduce bias. These critical steps are paramount to the success of a case-control study and are necessary before statistically testing for association. We describe how to use PLINK, a tool for handling SNP data, to perform assessments of failure rate per individual and per SNP and to assess the degree of relatedness between individuals. We also detail other quality-control procedures, including the use of SMARTPCA software for the identification of ancestral outliers. These platforms were selected because they are user-friendly, widely used and computationally efficient. Steps needed to detect and establish a disease association using case-control data are not discussed here. Issues concerning study design and marker selection in case-control studies have been discussed in our earlier protocols. This protocol, which is routinely used in our labs, should take approximately 8 h to complete.
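The per-individual and per-SNP failure-rate assessments mentioned above can be illustrated with a minimal sketch. This is not the PLINK implementation or the protocol's own procedure: the genotype encoding (0, 1 or 2 minor-allele copies, `None` for a failed call), the function names, the toy data and both thresholds are assumptions chosen only for illustration.

```python
from typing import List, Optional

# Assumed encoding: 0, 1 or 2 copies of the minor allele; None = failed call.
Genotype = Optional[int]


def missing_rate_per_sample(genotypes: List[List[Genotype]]) -> List[float]:
    """Fraction of failed genotype calls for each sample (one row per sample)."""
    return [sum(g is None for g in row) / len(row) for row in genotypes]


def missing_rate_per_snp(genotypes: List[List[Genotype]]) -> List[float]:
    """Fraction of failed genotype calls for each SNP (one column per SNP)."""
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(genotypes[i][j] is None for i in range(n_samples)) / n_samples
        for j in range(n_snps)
    ]


def flag_above(rates: List[float], threshold: float) -> List[int]:
    """Indices whose failure rate exceeds the threshold."""
    return [i for i, rate in enumerate(rates) if rate > threshold]


# Hypothetical toy data: 4 samples x 5 SNPs.
toy = [
    [0, 1, 2, 0, 1],
    [None, None, 2, None, 1],
    [0, 1, None, 0, 2],
    [1, 1, None, 2, 0],
]

sample_rates = missing_rate_per_sample(toy)
snp_rates = missing_rate_per_snp(toy)

# Illustrative thresholds only; real studies choose these from the data.
failed_samples = flag_above(sample_rates, 0.5)
failed_snps = flag_above(snp_rates, 0.4)

print("samples to review:", failed_samples)
print("SNPs to review:", failed_snps)
```

In this toy matrix the second sample and the third SNP exceed the assumed thresholds. Real data sets are far larger, and the thresholds here are exaggerated so that a four-sample example can show the logic.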

This publication has 27 references indexed in Scilit.

Cited by 1083 articles