Data quality control in genetic case-control association studies
- 26 August 2010
- journal article
- research article
- Published by Springer Nature in Nature Protocols
- Vol. 5 (9), 1564-1573
- https://doi.org/10.1038/nprot.2010.116
Abstract
This protocol details the steps for data quality assessment and control that are typically carried out during case-control association studies. The steps described involve the identification and removal of DNA samples and markers that introduce bias. These critical steps are paramount to the success of a case-control study and are necessary before statistically testing for association. We describe how to use PLINK, a tool for handling SNP data, to perform assessments of failure rate per individual and per SNP and to assess the degree of relatedness between individuals. We also detail other quality-control procedures, including the use of SMARTPCA software for the identification of ancestral outliers. These platforms were selected because they are user-friendly, widely used and computationally efficient. Steps needed to detect and establish a disease association using case-control data are not discussed here. Issues concerning study design and marker selection in case-control studies have been discussed in our earlier protocols. This protocol, which is routinely used in our labs, should take approximately 8 h to complete.
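As a rough illustration of what "failure rate per individual and per SNP" means, the following minimal Python sketch computes both rates from a toy genotype matrix. It is not PLINK's implementation and does not reproduce the protocol's commands. The data, the `missing_rate` helper and the `MAX_MISSING` threshold are all hypothetical and chosen only for the example.

```python
# Toy genotype matrix: rows are individuals, columns are SNPs.
# Calls are coded 0/1/2 (allele counts); None marks a failed genotype call.
# All values are invented for illustration.
genotypes = [
    [0, 1, 2, None],
    [1, None, 2, None],
    [0, 1, 1, 2],
]


def missing_rate(calls):
    """Return the fraction of genotype calls that failed (are None)."""
    return sum(call is None for call in calls) / len(calls)


# Failure rate per individual: one value per row.
per_individual = [missing_rate(row) for row in genotypes]

# Failure rate per SNP: one value per column (zip(*...) transposes the matrix).
per_snp = [missing_rate(col) for col in zip(*genotypes)]

# Hypothetical cutoff, NOT taken from the protocol; real studies choose
# thresholds by inspecting the distribution of failure rates.
MAX_MISSING = 0.4

flagged_individuals = [i for i, r in enumerate(per_individual) if r > MAX_MISSING]
flagged_snps = [j for j, r in enumerate(per_snp) if r > MAX_MISSING]

print("per-individual failure rates:", per_individual)
print("per-SNP failure rates:", per_snp)
print("individuals above threshold:", flagged_individuals)
print("SNPs above threshold:", flagged_snps)
```

In practice, samples and markers flagged this way would be candidates for removal before association testing, which is the purpose the abstract describes.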