Quantifying the Percent Increase in Minimum Sample Size for SNP Genotyping Errors in Genetic Model-Based Association Studies

Abstract
Kang et al. [Genet Epidemiol 2004;26:132-141] asked which genotype misclassification errors are most costly in case/control studies of genetic association in a genetic model-free setting, where cost is measured as the minimum percentage increase in sample size necessary (%MSSN) to maintain constant asymptotic power and significance level. They answered the question for single nucleotide polymorphisms (SNPs) using the 2 × 3 χ² test of independence. We address the same question here for a genetic model-based framework. The genetic model parameters considered are: disease model (dominant, recessive), genotypic relative risk, SNP (marker) and disease allele frequencies, and linkage disequilibrium. The %MSSN coefficient of each of the six possible error rates is determined by expanding the non-centrality parameter of the asymptotic distribution of the 2 × 3 χ² test under a specified alternative hypothesis, which approximates %MSSN by a linear Taylor series in the error rates. In this work we assume that the rates of errors misclassifying one homozygote as the other homozygote are 0, since these errors are thought to occur rarely in practice. We find that there are settings of the genetic model parameters that lead to large total %MSSN for both dominant and recessive models. As the SNP minor allele frequency approaches 0, total %MSSN increases without bound, independent of the other genetic model parameters. In general, %MSSN is a complex function of the genetic model parameters. Use of SNPs with small minor allele frequency requires careful attention to the frequency of genotyping errors to ensure that power specifications are met. Software to perform these calculations for study design is available, and an example of its use to study a disease is given.
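
To make the %MSSN quantity concrete, the following is a minimal numerical sketch and not the software referred to above. It assumes equal numbers of cases and controls, error rates that are the same in both groups, and the standard large-sample non-centrality parameter of the 2 × 3 χ² test, which is then proportional to the sum over genotypes of (p - q)² / (p + q) for case frequencies p and control frequencies q. Because the non-centrality required for a given power and significance level is fixed, the required sample size scales as the inverse of this per-subject quantity. The function names, the genotype ordering (1 = common homozygote, 2 = heterozygote, 3 = rare homozygote), the error-rate labels e12, e21, e23, e32, and the genotype frequencies are all hypothetical choices made for illustration.

```python
def error_matrix(e12=0.0, e21=0.0, e23=0.0, e32=0.0):
    """Misclassification matrix: entry [i][j] = Pr(observed j | true i).

    Homozygote-to-homozygote rates (e13, e31) are fixed at 0,
    matching the assumption stated in the abstract.
    """
    return [
        [1 - e12, e12, 0.0],
        [e21, 1 - e21 - e23, e23],
        [0.0, e32, 1 - e32],
    ]


def observed_freqs(true_freqs, error):
    """Genotype frequencies after passing through the error matrix."""
    return [sum(true_freqs[i] * error[i][j] for i in range(3)) for j in range(3)]


def ncp_per_subject(case_freqs, control_freqs):
    """Non-centrality per subject (up to a constant), equal group sizes assumed."""
    return sum(
        (p - q) ** 2 / (p + q)
        for p, q in zip(case_freqs, control_freqs)
        if p + q > 0
    )


def percent_mssn(case_freqs, control_freqs, error):
    """Percent increase in sample size needed to keep power constant."""
    lam_true = ncp_per_subject(case_freqs, control_freqs)
    lam_err = ncp_per_subject(
        observed_freqs(case_freqs, error), observed_freqs(control_freqs, error)
    )
    if lam_err == 0:
        raise ValueError("no detectable association after misclassification")
    return 100.0 * (lam_true / lam_err - 1.0)


# Hypothetical genotype frequencies, chosen only for illustration.
cases = [0.70, 0.26, 0.04]
controls = [0.81, 0.18, 0.01]

# Exact %MSSN for a 1% rate of calling a heterozygote as the common homozygote.
print(percent_mssn(cases, controls, error_matrix(e21=0.01)))

# Finite-difference estimate of the linear (first-order Taylor) coefficient of e21.
h = 1e-6
print(percent_mssn(cases, controls, error_matrix(e21=h)) / h)
```

The finite-difference ratio in the last line plays the role of a single %MSSN coefficient: multiplying it by a small error rate gives the linear approximation to the exact value computed just before it. In a model-based analysis the case and control genotype frequencies would themselves be derived from the disease model, genotypic relative risk, allele frequencies, and linkage disequilibrium rather than supplied directly.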