Further development of the case-only design for assessing gene-environment interaction: evaluation of and adjustment for bias

Abstract
Background The case-only design for investigating gene–environment interactions provides increased statistical efficiency over case-control analyses. This design has been criticized for being susceptible to bias arising from non-independence between the genetic and environmental factors in the population. Given that independence is critical to the validity of case-only estimates of interaction, researchers frequently use controls to evaluate whether the independence assumption is tenable, as advised in the literature. Our work investigates to what extent this approach is appropriate and how non-independence can be accounted for in case-only analyses.

Methods We provide a formula in epidemiological terms that illustrates the relationship between the gene–environment association measured among controls and the gene–environment association in the source population. Using this formula, we conducted sensitivity analyses to describe the circumstances in which controls can be used as a proxy for the source population when evaluating gene–environment independence. Lastly, we generated hypothetical cohort data to examine whether multivariable modelling approaches can be used to control for non-independence.

Results Our sensitivity analyses show that controls should not be used to evaluate gene–environment independence in the population, even when the baseline risk of disease is low (i.e. 1%), and the interaction and independent effects are moderate (i.e. risk ratio = 2). When the factors are associated, it is possible to remove bias arising from non-independence using standard multivariable statistical techniques in case-only analyses.

Conclusions Even when the disease risk is low, evaluation of gene–environment independence in controls does not provide a consistent test for bias in the case-only study. Given that control for non-independence is possible when the source of the non-independence can be conceptualized, the case-only design may still be a useful epidemiological tool for examining gene–environment interactions.
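
The point about controls can be illustrated with a small numerical sketch. It is not the formula derived in the paper; it is a minimal calculation under assumptions chosen here for illustration: binary factors G and E that are independent in the source population, a multiplicative risk model, and controls defined as the non-cases at the end of follow-up. The function name `ge_association` and the prevalences 0.3 and 0.4 are hypothetical.

```python
def odds_ratio(cells):
    """G-E odds ratio from a 2x2 table keyed by (g, e)."""
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


def ge_association(p_g, p_e, r0, rr_g, rr_e, rr_int):
    """Return the G-E odds ratio in the population, in cases, and in controls.

    Assumes G and E are independent in the source population and that
    risk = r0 * rr_g**g * rr_e**e * rr_int**(g*e) (multiplicative model).
    """
    levels = [(g, e) for g in (0, 1) for e in (0, 1)]
    # Joint distribution of G and E under independence.
    pop = {(g, e): (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)
           for g, e in levels}
    # Disease risk in each G-E stratum.
    risk = {(g, e): r0 * rr_g**g * rr_e**e * rr_int**(g * e)
            for g, e in levels}
    cases = {k: pop[k] * risk[k] for k in levels}
    controls = {k: pop[k] * (1 - risk[k]) for k in levels}  # non-cases
    return odds_ratio(pop), odds_ratio(cases), odds_ratio(controls)


# Scenario echoing the abstract: baseline risk 1%, all risk ratios equal to 2.
pop_or, case_or, control_or = ge_association(
    p_g=0.3, p_e=0.4, r0=0.01, rr_g=2, rr_e=2, rr_int=2
)
print(f"population OR = {pop_or:.3f}")
print(f"case-only OR  = {case_or:.3f}")
print(f"control OR    = {control_or:.3f}")
```

In this scenario the case-only odds ratio recovers the interaction risk ratio of 2, while the odds ratio among controls is about 0.95 rather than 1, even though G and E are exactly independent in the population. Under these assumptions the control odds ratio equals the population odds ratio multiplied by (1 − r11)(1 − r00) / ((1 − r10)(1 − r01)), where rge is the risk in stratum (g, e), so it departs from the population value whenever there is interaction on the multiplicative scale.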