A robust statistical method for case-control association testing with copy number variation

Abstract
Matt Hurles and colleagues present a general statistical framework for copy number variation (CNV) association tests in a case-control study design. They show that existing strategies for testing CNV association with binary disease phenotypes are complicated by differential errors and poor clustering quality. They report new methods, robust to these factors, which apply likelihood ratio testing to constrained Gaussian mixture models of quantitative CNV signals in cases and controls. Their methods are assay and platform independent and are implemented in the freely available CNVtools software.

Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing to quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods in testing for association with binary and quantitative traits, and have made the software available as the R package CNVtools.
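
As a rough illustration of what likelihood ratio testing on a Gaussian mixture of CNV signals can look like, the sketch below compares a null model (cases and controls share the same copy-number class frequencies) with an alternative (separate frequencies). It is not the CNVtools implementation. It assumes two copy-number classes, treats the component means and standard deviations as known and shared between cases and controls, estimates only the class frequencies, and uses the asymptotic chi-square distribution with one degree of freedom. All function names and simulated values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, means, sds):
    """Density of the Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def fit_weights(signals, means, sds, n_iter=200):
    """EM updates for the mixing weights only (assumption: means and sds stay fixed)."""
    k = len(means)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        totals = [0.0] * k
        for x in signals:
            parts = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            norm = sum(parts)
            for j in range(k):
                totals[j] += parts[j] / norm  # posterior probability of class j
        weights = [t / len(signals) for t in totals]
    return weights


def log_likelihood(signals, weights, means, sds):
    """Log-likelihood of the signals under the mixture."""
    return sum(math.log(mixture_density(x, weights, means, sds)) for x in signals)


def cnv_lrt(cases, controls, means, sds):
    """Return (statistic, p_value) for a two-class mixture (1 degree of freedom)."""
    pooled = list(cases) + list(controls)
    # Null: one set of class frequencies for everyone.
    ll_null = log_likelihood(pooled, fit_weights(pooled, means, sds), means, sds)
    # Alternative: class frequencies differ between cases and controls.
    ll_alt = (log_likelihood(cases, fit_weights(cases, means, sds), means, sds)
              + log_likelihood(controls, fit_weights(controls, means, sds), means, sds))
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Upper tail of a chi-square with 1 df, written with the complementary error function.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def simulate(n, freq_class1, means, sds, rng):
    """Draw n noisy signals; class 1 occurs with probability freq_class1."""
    signals = []
    for _ in range(n):
        j = 1 if rng.random() < freq_class1 else 0
        signals.append(rng.gauss(means[j], sds[j]))
    return signals


# Hypothetical simulated data: class 1 is more frequent in cases than in controls.
MEANS, SDS = [0.0, 1.0], [0.3, 0.3]
rng = random.Random(0)
cases = simulate(300, 0.5, MEANS, SDS, rng)
controls = simulate(300, 0.2, MEANS, SDS, rng)
stat, p_value = cnv_lrt(cases, controls, MEANS, SDS)
print(f"LRT statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

Because the test works on the quantitative signal rather than on hard copy-number calls, uncertainty in class membership enters the likelihood directly. A fuller treatment would also estimate the component means and variances and allow them to differ between cases and controls, which is how differential errors would be accommodated.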