Generalized linear modeling with regularization for detecting common disease rare haplotype association
- 21 November 2008
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 33 (4), 308-316
- https://doi.org/10.1002/gepi.20382
Abstract
Whole genome association studies (WGAS) have surged in popularity in recent years as technological advances have made large‐scale genotyping more feasible and as exciting new results offer tremendous hope and optimism. The logic of WGAS rests upon the common disease/common variant (CD/CV) hypothesis. Detection of association under the common disease/rare variant (CD/RV) scenario is much harder, and the current practices of WGAS may be underpowered without large enough sample sizes. In this article, we propose a generalized linear model with regularization (rGLM) approach for detecting disease‐haplotype association using unphased single nucleotide polymorphism (SNP) data that is applicable to both CD/CV and CD/RV scenarios. We borrow a dimension‐reduction method from the data mining and statistical learning literature, but use it for the purpose of weeding out haplotypes that are not associated with the disease so that the associated haplotypes, especially those that are rare, can stand out and be accounted for more precisely. By using high‐dimensional data analysis techniques, which are frequently employed in microarray analyses, interacting effects among haplotypes in different blocks can be investigated without much concern about the sample size being overwhelmed by the number of haplotype combinations. Our simulation study demonstrates the gain in power for detecting associations with moderate sample sizes. For detecting association under CD/RV, regression‐type methods such as the one implemented in hapassoc may fail to provide coefficient estimates for rare associated haplotypes, resulting in a loss of power compared to rGLM. Furthermore, our results indicate that rGLM can uncover the associated variants much more frequently than hapassoc can. Genet. Epidemiol. 2009.
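To make the idea of "weeding out" unassociated haplotypes concrete, the following is a minimal sketch of a regularized logistic regression, not the authors' rGLM implementation. It rests on several assumptions that the abstract does not confirm: the penalty is taken to be L1 (lasso), haplotype phase is treated as known (the article works with unphased SNP data), and the haplotype frequencies, effect size, and the function names `soft_threshold` and `fit_l1_logistic` are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each entry of z toward zero by t; entries within [-t, t] become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_l1_logistic(X, y, lam, n_iter=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    Minimizes mean log-loss + lam * sum(|beta|); the intercept is not penalized.
    The L1 penalty is an assumption of this sketch, not taken from the article.
    """
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])          # prepend intercept column
    # Step size 1/L, with L = ||Z||_2^2 / (4n) bounding the gradient's Lipschitz constant.
    step = 4.0 * n / np.linalg.norm(Z, 2) ** 2
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Z @ w)))    # fitted disease probabilities
        grad = Z.T @ (prob - y) / n              # gradient of the mean log-loss
        w = w - step * grad
        w[1:] = soft_threshold(w[1:], step * lam)  # shrink haplotype effects only
    return w[0], w[1:]

# Hypothetical simulated data: five haplotypes, the last two rare.
rng = np.random.default_rng(0)
freqs = np.array([0.50, 0.30, 0.15, 0.03, 0.02])
n = 500
counts = rng.multinomial(2, freqs, size=n)       # haplotype copies per person (phase assumed known)
X = counts[:, 1:].astype(float)                  # most common haplotype serves as baseline
logit = -1.0 + 1.5 * counts[:, 4]                # only the rarest haplotype raises risk (assumed)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

intercept, beta = fit_l1_logistic(X, y, lam=0.02)
print("haplotype coefficients:", np.round(beta, 3))
```

Coefficients that the penalty shrinks exactly to zero correspond to haplotypes dropped from the model; larger values of `lam` remove more of them. Handling unphased genotypes, as the article does, would require an additional step for phase uncertainty that this sketch omits.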
This publication has 34 references indexed in Scilit:
- Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nature Genetics, 2008
- When the smoke clears ... Nature, 2008
- hapassoc: Software for Likelihood Inference of Trait Associations with SNP Haplotypes and Other Attributes. Journal of Statistical Software, 2006
- A Note on Inference of Trait Associations with SNP Haplotypes and Other Attributes in Generalized Linear Models. Human Heredity, 2004
- A Powerful Strategy to Account for Multiple Testing in the Context of Haplotype Analysis. American Journal of Human Genetics, 2004
- Linkage Disequilibrium Mapping via Cladistic Analysis of Single-Nucleotide Polymorphism Haplotypes. American Journal of Human Genetics, 2004
- Inference on Haplotype Effects in Case-Control Studies Using Unphased Genotype Data. American Journal of Human Genetics, 2003
- Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 2001
- Haplotypes vs single marker linkage disequilibrium tests: what do we gain? European Journal of Human Genetics, 2001
- Genetic Analysis of Case/Control Data Using Estimated Haplotype Frequencies: Application to APOE Locus Variation and Alzheimer's Disease. Genome Research, 2001