Efficient and flexible Integration of variant characteristics in rare variant association studies using integrated nested Laplace approximation
Open Access
- 19 February 2021
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 17 (2), e1007784
- https://doi.org/10.1371/journal.pcbi.1007784
Abstract
Rare variants are thought to play an important role in the etiology of complex diseases and may explain a significant fraction of the missing heritability in genetic disease studies. Next-generation sequencing facilitates the association of rare variants in coding or regulatory regions with complex diseases in large cohorts at genome-wide scale. However, rare variant association studies (RVAS) still lack power when cohorts are small to medium-sized and when genetic variation explains only a small fraction of phenotypic variance. Here we present a novel Bayesian rare variant Association Test using Integrated Nested Laplace Approximation (BATI). Unlike existing RVAS tests, BATI allows integration of individual or variant-specific features as covariates, while efficiently performing inference based on full model estimation. We demonstrate that BATI outperforms established RVAS methods on realistic, semi-synthetic whole-exome sequencing cohorts, especially when using meaningful biological context, such as functional annotation. We show that BATI achieves power above 70% in scenarios in which competing tests fail to identify risk genes, e.g., when risk variants in sum explain less than 0.5% of phenotypic variance. We have integrated BATI, together with five existing RVAS tests, in the ‘Rare Variant Genome Wide Association Study’ (rvGWAS) framework for data generated by whole-exome or whole-genome sequencing. rvGWAS supports rare variant association for genes or any other biological unit, such as promoters, and provides essential functionality such as quality control and filtering. Applying rvGWAS to a Chronic Lymphocytic Leukemia study, we identified eight candidate predisposition genes, including EHMT2 and COPS7A.

Complex diseases arise from a combination of genetic factors and environmental factors, such as air pollution and diet, that together determine each individual's susceptibility to a given disease. Much effort has gone into understanding the genetic basis of such diseases, especially into discovering variants that are frequent in the population and increase disease risk. However, these common variants usually explain only a small part of the etiology of such diseases. Previous studies have shown that rare variants, i.e. variants present in less than 1% of the population, may explain the remaining variability attributable to genetic factors. Genome sequencing offers the opportunity to discover rare variants, but powerful statistical methods are needed to identify those variants that confer susceptibility to the disease. Here we have developed a powerful and flexible statistical approach for detecting rare variants associated with a disease, and we have integrated it into a software tool that is easy and intuitive for researchers and clinicians to use. We show that our approach outperforms other common statistical methods, especially when these variants explain only a small part of the disease. The discovery of these rare variants will contribute to understanding the molecular mechanisms of complex diseases.
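The 1% frequency cut-off used above to define rare variants can be made concrete with a minimal sketch. It is not part of BATI or rvGWAS: the function names, the 0/1/2 genotype coding, and the choice to estimate frequency from the cohort itself (rather than from a reference population) are all assumptions made for illustration.

```python
from typing import Dict, List

# Threshold taken from the text: "present in less than 1% of the population".
RARE_MAF_THRESHOLD = 0.01


def minor_allele_frequency(genotypes: List[int]) -> float:
    """Minor allele frequency from diploid genotypes coded as 0, 1 or 2
    copies of the alternate allele (assumed coding, one entry per individual)."""
    if not genotypes:
        raise ValueError("genotypes must not be empty")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def rare_variants(genotypes_by_variant: Dict[str, List[int]],
                  threshold: float = RARE_MAF_THRESHOLD) -> List[str]:
    """IDs of polymorphic variants whose cohort MAF is below the threshold.
    Monomorphic sites (MAF == 0) are skipped because they carry no signal."""
    return [
        variant_id
        for variant_id, genotypes in genotypes_by_variant.items()
        if 0.0 < minor_allele_frequency(genotypes) < threshold
    ]


# Hypothetical cohort of 100 individuals and three variants.
cohort = {
    "v1": [1] + [0] * 99,        # 1 of 200 alleles  -> MAF 0.005, rare
    "v2": [1] * 10 + [0] * 90,   # 10 of 200 alleles -> MAF 0.05, common
    "v3": [0] * 100,             # monomorphic, excluded
}
selected = rare_variants(cohort)
print(selected)
```

In a real study the frequency would more likely come from variant annotation against a population database than from the study cohort, which is small by assumption in the scenarios the abstract describes.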