MOCCASIN: a method for correcting for known and unknown confounders in RNA splicing analysis
Open Access
- 7 June 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Communications
- Vol. 12 (1), 1-9
- https://doi.org/10.1038/s41467-021-23608-9
Abstract
The effects of confounding factors on gene expression analysis have been extensively studied following the introduction of high-throughput microarrays and subsequently RNA sequencing. In contrast, there is a lack of equivalent analysis and tools for RNA splicing. Here we first assess the effect of confounders on both expression and splicing quantifications in two large public RNA-Seq datasets (TARGET, ENCODE). We show that quantifications of splicing variations are affected at least as much as those of gene expression, revealing unwanted sources of variation in both datasets. Next, we develop MOCCASIN, a method to correct for the effect of both known and unknown confounders on RNA splicing quantification, and demonstrate MOCCASIN’s effectiveness on both synthetic and real data. Code, synthetic datasets, and corrected datasets are all made available as resources.
Funding Information
- U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (GM128096)
- Center for Strategic Scientific Initiatives, National Cancer Institute (CA232563)