Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments

Open Access

18 February 2010

journal article
research article
Published by Springer Nature in BMC Bioinformatics

Vol. 11 (1), 1-13
https://doi.org/10.1186/1471-2105-11-94

Abstract

High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data. We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, base-calling calibration method (with and without phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection. Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
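The contrast between scaling by total lane counts and a quantile-based alternative can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the choice of the 75th percentile, the filtering of genes with zero counts in every lane, the function names, and the toy count matrix below are all assumptions made for illustration. Gene length, which RPKM also divides by, is left out because it cancels when the same gene is compared across lanes.

```python
import numpy as np

def total_count_factors(counts):
    """Per-lane scaling factors proportional to total lane counts."""
    totals = counts.sum(axis=0).astype(float)
    return totals / totals.mean()

def upper_quantile_factors(counts, q=75):
    """Per-lane scaling factors from an upper quantile of gene counts.

    Assumption: genes with zero counts in every lane are dropped before
    the quantile is taken, and q=75 (upper quartile) is used.
    """
    expressed = counts[counts.sum(axis=1) > 0]
    uq = np.percentile(expressed, q, axis=0)
    return uq / uq.mean()

def normalize(counts, factors):
    """Divide each lane (column) by its scaling factor."""
    return counts / factors

# Hypothetical toy data: rows are genes, columns are two lanes.
# The first four genes have identical counts in both lanes; only the
# last gene differs, and it dominates the total of the second lane.
counts = np.array([
    [10, 10],
    [20, 20],
    [30, 30],
    [40, 40],
    [0, 0],
    [100, 900],
])

by_total = normalize(counts, total_count_factors(counts))
by_quantile = normalize(counts, upper_quantile_factors(counts))

print("total-count scaling:\n", by_total[:4])
print("upper-quantile scaling:\n", by_quantile[:4])
```

In this toy case the lane totals differ fivefold purely because of one highly expressed gene, so total-count scaling makes the four unchanged genes look fivefold different between lanes, while the upper-quantile factors are equal and leave those genes unchanged. Real data would not separate this cleanly.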

Cited by 1404 articles