Differential expression in SAGE: accounting for normal between-library variation

Abstract
Motivation: When comparing levels of gene expression between groups of SAGE libraries, the libraries within each group are often combined, the counts for the tag of interest summed, and inference made on the basis of these larger ‘pseudolibraries’. While this captures the sampling variability inherent in the procedure, it fails to account for normal variation in the gene's expression level between individuals within the same group, and can consequently overstate the significance of the results. The effect is not slight: between-library variation can be hundreds of times the within-library variation.

Results: We introduce a beta-binomial sampling model that correctly incorporates both sources of variation. We show how to fit the parameters of this model, and introduce a test statistic for differential expression similar to a two-sample t-test.

Contact: kabagg@mdanderson.org

Supplementary information: http://bioinformatics.mdanderson.org/ includes Matlab and R code for fitting the model.
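To make the contrast concrete, here is a minimal sketch, assuming a one-step method-of-moments fit; it is not necessarily the authors' estimation procedure (their Matlab and R code is in the supplement). Each library's tag count is modelled as X_i ~ Binomial(n_i, p_i) with p_i ~ Beta(α, β), so Var(X_i/n_i) = μ(1−μ)(1 + (n_i − 1)φ)/n_i with overdispersion φ = 1/(α + β + 1). The function names (`fit_beta_binomial`, `beta_binomial_t`, `pooled_z`) and the tag counts are hypothetical, made up for illustration only.

```python
import math
from typing import Sequence, Tuple

Group = Tuple[Sequence[int], Sequence[int]]  # (tag counts, library sizes)


def fit_beta_binomial(counts: Sequence[int], sizes: Sequence[int]) -> Tuple[float, float, float]:
    """Hypothetical one-step method-of-moments fit for one group of libraries.

    Returns (p_hat, phi_hat, var_hat): the weighted tag proportion, the
    overdispersion phi = 1 / (alpha + beta + 1), and the estimated variance of p_hat.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two libraries per group")
    n_total = sum(sizes)
    props = [x / n for x, n in zip(counts, sizes)]
    mu = sum(counts) / n_total  # pooled proportion (weights n_i)
    if not 0.0 < mu < 1.0:
        raise ValueError("tag proportion must lie strictly between 0 and 1")
    # Weighted sum of squares of per-library proportions around the pooled proportion.
    s = sum(n * (p - mu) ** 2 for n, p in zip(sizes, props))
    # E[s] = mu(1-mu) [(k-1) + phi (N - sum n_i^2 / N - (k-1))]; solve for phi.
    denom = n_total - sum(n * n for n in sizes) / n_total - (k - 1)
    phi = (s / (mu * (1.0 - mu)) - (k - 1)) / denom
    phi = min(max(phi, 0.0), 1.0)  # clip to the admissible range [0, 1]
    # Inverse-variance weights, since Var(p_i) = mu(1-mu)(1 + (n_i - 1) phi) / n_i.
    weights = [n / (1.0 + (n - 1) * phi) for n in sizes]
    w_sum = sum(weights)
    p_hat = sum(w * p for w, p in zip(weights, props)) / w_sum
    var_hat = p_hat * (1.0 - p_hat) / w_sum  # variance of the weighted mean
    return p_hat, phi, var_hat


def beta_binomial_t(group_a: Group, group_b: Group) -> float:
    """t-like statistic using both within- and between-library variation."""
    pa, _, va = fit_beta_binomial(*group_a)
    pb, _, vb = fit_beta_binomial(*group_b)
    return (pa - pb) / math.sqrt(va + vb)


def pooled_z(group_a: Group, group_b: Group) -> float:
    """Naive 'pseudolibrary' test: sum counts per group, two-proportion z-statistic."""
    xa, na = sum(group_a[0]), sum(group_a[1])
    xb, nb = sum(group_b[0]), sum(group_b[1])
    p = (xa + xb) / (na + nb)
    return (xa / na - xb / nb) / math.sqrt(p * (1 - p) * (1 / na + 1 / nb))


# Hypothetical tag counts and library sizes, made up for illustration.
GROUP_A = ([30, 5, 55, 12], [50000, 48000, 52000, 50000])
GROUP_B = ([8, 20, 3, 15], [49000, 51000, 50000, 50000])

print("pooled pseudolibrary z:", round(pooled_z(GROUP_A, GROUP_B), 2))
print("beta-binomial t-like statistic:", round(beta_binomial_t(GROUP_A, GROUP_B), 2))
```

On these made-up counts the per-library proportions vary far more than binomial sampling alone would allow, so the pseudolibrary z-statistic comes out several times larger than the beta-binomial statistic. When the libraries show no extra variation, φ is clipped to 0 and the weights reduce to the library sizes, recovering the ordinary binomial variance.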