Supervised normalization of microarrays

Open Access

31 March 2010

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 26 (10), 1308-1315
https://doi.org/10.1093/bioinformatics/btq118

Abstract

Motivation: A major challenge in utilizing microarray technologies to measure nucleic acid abundances is ‘normalization’, the goal of which is to separate biologically meaningful signal from other confounding sources of signal, often due to unavoidable technical factors. It is intuitively clear that true biological signal and confounding factors need to be simultaneously considered when performing normalization. However, the most popular normalization approaches do not utilize what is known about the study, both in terms of the biological variables of interest and the known technical factors in the study, such as batch or array processing date.

Results: We show here that failing to include all study-specific biological and technical variables when performing normalization leads to biased downstream analyses. We propose a general normalization framework that fits a study-specific model employing every known variable that is relevant to the expression study. The proposed method is generally applicable to the full range of existing probe designs, as well as to both single-channel and dual-channel arrays. We show through real and simulated examples that the method has favorable operating characteristics in comparison to some of the most highly used normalization methods.

Availability: An R package called snm implementing the methodology will be made available from Bioconductor (http://bioconductor.org).

Contact: jstorey@princeton.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
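The central claim of the abstract, that biological and technical variables should be modeled together during normalization, can be illustrated with a small simulation. The sketch below is not the snm algorithm (the abstract does not describe its internals); it is a plain per-probe least-squares fit under assumed conditions: noise-free data, one binary biological variable, one binary batch variable, and a partially confounded design. The function names `remove_technical_effects` and `group_difference` are hypothetical and exist only for this illustration.

```python
import numpy as np

def remove_technical_effects(Y, bio, tech):
    """Fit intercept + bio + tech per probe, then subtract only the tech part.

    Y    : probes x arrays matrix of (log) intensities
    bio  : arrays x p design matrix of biological variables (may have 0 columns)
    tech : arrays x q design matrix of technical variables, e.g. batch
    """
    n_arrays = Y.shape[1]
    X = np.hstack([np.ones((n_arrays, 1)), bio, tech])
    # One least-squares fit per probe; coef has one column per probe.
    coef, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    tech_coef = coef[1 + bio.shape[1]:, :]
    return Y - (tech @ tech_coef).T

def group_difference(Y, group):
    """Per-probe difference in mean intensity between group 1 and group 0."""
    return Y[:, group == 1].mean(axis=1) - Y[:, group == 0].mean(axis=1)

# Toy design (an assumption): batch is correlated with, but not identical to, group.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
batch = np.array([0, 0, 0, 1, 0, 1, 1, 1])

rng = np.random.default_rng(0)
n_probes = 5
baseline = rng.normal(8.0, 1.0, size=(n_probes, 1))
beta = rng.normal(0.0, 1.0, size=(n_probes, 1))   # true biological effect
gamma = rng.normal(0.0, 1.0, size=(n_probes, 1))  # true batch effect
Y = baseline + beta * group + gamma * batch       # noise-free for clarity

bio = group.reshape(-1, 1).astype(float)
tech = batch.reshape(-1, 1).astype(float)
no_bio = np.empty((len(group), 0))                # biology-blind model

joint = remove_technical_effects(Y, bio, tech)    # models biology and batch together
naive = remove_technical_effects(Y, no_bio, tech) # models batch only

joint_diff = group_difference(joint, group)
naive_diff = group_difference(naive, group)

print("true effect :", beta.ravel())
print("joint model :", joint_diff)
print("batch only  :", naive_diff)
```

In this toy setting the joint fit returns the true group differences, while the batch-only fit absorbs part of the biological signal into the batch coefficient and so distorts the downstream comparison. With balanced, unconfounded designs the two fits would agree, which is why the choice of design in the sketch matters.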

Cited by 109 articles