Correlation test to assess low-level processing of high-density oligonucleotide microarray data
Open Access
- 31 March 2005
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 6 (1), 80
- https://doi.org/10.1186/1471-2105-6-80
Abstract
Background: There are currently a number of competing techniques for low-level processing of oligonucleotide array data. The choice of technique has a profound effect on subsequent statistical analyses, but there is no method to assess, without reference to external data, whether a particular technique is appropriate for a specific data set.
Results: We analyzed coregulation between genes in order to detect insufficient normalization between arrays, where coregulation is measured in terms of statistical correlation. In a large collection of genes, a random pair of genes should have on average zero correlation, which allows a correlation test. For all data sets that we evaluated, and for the three most commonly used low-level processing procedures (MAS5, RMA and MBEI), the housekeeping-gene normalization failed the test. For a real clinical data set, RMA and MBEI showed significant correlation for absent genes. We also found that a second round of normalization on the probe set level improved normalization significantly throughout.
Conclusion: Previous evaluation of low-level processing in the literature has been limited to artificial spike-in and mixture data sets. In the absence of a known gold standard, the correlation criterion allows us to assess the appropriateness of low-level processing of a specific data set and the success of normalization for subsets of genes.
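The idea behind the correlation criterion can be illustrated with a minimal sketch. This is not the authors' implementation or test statistic: the function names `random_pair_correlations` and `mean_correlation_z`, the simulated data, and the simple z-score are all assumptions made for illustration. The sketch draws random gene pairs from an expression matrix (rows are genes, columns are arrays), computes their Pearson correlations, and checks whether the mean is compatible with zero. An array-specific offset, standing in for insufficient normalization between arrays, should push the mean correlation well above zero.

```python
import numpy as np


def random_pair_correlations(expr, n_pairs=2000, seed=0):
    """Pearson correlations for randomly drawn pairs of distinct genes.

    expr: 2-D array, rows = genes (probe sets), columns = arrays.
    Assumes no gene is constant across arrays (that would divide by zero).
    """
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[0]
    i = rng.integers(0, n_genes, size=n_pairs)
    # Shift by 1..n_genes-1 (mod n_genes) so that j is never equal to i.
    j = (i + rng.integers(1, n_genes, size=n_pairs)) % n_genes
    # Center and scale each gene so a dot product equals Pearson's r.
    centered = expr - expr.mean(axis=1, keepdims=True)
    unit = centered / np.sqrt((centered ** 2).sum(axis=1, keepdims=True))
    return (unit[i] * unit[j]).sum(axis=1)


def mean_correlation_z(corrs):
    """Approximate z-score of the mean correlation against zero.

    Hypothetical helper: it treats the sampled pairs as independent,
    which is only roughly true because pairs can share genes.
    """
    return corrs.mean() / (corrs.std(ddof=1) / np.sqrt(corrs.size))


# Simulated data (an assumption, not taken from the paper):
# 500 independent genes measured on 20 arrays.
rng = np.random.default_rng(1)
clean = rng.normal(size=(500, 20))

# Add one offset per array to mimic insufficient between-array normalization.
offsets = rng.normal(scale=1.0, size=(1, 20))
shifted = clean + offsets

z_clean = mean_correlation_z(random_pair_correlations(clean))
z_shifted = mean_correlation_z(random_pair_correlations(shifted))
print(f"z (independent genes): {z_clean:.2f}")
print(f"z (array offsets):     {z_shifted:.2f}")
```

On the simulated independent genes the z-score should stay small, while the shared array offsets should produce a large positive value. Restricting the rows of `expr` to a subset of genes before sampling would give a per-subset check in the same spirit.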