Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data
Open Access
- 27 November 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 33 (20), e175
- https://doi.org/10.1093/nar/gni179
Abstract
Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation which is significantly different from current knowledge. The resultant informatics problems have a profound impact on analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals approximately 30-50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.
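The two steps the abstract describes, regrouping probes into gene-specific sets and comparing the resulting lists of differentially expressed genes, might be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `min_probes` threshold, the probe and gene identifiers, and the definition of discrepancy (the fraction of originally called genes that are not called again) are all assumptions introduced here.

```python
from collections import defaultdict


def regroup_probes(probe_to_genes, min_probes=3):
    """Group probes into gene-specific sets (hypothetical helper).

    probe_to_genes maps a probe ID to the set of genes it matches under
    an updated annotation. Probes matching zero genes or more than one
    gene are dropped as uninformative or ambiguous. The min_probes
    cutoff is an assumed value, not one taken from the abstract.
    """
    gene_sets = defaultdict(list)
    for probe, genes in probe_to_genes.items():
        if len(genes) == 1:
            gene_sets[next(iter(genes))].append(probe)
    # Keep only genes that still have enough probes to summarize.
    return {g: sorted(p) for g, p in gene_sets.items() if len(p) >= min_probes}


def discrepancy(original_hits, redefined_hits):
    """Fraction of originally called genes missing from the redefined call.

    This is one possible definition of "discrepancy"; the abstract does
    not specify how the 30-50% figure was computed.
    """
    original_hits = set(original_hits)
    redefined_hits = set(redefined_hits)
    if not original_hits:
        return 0.0
    return len(original_hits - redefined_hits) / len(original_hits)


# Toy annotation: p4 matches two genes and p6 matches none, so both are dropped.
toy_mapping = {
    "p1": {"GENE_A"},
    "p2": {"GENE_A"},
    "p3": {"GENE_A"},
    "p4": {"GENE_A", "GENE_B"},
    "p5": {"GENE_B"},
    "p6": set(),
}
redefined_sets = regroup_probes(toy_mapping)

# Toy gene lists from two hypothetical differential-expression analyses.
fraction_lost = discrepancy(
    {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    {"GENE_A", "GENE_B", "GENE_E"},
)
```

In this toy input only `GENE_A` keeps enough unambiguous probes to form a set, and half of the originally called genes are not recovered. Real probe-to-gene mappings would come from aligning probe sequences against current genome and transcript databases, which this sketch does not attempt.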