A global meta-analysis of microarray expression data to predict unknown gene functions and estimate the literature-data divide
Open Access
- 15 May 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 25 (13), 1694-1701
- https://doi.org/10.1093/bioinformatics/btp290
Abstract
Motivation: Approximately 9334 (37%) of human genes have no publications documenting their function and, for those that are published, the number of publications per gene is highly skewed. Furthermore, for reasons that are not clear, the entry of new gene names into the literature has slowed in recent years. If we are to better understand human/mammalian biology and complete the catalog of human gene function, it is important to finish predicting putative functions for these genes based upon existing experimental evidence.
Results: A global meta-analysis (GMA) of all publicly available GEO two-channel human microarray datasets (3551 experiments total) was conducted to identify genes with recurrent, reproducible patterns of co-regulation across different conditions. Patterns of co-expression were divided into parallel (i.e. genes are up- and down-regulated together) and anti-parallel. Several ranking methods to predict a gene's function based on its top 20 co-expressed gene pairs were compared. In the best method, 34% of predicted Gene Ontology (GO) categories matched exactly with the known GO categories for ∼5000 genes analyzed, versus only 3% for random gene sets. Only 2.4% of co-expressed gene pairs were found as co-occurring gene pairs in MEDLINE.
Conclusions: Via a GO enrichment analysis, genes co-expressed in parallel with the query gene were frequently associated with the same GO categories, whereas anti-parallel genes were not. Combining parallel and anti-parallel genes for analysis resulted in fewer significant GO categories, suggesting they are best analyzed separately. Expression databases contain much unexpected genetic knowledge that has not yet been reported in the literature. A total of 1642 human genes with unknown function were differentially expressed in at least 30 experiments.
Availability: Data matrix available upon request.
Contact: jdwren@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
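The co-expression idea in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the ranking methods that were compared, so simple vote counting over a gene's top co-expressed partners is assumed here. The function names, the toy experiments, and the `GO:term_*` labels are all hypothetical placeholders. Each experiment is represented as a mapping from differentially expressed genes to +1 (up) or -1 (down).

```python
from collections import Counter
from itertools import combinations


def coexpression_counts(experiments):
    """Count, per gene pair, how often both genes change in the same
    direction (parallel) or in opposite directions (anti-parallel)."""
    parallel, antiparallel = Counter(), Counter()
    for calls in experiments:
        # Sorting gives each pair a canonical (a, b) key with a < b.
        for a, b in combinations(sorted(calls), 2):
            if calls[a] == calls[b]:
                parallel[(a, b)] += 1
            else:
                antiparallel[(a, b)] += 1
    return parallel, antiparallel


def top_partners(gene, pair_counts, k=20):
    """Return up to k partners of `gene`, ranked by co-expression count."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == gene:
            scores[b] += n
        elif b == gene:
            scores[a] += n
    return [g for g, _ in scores.most_common(k)]


def predict_go(gene, pair_counts, annotations, k=20):
    """Assumed ranking: each top partner votes for its own GO categories."""
    votes = Counter()
    for partner in top_partners(gene, pair_counts, k):
        votes.update(annotations.get(partner, ()))
    return votes.most_common()


# Toy data (hypothetical): +1 = up-regulated, -1 = down-regulated.
experiments = [
    {"G1": 1, "G2": 1, "G3": -1},
    {"G1": -1, "G2": -1, "G3": 1},
    {"G1": 1, "G2": 1, "G4": 1},
]
annotations = {
    "G2": {"GO:term_A"},
    "G3": {"GO:term_C"},
    "G4": {"GO:term_A", "GO:term_B"},
}

parallel, antiparallel = coexpression_counts(experiments)
prediction = predict_go("G1", parallel, annotations)
print(prediction)
```

Keeping the parallel and anti-parallel counts in separate tables mirrors the abstract's conclusion that the two kinds of partners are best analyzed separately; passing `antiparallel` to `predict_go` instead would rank the oppositely regulated partners.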