Generalized random set framework for functional enrichment analysis using primary genomics datasets
Open Access
- 22 October 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (1), 70-77
- https://doi.org/10.1093/bioinformatics/btq593
Abstract
Motivation: Functional enrichment analysis using primary genomics datasets is an emerging approach to complement established methods for functional enrichment based on predefined lists of functionally related genes. Currently used methods depend on creating lists of ‘significant’ and ‘non-significant’ genes based on ad hoc significance cutoffs. This can lead to loss of statistical power and can introduce biases affecting the interpretation of experimental results.
Results: We developed and validated a new statistical framework, generalized random set (GRS) analysis, for comparing the genomic signatures in two datasets without the need for gene categorization. In our tests, GRS produced correct measures of statistical significance, and it showed dramatic improvement in the statistical power over other methods currently used in this setting. We also developed a procedure for identifying genes driving the concordance of the genomics profiles and demonstrated a dramatic improvement in functional coherence of genes identified in such analysis.
Availability: GRS can be downloaded as part of the R package CLEAN from http://ClusterAnalysis.org/. An online implementation is available at http://GenomicsPortals.org/.
Contact: mario.medvedovic@uc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
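The contrast drawn in the abstract, between cutoff-based gene lists and a comparison that uses the full score profiles, can be illustrated with a small sketch. This is not the GRS statistic from the paper, whose definition is not given in this record. It is a generic stand-in: a hypergeometric overlap test on thresholded lists next to a permutation test on the sum of score products. The function names, the score inputs (assumed to be non-negative per-gene evidence scores over the same ordered gene set) and the parameters `cutoff`, `n_perm` and `seed` are all assumptions made for illustration.

```python
import random
from math import comb


def hypergeom_overlap_pvalue(scores_a, scores_b, cutoff):
    """Cutoff-based comparison (illustrative, not from the paper).

    Genes scoring >= cutoff are called 'significant' in each dataset;
    returns P(overlap >= observed) under the hypergeometric null.
    """
    n = len(scores_a)
    sig_a = {i for i, s in enumerate(scores_a) if s >= cutoff}
    sig_b = {i for i, s in enumerate(scores_b) if s >= cutoff}
    k = len(sig_a & sig_b)          # observed overlap
    a, b = len(sig_a), len(sig_b)   # sizes of the two gene lists
    tail = sum(comb(a, x) * comb(n - a, b - x)
               for x in range(k, min(a, b) + 1))
    return tail / comb(n, b)


def permutation_concordance_pvalue(scores_a, scores_b, n_perm=2000, seed=0):
    """Cutoff-free comparison (a generic stand-in, NOT the GRS statistic).

    Uses the sum of per-gene score products as a concordance statistic
    and compares it with values obtained after shuffling gene labels
    in the second dataset.
    """
    rng = random.Random(seed)
    observed = sum(x * y for x, y in zip(scores_a, scores_b))
    shuffled = list(scores_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum(x * y for x, y in zip(scores_a, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

In this sketch the result of the first function depends entirely on the choice of `cutoff`: a threshold that no gene reaches yields a p-value of 1 regardless of how similar the two profiles are, while the second function uses every gene's score and needs no threshold.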