Enrichment or depletion of a GO category within a class of genes: which test?
Open Access
- 20 December 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (4), 401-407
- https://doi.org/10.1093/bioinformatics/btl633
Abstract
Motivation: A number of available program packages determine the significant enrichments and/or depletions of GO categories among a class of genes of interest. Whereas a correct formulation of the problem leads to a single exact null distribution, these GO tools use a large variety of statistical tests whose denominations often do not clarify the underlying P-value computations.
Summary: We review the different formulations of the problem and the tests they lead to: the binomial, χ², equality of two probabilities, Fisher's exact and hypergeometric tests. We clarify the relationships existing between these tests, in particular the equivalence between the hypergeometric test and Fisher's exact test. We recall that the other tests are valid only for large samples, the test of equality of two probabilities and the χ²-test being equivalent. We discuss the appropriateness of one- and two-sided P-values, as well as some discreteness and conservatism issues.
Contact: isabelle.rivals@espci.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
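The two equivalences stated in the abstract can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the function names and the gene counts (N = 100 genes in total, K = 20 in the GO category, n = 10 genes of interest, k = 5 of them in the category) are hypothetical. It assumes the two-proportion test compares the genes of interest with the remaining genes using a pooled variance, and that the χ² statistic is computed without continuity correction.

```python
from math import comb, factorial, isclose, sqrt

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when n genes are drawn without replacement from N genes,
    K of which belong to the GO category."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_pvalue(k, N, K, n):
    """One-sided hypergeometric P-value for enrichment, P(X >= k)."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

def depletion_pvalue(k, N, K, n):
    """One-sided hypergeometric P-value for depletion, P(X <= k)."""
    lowest = max(0, n - (N - K))
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))

def table_probability(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] given its margins,
    written with factorials as in Fisher's exact test."""
    total = a + b + c + d
    num = (factorial(a + b) * factorial(c + d)
           * factorial(a + c) * factorial(b + d))
    den = (factorial(total) * factorial(a) * factorial(b)
           * factorial(c) * factorial(d))
    return num / den

def fisher_one_sided_greater(a, b, c, d):
    """Sum the probabilities of all tables with the same margins whose
    top-left cell is at least a."""
    return sum(table_probability(a + s, b - s, c - s, d + s)
               for s in range(min(b, c) + 1))

# Hypothetical counts, chosen only for illustration.
N, K, n, k = 100, 20, 10, 5
# 2x2 table: rows = (of interest, not of interest),
# columns = (in category, not in category).
a, b, c, d = k, n - k, K - k, N - K - n + k

p_hyper = enrichment_pvalue(k, N, K, n)
p_fisher = fisher_one_sided_greater(a, b, c, d)

# Large-sample statistics: Pearson chi-square (no continuity correction)
# and the pooled two-proportion z statistic.
chi2 = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p1, p2, pooled = a / (a + b), c / (c + d), (a + c) / N
z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / (a + b) + 1 / (c + d)))

print("hypergeometric enrichment P-value:", p_hyper)
print("Fisher one-sided P-value:        ", p_fisher)
print("chi-square statistic:", chi2, " z squared:", z * z)
```

Under these assumptions the two exact P-values agree up to floating-point rounding, and the squared z statistic matches the χ² statistic, so the two large-sample tests yield the same two-sided P-value.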