Identification of differential gene pathways with principal component analysis

Open Access

17 February 2009

Research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 25 (7), 882-889
https://doi.org/10.1093/bioinformatics/btp085

Abstract

Motivation: The development of high-throughput technology makes it possible to measure the expression levels of thousands of genes simultaneously. Genes have an inherent pathway structure: pathways are composed of multiple genes with coordinated biological functions. It is of great interest to identify differential gene pathways, that is, pathways associated with variation in phenotypes.

Results: We propose the following approach for detecting differential gene pathways. First, we construct gene pathways using databases such as KEGG or GO. Second, for each pathway, we extract a small number of representative features, which are linear combinations of gene expressions and/or their transformations. Specifically, we propose using (i) principal components (PCs) of gene expression sets, (ii) PCs of expanded gene expression sets and (iii) expanded sets of PCs of gene expressions as the representative features. Third, we identify differential gene pathways as those whose representative features are significantly associated with variation in phenotypes, particularly disease clinical outcomes, in regression models. The false discovery rate approach is used to adjust for multiple comparisons. Analysis of three gene expression datasets suggests that (i) the proposed approach can effectively identify differential gene pathways; (ii) PCs that explain only a small amount of the variation in gene expression may carry significant associations between gene pathways and phenotypes; (iii) including second-order terms of gene expressions may lead to the identification of new differential gene pathways; (iv) the proposed approach is relatively insensitive to additional noise; and (v) the proposed approach can identify gene pathways missed by alternative approaches.

Contact: shuangge.ma@yale.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
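The three-step procedure in the Results paragraph can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The pathways and the continuous outcome are simulated (hypothetical data; pathway_A is built around a shared latent factor that drives the outcome). "Second-order terms" are taken to be squared expressions only, with pairwise interactions omitted. The number of retained PCs (n_pc=3) is arbitrary. Association is tested with a permutation test of the ordinary least-squares R² rather than a model-based test, and the Benjamini-Hochberg procedure stands in for "the false discovery rate approach". All function names are hypothetical. For survival or binary clinical outcomes, the OLS step would be replaced by, for example, a Cox or logistic regression.

```python
import numpy as np


def pathway_features(X, n_pc=3, variant="pc"):
    """Representative features for one pathway (rows = samples, cols = genes).

    variant="pc":          (i)   leading PCs of the gene expressions
    variant="pc_expanded": (ii)  leading PCs of the expanded set [X, X**2]
    variant="expanded_pc": (iii) leading PCs plus their squares
    Assumption: only squared terms serve as "second-order" terms here;
    pairwise interactions are omitted to keep the sketch small.
    """
    if variant == "pc_expanded":
        X = np.hstack([X, X ** 2])
    Xc = X - X.mean(axis=0)                      # center each column
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_pc, S.size)
    scores = U[:, :k] * S[:k]                    # PC scores, ordered by variance explained
    if variant == "expanded_pc":
        scores = np.hstack([scores, scores ** 2])
    return scores


def r_squared(F, y):
    """R^2 of an ordinary least-squares fit of y on an intercept plus F."""
    A = np.column_stack([np.ones(len(y)), F])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def permutation_pvalue(F, y, n_perm=499, rng=None):
    """Permutation p-value for the overall F-y association.

    Assumption: swapped in for a model-based (e.g. Wald or likelihood-ratio) test.
    """
    rng = np.random.default_rng(rng)
    observed = r_squared(F, y)
    null = np.array([r_squared(F, rng.permutation(y)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each q is the min over its own and all larger ranks.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def analyze(pathways, y, variant="pc", n_pc=3, n_perm=499, seed=1):
    """Test every pathway; return {name: (p-value, q-value)}."""
    rng = np.random.default_rng(seed)
    names = list(pathways)
    pvals = [permutation_pvalue(pathway_features(pathways[name], n_pc, variant),
                                y, n_perm, rng)
             for name in names]
    qvals = benjamini_hochberg(pvals)
    return {name: (p, q) for name, p, q in zip(names, pvals, qvals)}


def simulate(n=80, seed=0):
    """Hypothetical toy data: pathway_A shares a latent factor that drives y."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n)
    y = 1.5 * latent + rng.normal(size=n)
    pathways = {
        "pathway_A": latent[:, None] + 0.5 * rng.normal(size=(n, 15)),
        "pathway_B": rng.normal(size=(n, 20)),
        "pathway_C": rng.normal(size=(n, 12)),
    }
    return pathways, y


if __name__ == "__main__":
    pathways, y = simulate()
    for variant in ("pc", "pc_expanded", "expanded_pc"):
        for name, (p, q) in analyze(pathways, y, variant).items():
            print(f"{variant:12s} {name}: p={p:.3f}  q={q:.3f}")
```

The sketch keeps several PCs per pathway rather than only the first because, per finding (ii) in the abstract, a component that explains little of the expression variance may still carry the phenotype association. In this toy data the signal happens to sit on a leading component, so the demo does not itself exercise that case.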
