Group additive regression models for genomic data analysis
Open Access
- 18 May 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Biostatistics
- Vol. 9 (1), 100-113
- https://doi.org/10.1093/biostatistics/kxm015
Abstract
One important problem in genomic research is to identify genomic features, such as gene expression data or DNA single nucleotide polymorphisms (SNPs), that are related to clinical phenotypes. These genomic data can often be divided naturally into biologically meaningful groups, such as genes belonging to the same pathway or SNPs within the same gene. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that dividing the variables into appropriate groups gives better identification of the groups of features that are related to the phenotypes. In addition, the prediction mean square errors are smaller than those of the component-wise boosting procedure. We demonstrate the methods with a pathway-based analysis of a breast cancer microarray gene expression data set. The results indicate that the pathways of metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance, are important to breast cancer–specific survival.
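The abstract names a group gradient descent boosting procedure but does not spell out its steps here. The sketch below is a rough illustration only. It assumes squared-error loss and uses a linear least-squares fit to each whole group as the base learner. At every step it fits each group to the current residuals, keeps the group that reduces the residual sum of squares most, and adds a shrunken copy of that fit. The function name `group_l2_boost`, its arguments, and the linear base learner are hypothetical choices, not the authors' implementation. The paper's group additive models allow more general group functions and are applied to censored survival outcomes, and this sketch handles neither.

```python
import numpy as np


def group_l2_boost(X, y, groups, n_steps=200, nu=0.1):
    """Group-wise L2 boosting with a linear least-squares base learner (sketch).

    X: (n, p) design matrix; y: (n,) response.
    groups: list of integer index arrays, one per group (hypothetical format).
    nu: shrinkage (step length) applied to each selected group fit.
    Returns (intercept, coef, counts), where counts[g] is how often group g was selected.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # center columns so base learners need no intercept
    fit = np.full_like(y, y.mean())      # start from the constant model
    coef = np.zeros(X.shape[1])
    counts = np.zeros(len(groups), dtype=int)

    for _ in range(n_steps):
        resid = y - fit                  # negative gradient of squared-error loss (up to a factor 2)
        best_g, best_rss, best_beta = None, np.inf, None
        for g, idx in enumerate(groups):
            # Least-squares fit of the residuals on all variables in group g jointly
            beta, *_ = np.linalg.lstsq(Xc[:, idx], resid, rcond=None)
            rss = np.sum((resid - Xc[:, idx] @ beta) ** 2)
            if rss < best_rss:
                best_g, best_rss, best_beta = g, rss, beta
        idx = groups[best_g]
        fit += nu * (Xc[:, idx] @ best_beta)   # shrunken update using only the best group
        coef[idx] += nu * best_beta
        counts[best_g] += 1

    # fit = mean(y) + (X - x_mean) @ coef, so convert to an intercept for uncentered X
    intercept = y.mean() - x_mean @ coef
    return intercept, coef, counts


# Simulated example: 4 groups of 5 variables; only groups 0 and 2 affect y
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
beta_true = np.zeros(20)
beta_true[0:5] = 2.0
beta_true[10:15] = 2.0
y = 1.0 + X @ beta_true + rng.standard_normal(200)
groups = [np.arange(5 * k, 5 * k + 5) for k in range(4)]
b0, coef, counts = group_l2_boost(X, y, groups, n_steps=100, nu=0.1)
print(counts)
```

If every group is a single variable, the same loop reduces to component-wise L2 boosting, which is the comparison method mentioned in the abstract. Choosing the group by the largest drop in residual sum of squares means that all features in a group enter or stay out of the model together. The selection counts therefore act as a rough group-level importance measure.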