A Pathway-Based Kernel Boosting Method for Sample Classification Using Genomic Data
Open Access
- 30 August 2019
- Vol. 10 (9), 670
- https://doi.org/10.3390/genes10090670
Abstract
The analysis of cancer genomic data has long suffered from the “curse of dimensionality.” Sample sizes for most cancer genomic studies are a few hundred at most, while tens of thousands of genomic features are studied. Various methods have been proposed to leverage prior biological knowledge, such as pathways, to analyze cancer genomic data more effectively. Most of these methods focus on testing the marginal significance of associations between pathways and clinical phenotypes. They can identify informative pathways but do not involve predictive modeling. In this article, we propose a Pathway-based Kernel Boosting (PKB) method that integrates gene pathway information for sample classification: kernel functions calculated from each pathway serve as base learners, and their weights are learned through iterative optimization of the classification loss function. We apply PKB and several competing methods to three cancer studies with pathological and clinical information, including tumor grade, stage, tumor site, and metastasis status. Our results show that PKB outperforms the other methods and identifies pathways relevant to the outcome variables.
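The general idea of boosting with one kernel per pathway can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' PKB procedure: it assumes a Gaussian (RBF) kernel, a logistic loss with labels in {-1, +1}, and a kernel ridge fit to the negative gradient at each step, with the best-fitting pathway selected per iteration. The function names (`rbf_kernel`, `fit_pathway_boosting`, `decision_function`) and all parameters (`n_iter`, `lr`, `ridge`, `gamma`) are hypothetical and chosen only for this example.

```python
import numpy as np


def rbf_kernel(X, Z, gamma=None):
    """Gaussian (RBF) kernel between the rows of X and the rows of Z."""
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default bandwidth
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_pathway_boosting(X, y, pathways, n_iter=50, lr=0.1, ridge=1.0):
    """Boost kernel learners, one candidate kernel per pathway.

    X: (n_samples, n_genes) array; y: labels in {-1, +1};
    pathways: dict mapping pathway name -> list of gene column indices.
    Returns a list of (pathway name, coefficient vector) boosting steps.
    """
    n = X.shape[0]
    # One kernel matrix per pathway, built from that pathway's genes only.
    kernels = {name: rbf_kernel(X[:, idx], X[:, idx])
               for name, idx in pathways.items()}
    F = np.zeros(n)  # current decision values on the training samples
    steps = []
    for _ in range(n_iter):
        # Negative gradient of the logistic loss log(1 + exp(-y * F)).
        residual = y / (1.0 + np.exp(y * F))
        best = None
        for name, K in kernels.items():
            # Kernel ridge fit of the residual using this pathway's kernel.
            beta = np.linalg.solve(K + ridge * np.eye(n), residual)
            err = np.sum((residual - K @ beta) ** 2)
            if best is None or err < best[0]:
                best = (err, name, beta)
        _, name, beta = best
        F += lr * kernels[name] @ beta  # shrunken update of the fit
        steps.append((name, lr * beta))
    return steps


def decision_function(steps, X_train, X_new, pathways):
    """Sum the contributions of all boosting steps for new samples."""
    F = np.zeros(X_new.shape[0])
    for name, coef in steps:
        idx = pathways[name]
        F += rbf_kernel(X_new[:, idx], X_train[:, idx]) @ coef
    return F
```

In this sketch, the sign of `decision_function(...)` gives the predicted class, and counting how often each pathway name appears in `steps` gives a rough indication of which pathways the fit relied on.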
Funding Information
- National Institutes of Health (P01 CA154295, P50 CA196530)