Gene selection using support vector machines with non-convex penalty
- 25 October 2005
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (1), 88-95
- https://doi.org/10.1093/bioinformatics/bti736
Abstract
With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. One current difficulty in interpreting microarray data comes from their innate nature of 'high-dimensional low sample size'. Therefore, robust and accurate gene selection methods are required to identify differentially expressed groups of genes across different samples, e.g. between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide a unified procedure for simultaneous gene selection and cancer classification, achieving high accuracy in both aspects. In this paper we develop a novel type of regularization in support vector machines (SVMs) to identify important genes for cancer classification. A special non-convex penalty, called the smoothly clipped absolute deviation penalty, is imposed on the hinge loss function in the SVM. By systematically thresholding small estimates to zero, the new procedure eliminates redundant genes automatically and yields a compact and accurate classifier. A successive quadratic algorithm is proposed to convert the non-differentiable and non-convex optimization problem into easily solved linear equation systems. The method is applied to two real datasets and has produced very promising results. MATLAB code is available upon request from the authors.
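As a rough illustration of the penalized objective described in the abstract, the sketch below evaluates an average hinge loss plus a smoothly clipped absolute deviation (SCAD) penalty on each gene weight. It is written in Python rather than the authors' MATLAB, and it is not the authors' code. The function names, the 1/n scaling of the loss and the default a = 3.7 (the value commonly suggested for SCAD) are assumptions. The successive quadratic algorithm itself is not shown.

```python
def scad_penalty(w, lam, a=3.7):
    """SCAD penalty for one coefficient w (assumes lam > 0 and a > 2)."""
    t = abs(w)
    if t <= lam:
        # Near zero the penalty grows linearly, like an L1 penalty.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that flattens the penalty smoothly.
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    # Large coefficients receive a constant penalty.
    return (a + 1) * lam * lam / 2


def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a real-valued score."""
    return max(0.0, 1.0 - y * score)


def scad_svm_objective(X, y, b, w, lam, a=3.7):
    """Average hinge loss plus the SCAD penalty summed over the weights.

    X is a list of samples (each a list of expression values), y the
    list of labels, b the intercept and w the list of gene weights.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        score = b + sum(wj * xj for wj, xj in zip(w, xi))
        total += hinge_loss(yi, score)
    return total / len(y) + sum(scad_penalty(wj, lam, a) for wj in w)
```

Because the penalty is constant beyond a·lam, large weights are not shrunk further, while weights near zero are penalized like an L1 term; this is the property behind the thresholding of small estimates to zero mentioned above.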