Regression Approaches for Microarray Data Analysis
- 1 December 2003
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 10 (6), 961-980
- https://doi.org/10.1089/106652703322756177
Abstract
A variety of new procedures have been devised to handle the two-sample comparison (e.g., tumor versus normal tissue) of gene expression values as measured with microarrays. Such new methods are required in part because of some defining characteristics of microarray-based studies: (i) the very large number of genes contributing expression measures which far exceeds the number of samples (observations) available and (ii) the fact that by virtue of pathway/network relationships, the gene expression measures tend to be highly correlated. These concerns are exacerbated in the regression setting, where the objective is to relate gene expression, simultaneously for multiple genes, to some external outcome or phenotype. Correspondingly, several methods have been recently proposed for addressing these issues. We briefly critique some of these methods prior to a detailed evaluation of gene harvesting. This reveals that gene harvesting, without additional constraints, can yield artifactual solutions. Results obtained employing such constraints motivate the use of regularized regression procedures such as the lasso, least angle regression, and support vector machines. Model selection and solution multiplicity issues are also discussed. The methods are evaluated using a microarray-based study of cardiomyopathy in transgenic mice.
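As a rough illustration of the regularized regression the abstract points to, the sketch below fits the lasso, minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1, by cyclic coordinate descent on simulated data with many more genes than samples. Everything here is a hypothetical illustration: the function names (`soft_threshold`, `standardise`, `lasso_cd`, `demo`), the simulated data, and the penalty `lam=0.3` are assumptions, not the paper's procedure or its cardiomyopathy data. Coordinate descent is swapped in as a simple, standard lasso solver; least angle regression, which the abstract also mentions, is a different algorithm.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardise(X):
    """Centre each column of X (a list of rows) and scale it to unit variance."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mu = sum(col) / n
        sd = (sum((c - mu) ** 2 for c in col) / n) ** 0.5
        for i in range(n):
            Z[i][j] = (col[i] - mu) / sd if sd > 0 else 0.0
    return Z


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes X is standardised and y is centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = list(y)  # residual y - X beta; equals y while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # (1/n) x_j^T (residual with gene j's current contribution added back)
            rho = sum(X[i][j] * (r[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new_bj
    return beta


def demo(n=20, p=100, lam=0.3, seed=0):
    """Hypothetical p >> n data in which only genes 0 and 1 affect the outcome."""
    rng = random.Random(seed)
    X = standardise([[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)])
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.5) for row in X]
    ybar = sum(y) / n
    y = [v - ybar for v in y]  # centre the outcome
    beta = lasso_cd(X, y, lam)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    return beta, selected


if __name__ == "__main__":
    beta, selected = demo()
    print("nonzero coefficients:", selected)
```

The soft-thresholding step sets most coefficients exactly to zero, which is what makes the lasso usable when genes far outnumber samples. Raising `lam` gives sparser gene sets; how to choose it (for example by cross-validation or an information criterion) is one form of the model selection issue the abstract raises.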