Supervised harvesting of expression trees
Open Access
- 10 January 2001
- journal article
- Published by Springer Nature in Genome Biology
Abstract
We propose a new method for supervised learning from gene expression data, which we call 'tree harvesting'. The technique starts with a hierarchical clustering of genes, then models the outcome variable as a sum of the average expression profiles of chosen clusters and their products. It can be applied to many different kinds of outcome measures, such as censored survival times or a response falling in two or more classes (for example, cancer classes). The method can discover genes that have strong effects on their own and genes that interact with other genes. We illustrate the method on data from a lymphoma study and on a dataset containing samples from eight different cancers; in both it identified some potentially interesting gene clusters. In simulation studies we found that the procedure may require a large number of experimental samples to successfully discover interactions. Tree harvesting is a potentially useful tool for exploring gene expression data and for identifying interesting clusters of genes worthy of further investigation.
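The two stages described in the abstract (cluster the genes, then build the outcome model from cluster averages and their products) can be illustrated with a small sketch. It is not the authors' implementation. It assumes a quantitative outcome with squared-error loss, 1 minus Pearson correlation as the gene distance, average linkage, every node of the tree (single genes included) as a candidate cluster, purely forward selection with no pruning or cross-validation, and products formed only between an already-chosen cluster and one other cluster. The function names `cluster_nodes`, `node_averages` and `harvest` are hypothetical.

```python
import numpy as np


def cluster_nodes(X):
    """Average-linkage agglomerative clustering of genes (rows of X).

    Returns every node of the tree as a tuple of gene indices: the leaves
    (single genes) followed by each merged cluster, ending with the root.
    Assumption: distance between genes is 1 - Pearson correlation.
    """
    dist = 1.0 - np.corrcoef(X)                  # gene-by-gene dissimilarity
    clusters = [(i,) for i in range(X.shape[0])]
    nodes = list(clusters)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean pairwise distance between members
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        nodes.append(merged)
    return nodes


def node_averages(X, nodes):
    """Average expression profile of each node: a samples x nodes matrix."""
    return np.array([X[list(node)].mean(axis=0) for node in nodes]).T


def harvest(X, y, n_terms=3):
    """Forward selection of cluster averages and pairwise products.

    Each term is a tuple of node indices: (j,) is a main effect, (j, k) is
    the product of two node averages. Squared-error loss is assumed.
    """
    nodes = cluster_nodes(X)
    F = node_averages(X, nodes)
    n_nodes = F.shape[1]
    terms, basis = [], [np.ones(len(y))]         # start from an intercept
    for _ in range(n_terms):
        candidates = [(j,) for j in range(n_nodes)]
        # products only with a main effect already in the model (assumption)
        candidates += [tuple(sorted(t + (j,))) for t in terms if len(t) == 1
                       for j in range(n_nodes) if j != t[0]]
        best = None
        for cand in candidates:
            if cand in terms:
                continue
            col = np.prod(F[:, list(cand)], axis=1)
            B = np.column_stack(basis + [col])
            coef, *_ = np.linalg.lstsq(B, y, rcond=None)
            rss = np.sum((y - B @ coef) ** 2)    # residual sum of squares
            if best is None or rss < best[0]:
                best = (rss, cand, col)
        _, cand, col = best
        terms.append(cand)
        basis.append(col)
    coef, *_ = np.linalg.lstsq(np.column_stack(basis), y, rcond=None)
    return nodes, terms, coef
```

For example, if the outcome is exactly the average profile of a tight group of co-expressed genes, the first harvested term should be the tree node containing that group. Using every node of the tree as a candidate lets the search choose between a single gene and a larger cluster without fixing the number of clusters in advance; the exhaustive inner loops keep the sketch readable but are far too slow for genome-scale data.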