BART: Bayesian additive regression trees
Open Access
- 1 March 2010
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Applied Statistics
- Vol. 4 (1), 266-298
- https://doi.org/10.1214/09-aoas285
Abstract
We develop a Bayesian “sum-of-trees” model, BART, in which a regularization prior constrains each tree to be a weak learner, and fitting and inference are carried out by an iterative Bayesian backfitting MCMC algorithm that generates samples from the posterior. Effectively, BART is a nonparametric Bayesian regression approach that uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference, including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART’s many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment, and on a drug discovery classification problem.
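In the paper's notation the model is Y = g(x; T_1, M_1) + ... + g(x; T_m, M_m) + ε with ε ~ N(0, σ²), where g(x; T_j, M_j) is the leaf value that tree T_j, with leaf parameters M_j, assigns to x. The Python sketch below is a hypothetical toy version of the backfitting sampler described above, not the authors' algorithm. It assumes every tree is a single-split stump, proposes each new split (a variable and an observed cutpoint) uniformly from its prior and accepts it with a Metropolis-Hastings marginal-likelihood ratio, draws leaf means and σ² from their conjugate conditionals, and reports inclusion frequencies as the share of stumps splitting on each variable. The paper works with full trees, a depth-dependent tree prior, and richer tree moves; the function names and the defaults for `k`, `nu`, and `lam` are illustrative assumptions.

```python
import numpy as np


def log_marginal(resid, mask, sigma2, tau2):
    """Split-dependent part of log p(resid | split, sigma2), with each leaf mean
    integrated out under N(0, tau2); terms shared by all splits are dropped."""
    total = 0.0
    for leaf in (mask, ~mask):
        n_leaf = leaf.sum()
        s = resid[leaf].sum()
        denom = sigma2 + n_leaf * tau2
        total += 0.5 * np.log(sigma2 / denom) + tau2 * s**2 / (2.0 * sigma2 * denom)
    return total


def draw_leaf(resid_leaf, sigma2, tau2, rng):
    """Conjugate normal draw of one leaf mean given the residuals in that leaf."""
    denom = sigma2 + resid_leaf.size * tau2
    mean = tau2 * resid_leaf.sum() / denom
    sd = np.sqrt(sigma2 * tau2 / denom)
    return rng.normal(mean, sd)


def bart_stumps(X, y, m=50, n_iter=1000, burn=200, k=2.0, nu=3.0, lam=0.01, seed=0):
    """Hypothetical toy sum-of-stumps sampler (not full BART).
    Returns (in-sample posterior mean fit, predictor inclusion frequencies)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Rescale y to [-0.5, 0.5] so one leaf prior scale fits any response.
    y_min, y_max = y.min(), y.max()
    z = (y - y_min) / (y_max - y_min) - 0.5
    tau2 = (0.5 / (k * np.sqrt(m))) ** 2   # small leaf variance: weak learners
    var_idx = rng.integers(p, size=m)       # split variable of each stump
    cut = np.array([rng.choice(X[:, v]) for v in var_idx])
    fits = np.zeros((m, n))                 # fits[j] = g(x_i; T_j, M_j)
    total = np.zeros(n)                     # sum of all stump fits
    sigma2 = np.var(z)
    counts = np.zeros(p)
    pred_sum = np.zeros(n)
    for it in range(n_iter):
        for j in range(m):
            # Backfitting: stump j sees the residual left by the other m - 1 stumps.
            resid = z - (total - fits[j])
            mask = X[:, var_idx[j]] <= cut[j]
            # Independence MH: the proposal equals the split prior, so the
            # acceptance ratio is just a marginal likelihood ratio.
            v_new = rng.integers(p)
            c_new = rng.choice(X[:, v_new])
            new_mask = X[:, v_new] <= c_new
            log_alpha = (log_marginal(resid, new_mask, sigma2, tau2)
                         - log_marginal(resid, mask, sigma2, tau2))
            if np.log(rng.uniform()) < log_alpha:
                var_idx[j], cut[j], mask = v_new, c_new, new_mask
            # Gibbs draws of the two leaf means, then refresh the running total.
            mu_left = draw_leaf(resid[mask], sigma2, tau2, rng)
            mu_right = draw_leaf(resid[~mask], sigma2, tau2, rng)
            new_fit = np.where(mask, mu_left, mu_right)
            total += new_fit - fits[j]
            fits[j] = new_fit
        # Conjugate draw: sigma2 | rest ~ InvGamma((nu + n)/2, (nu*lam + SSR)/2).
        ssr = np.sum((z - total) ** 2)
        sigma2 = 1.0 / rng.gamma((nu + n) / 2.0, 2.0 / (nu * lam + ssr))
        if it >= burn:
            counts += np.bincount(var_idx, minlength=p)
            pred_sum += total
    n_keep = n_iter - burn
    pred = (pred_sum / n_keep + 0.5) * (y_max - y_min) + y_min
    return pred, counts / (n_keep * m)


# Usage on synthetic data: only x0 and x1 affect y.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(size=(200, 5))
y = 3.0 * (X[:, 0] > 0.5) + X[:, 1] + data_rng.normal(0.0, 0.1, size=200)
pred, freq = bart_stumps(X, y, m=10, n_iter=500, burn=100)
print(np.round(freq, 2))
```

Each stump only ever fits the partial residual left by the others, which is the backfitting idea; the posterior draws of σ² and of the splits are what make this a sampler rather than a greedy boosting fit. Using few trees (here `m=10`) forces variables to compete for splits, a heuristic that tends to make inclusion frequencies easier to read; this toy makes no claim about reproducing the paper's results.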