Bayesian CART: Prior Specification and Posterior Simulation

Abstract
We present advances in Bayesian modeling and computation for CART (classification and regression tree) models. The modeling innovations include a formal prior distributional structure for tree generation, the pinball prior, that allows explicit specification of a distribution for both the tree size and the tree shape. The core computational innovations involve a novel Metropolis–Hastings method that can dramatically improve the convergence and mixing properties of MCMC methods for Bayesian CART analysis. Earlier MCMC methods have simulated Bayesian CART models using very local moves, proposing only small changes to a “current” CART model. Our new Metropolis–Hastings move makes large changes to the tree, but is at the same time local in that it leaves unchanged the partition of observations into terminal nodes. We evaluate the effectiveness of the proposed algorithm in two examples, one using a constructed data set and one analyzing a published breast cancer data set.