DOI: http://dx.doi.org/10.29220/CSAM.2017.24.6.543

A review of tree-based Bayesian methods  

Linero, Antonio R. (Department of Statistics, Florida State University)
Publication Information
Communications for Statistical Applications and Methods, v.24, no.6, 2017, pp. 543-559
Abstract
Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable to a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.
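
As a concrete illustration of the kind of methodology the abstract describes, the following is a minimal sketch of fitting a Bayesian additive regression trees (BART) model to simulated data in R. It assumes the CRAN package BART and its wbart function (an assumption for illustration; the article itself may use different software), with Friedman's test function standing in for an unknown regression surface; all parameter choices are illustrative.

## Minimal sketch: BART regression on simulated data (assumes the CRAN package "BART")
library(BART)

set.seed(1)
n <- 200; p <- 10
x <- matrix(runif(n * p), n, p)          # covariates; only the first 5 are active
f <- function(x)                         # Friedman's test function
  10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 +
  10 * x[, 4] + 5 * x[, 5]
y <- f(x) + rnorm(n)

## Posterior sampling for a sum-of-trees model with a continuous response
fit <- wbart(x.train = x, y.train = y, ndpost = 1000)

## Posterior mean fit and pointwise 95% credible intervals,
## computed from the matrix of posterior draws (ndpost x n)
yhat  <- fit$yhat.train.mean
lower <- apply(fit$yhat.train, 2, quantile, probs = 0.025)
upper <- apply(fit$yhat.train, 2, quantile, probs = 0.975)

Unlike greedy tree-fitting algorithms, the output here is a full posterior distribution over fits, so uncertainty quantification such as the credible intervals above comes directly from the posterior draws.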
Keywords
Bayesian additive regression trees; boosting; random forests; semiparametric Bayes