A review of tree-based Bayesian methods

  • Linero, Antonio R. (Department of Statistics, Florida State University)
  • Received : 2017.10.07
  • Accepted : 2017.11.17
  • Published : 2017.11.30

Abstract

Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, using greedy heuristics to construct decision trees; examples include the classification and regression trees (CART) algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable to a wide variety of statistical tasks, including regression, classification, and the modeling of count data, among many others. We illustrate the methodology on both simulated and real datasets.
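
As a concrete illustration of the priors these methods place on tree structures, the sketch below simulates from the branching-process prior of Chipman et al. (1998, 2010), under which a node at depth d becomes internal with probability alpha * (1 + d)^(-beta). This is a minimal illustrative sketch, not code from the paper; the defaults alpha = 0.95 and beta = 2 are the values recommended for BART.

    import random

    def sample_max_depth(depth=0, alpha=0.95, beta=2.0):
        """Draw one tree from the branching-process prior; return its deepest leaf."""
        # A node at depth d splits with probability alpha * (1 + d)^(-beta).
        if random.random() < alpha * (1.0 + depth) ** (-beta):
            # Internal node: recurse into its two children, one level deeper.
            return max(sample_max_depth(depth + 1, alpha, beta),
                       sample_max_depth(depth + 1, alpha, beta))
        return depth  # leaf node

    random.seed(1)
    draws = [sample_max_depth() for _ in range(10_000)]
    for d in range(5):
        print(f"P(max depth = {d}): {draws.count(d) / len(draws):.3f}")

With these defaults the prior concentrates on shallow trees (the root is a leaf only 5% of the time, and most draws reach depth one or two), which is why an ensemble of many such trees, as in BART, behaves like a sum of weak learners.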
