• 제목/요약/키워드: Bayesian Additive Regression Trees

검색결과 5건 처리시간 0.02초

Comparison of tree-based ensemble models for regression

  • Park, Sangho;Kim, Chanmin
    • Communications for Statistical Applications and Methods
    • /
    • 제29권5호
    • /
    • pp.561-589
    • /
    • 2022
  • When multiple classifications and regression trees are combined, tree-based ensemble models, such as random forest (RF) and Bayesian additive regression trees (BART), are produced. We compare the model structures and performances of various ensemble models for regression settings in this study. RF learns bootstrapped samples and selects a splitting variable from predictors gathered at each node. The BART model is specified as the sum of trees and is calculated using the Bayesian backfitting algorithm. Throughout the extensive simulation studies, the strengths and drawbacks of the two methods in the presence of missing data, high-dimensional data, or highly correlated data are investigated. In the presence of missing data, BART performs well in general, whereas RF provides adequate coverage. The BART outperforms in high dimensional, highly correlated data. However, in all of the scenarios considered, the RF has a shorter computation time. The performance of the two methods is also compared using two real data sets that represent the aforementioned situations, and the same conclusion is reached.

A review of tree-based Bayesian methods

  • Linero, Antonio R.
    • Communications for Statistical Applications and Methods
    • /
    • 제24권6호
    • /
    • pp.543-559
    • /
    • 2017
  • Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable for a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.

Using Bayesian tree-based model integrated with genetic algorithm for streamflow forecasting in an urban basin

  • Nguyen, Duc Hai;Bae, Deg-Hyo
    • 한국수자원학회:학술대회논문집
    • /
    • 한국수자원학회 2021년도 학술발표회
    • /
    • pp.140-140
    • /
    • 2021
  • Urban flood management is a crucial and challenging task, particularly in developed cities. Therefore, accurate prediction of urban flooding under heavy precipitation is critically important to address such a challenge. In recent years, machine learning techniques have received considerable attention for their strong learning ability and suitability for modeling complex and nonlinear hydrological processes. Moreover, a survey of the published literature finds that hybrid computational intelligent methods using nature-inspired algorithms have been increasingly employed to predict or simulate the streamflow with high reliability. The present study is aimed to propose a novel approach, an ensemble tree, Bayesian Additive Regression Trees (BART) model incorporating a nature-inspired algorithm to predict hourly multi-step ahead streamflow. For this reason, a hybrid intelligent model was developed, namely GA-BART, containing BART model integrating with Genetic algorithm (GA). The Jungrang urban basin located in Seoul, South Korea, was selected as a case study for the purpose. A database was established based on 39 heavy rainfall events during 2003 and 2020 that collected from the rain gauges and monitoring stations system in the basin. For the goal of this study, the different step ahead models will be developed based in the methods, including 1-hour, 2-hour, 3-hour, 4-hour, 5-hour, and 6-hour step ahead streamflow predictions. In addition, the comparison of the hybrid BART model with a baseline model such as super vector regression models is examined in this study. It is expected that the hybrid BART model has a robust performance and can be an optional choice in streamflow forecasting for urban basins.

  • PDF

Analyzing effect and importance of input predictors for urban streamflow prediction based on a Bayesian tree-based model

  • Nguyen, Duc Hai;Bae, Deg-Hyo
    • 한국수자원학회:학술대회논문집
    • /
    • 한국수자원학회 2022년도 학술발표회
    • /
    • pp.134-134
    • /
    • 2022
  • Streamflow forecasting plays a crucial role in water resource control, especially in highly urbanized areas that are very vulnerable to flooding during heavy rainfall event. In addition to providing the accurate prediction, the evaluation of effects and importance of the input predictors can contribute to water manager. Recently, machine learning techniques have applied their advantages for modeling complex and nonlinear hydrological processes. However, the techniques have not considered properly the importance and uncertainty of the predictor variables. To address these concerns, we applied the GA-BART, that integrates a genetic algorithm (GA) with the Bayesian additive regression tree (BART) model for hourly streamflow forecasting and analyzing input predictors. The Jungrang urban basin was selected as a case study and a database was established based on 39 heavy rainfall events during 2003 and 2020 from the rain gauges and monitoring stations. For the goal of this study, we used a combination of inputs that included the areal rainfall of the subbasins at current time step and previous time steps and water level and streamflow of the stations at time step for multistep-ahead streamflow predictions. An analysis of multiple datasets including different input predictors was performed to define the optimal set for streamflow forecasting. In addition, the GA-BART model could reasonably determine the relative importance of the input variables. The assessment might help water resource managers improve the accuracy of forecasts and early flood warnings in the basin.

  • PDF

Optimized machine learning algorithms for predicting the punching shear capacity of RC flat slabs

  • Huajun Yan;Nan Xie;Dandan Shen
    • Advances in concrete construction
    • /
    • 제17권1호
    • /
    • pp.27-36
    • /
    • 2024
  • Reinforced concrete (RC) flat slabs should be designed based on punching shear strength. As part of this study, machine learning (ML) algorithms were developed to accurately predict the punching shear strength of RC flat slabs without shear reinforcement. It is based on Bayesian optimization (BO), combined with four standard algorithms (Support vector regression, Decision trees, Random forests, Extreme gradient boosting) on 446 datasets that contain six design parameters. Furthermore, an analysis of feature importance is carried out by Shapley additive explanation (SHAP), in order to quantify the effect of design parameters on punching shear strength. According to the results, the BO method produces high prediction accuracy by selecting the optimal hyperparameters for each model. With R2 = 0.985, MAE = 0.0155 MN, RMSE = 0.0244 MN, the BO-XGBoost model performed better than the original XGBoost prediction, which had R2 = 0.917, MAE = 0.064 MN, RMSE = 0.121 MN in total dataset. Additionally, recommendations are provided on how to select factors that will influence punching shear resistance of RC flat slabs without shear reinforcement.