• Title/Summary/Keyword: regression tree


Applied linear and nonlinear statistical models for evaluating strength of Geopolymer concrete

  • Prem, Prabhat Ranjan;Thirumalaiselvi, A.;Verma, Mohit
    • Computers and Concrete / v.24 no.1 / pp.7-17 / 2019
  • The complex phenomenon of bond formation in geopolymer is not well understood and is therefore difficult to model. This paper presents applied statistical models for evaluating the compressive strength of geopolymer. The models studied fall into three categories: linear regression [least absolute shrinkage and selection operator (LASSO) and elastic net], tree regression [decision tree and bagging tree], and kernel methods [support vector regression (SVR), kernel ridge regression (KRR), Gaussian process regression (GPR), and relevance vector machine (RVM)]. The performance of the methods is compared in terms of error indices, computational effort, convergence, and residuals. Based on the present study, kernel-based methods (GPR and KRR) are recommended for evaluating the compressive strength of geopolymer concrete.

Comparison of tree-based ensemble models for regression

  • Park, Sangho;Kim, Chanmin
    • Communications for Statistical Applications and Methods / v.29 no.5 / pp.561-589 / 2022
  • When multiple classification and regression trees are combined, tree-based ensemble models such as random forest (RF) and Bayesian additive regression trees (BART) are produced. In this study, we compare the model structures and performance of various ensemble models in regression settings. RF learns from bootstrapped samples and selects a splitting variable from the predictors gathered at each node. The BART model is specified as a sum of trees and is fitted using the Bayesian backfitting algorithm. Through extensive simulation studies, the strengths and drawbacks of the two methods in the presence of missing data, high-dimensional data, or highly correlated data are investigated. In the presence of missing data, BART performs well in general, whereas RF provides adequate coverage. BART outperforms RF on high-dimensional, highly correlated data. However, in all of the scenarios considered, RF has a shorter computation time. The performance of the two methods is also compared using two real data sets that represent the aforementioned situations, and the same conclusion is reached.
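
The RF mechanism described in the abstract above (bootstrapped samples plus a restricted set of candidate predictors at each split) can be illustrated with a minimal sketch. This is not the authors' code: it assumes depth-1 trees (stumps) for brevity, and every name in it (`fit_stump`, `fit_forest`, `n_candidates`, and so on) is hypothetical.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(X, y, n_candidates, rng):
    """Fit a depth-1 regression tree, searching only a random subset of predictors."""
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), n_candidates)
    best = None  # (error, feature index, threshold, left mean, right mean)
    for j in candidates:
        for t in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send data to both sides
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, t, mean(left), mean(right))
    if best is None:
        # No valid split exists: fall back to predicting the overall mean.
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean


def fit_forest(X, y, n_trees=50, n_candidates=1, seed=0):
    """Fit each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        X_boot = [X[i] for i in idx]
        y_boot = [y[i] for i in idx]
        forest.append(fit_stump(X_boot, y_boot, n_candidates, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all stumps."""
    return mean([predict_stump(s, row) for s in forest])
```

A real random forest grows deep trees and redraws the candidate predictors at every node; with stumps there is only one node per tree, so the subset is drawn once per tree.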

Performance Comparison of Machine-learning Models for Analyzing Weather and Traffic Accident Correlations

  • Li Zi Xuan;Hyunho Yang
    • Journal of information and communication convergence engineering / v.21 no.3 / pp.225-232 / 2023
  • Owing to advancements in intelligent transportation systems (ITS) and artificial-intelligence technologies, various machine-learning models can be employed to simulate and predict the number of traffic accidents under different weather conditions. Furthermore, we can analyze the relationship between weather and traffic accidents, allowing us to assess whether the current weather conditions are suitable for travel, which can significantly reduce the risk of traffic accidents. In this study, we analyzed 30,000 traffic flow data points collected by traffic cameras at nearby intersections in Washington, D.C., USA, from October 2012 to May 2017, using a Pearson correlation heat map. We then predicted, analyzed, and compared the performance of the correlation between continuous features by applying several machine-learning algorithms commonly used in ITS, including random forest, decision tree, gradient-boosting regression, and support vector regression. The experimental results indicated that the gradient-boosting regression model had the best performance.

A Study on Prediction Techniques through Machine Learning of Real-time Solar Radiation in Jeju (제주 실시간 일사량의 기계학습 예측 기법 연구)

  • Lee, Young-Mi;Bae, Joo-Hyun;Park, Jeong-keun
    • Journal of Environmental Science International / v.26 no.4 / pp.521-527 / 2017
  • Solar radiation forecasts are important for predicting the amount of ice on roads and the potential solar energy. In an attempt to improve solar radiation predictability in Jeju, we applied machine learning with various data mining techniques such as tree models, conditional inference trees, random forest, support vector machines, and logistic regression. To validate the machine learning models, the simulation results were compared with solar radiation data observed at the Jeju observation site. According to the model assessment, solar radiation prediction using random forest is the most effective method. The error rate of the random forest model is 17%.

Selection of Important Variables in the Classification Model for Successful Flight Training (조종사 비행훈련 성패예측모형 구축을 위한 중요변수 선정)

  • Lee, Sang-Heon;Lee, Sun-Doo
    • IE interfaces / v.20 no.1 / pp.41-48 / 2007
  • The main purpose of this paper is to reduce wasteful pilot training expenses and to prevent human-factor accidents that originate in the pilot selection process. We use classification models such as logistic regression, decision tree, and neural network, based on the aptitude test results of 505 ROK Air Force applicants from 2001 to 2004. First, we assess the reliability and appropriateness of the improved aptitude test system. On this basis, the flight simulator test item was compared with the new aptitude test items, using the different models, to decide whether it should be added, in terms of classification accuracy, ROC, and response threshold. The decision tree was selected as the most efficient model for each sequential flight training result, and it predicted the final flight training result very well. Therefore, we propose that the pilot selection standard be based on the decision tree, and that the flight simulator test be included as a new aptitude test item.

Classification and Regression Tree Analysis for Molecular Descriptor Selection and Binding Affinities Prediction of Imidazobenzodiazepines in Quantitative Structure-Activity Relationship Studies

  • Atabati, Morteza;Zarei, Kobra;Abdinasab, Esmaeil
    • Bulletin of the Korean Chemical Society / v.30 no.11 / pp.2717-2722 / 2009
  • The use of the classification and regression tree (CART) methodology was studied in a quantitative structure-activity relationship (QSAR) context on a data set consisting of the binding affinities of 39 imidazobenzodiazepines for the α1 benzodiazepine receptor. The 3-D structures of these compounds were optimized using HyperChem software with the semiempirical AM1 optimization method. After optimization, a set of 1481 zero- to three-dimensional descriptors was calculated for each molecule in the data set. The response (dependent variable) in the tree model consisted of the binding affinities of the drugs. Three descriptors (two topological descriptors and one 3D-MoRSE descriptor) were used in the final tree structure to describe the binding affinities. The mean relative error for the data set is 3.20%, compared with 6.63% for a previous model. To evaluate the predictive power of CART, cross-validation was also performed.

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm (머신러닝 알고리즘 기반의 의료비 예측 모델 개발)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / v.1 no.1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of five types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of the regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

Sequential prediction of TBM penetration rate using a gradient boosted regression tree during tunneling

  • Lee, Hang-Lo;Song, Ki-Il;Qi, Chongchong;Kim, Kyoung-Yul
    • Geomechanics and Engineering / v.29 no.5 / pp.523-533 / 2022
  • Several prediction models for the penetration rate (PR) of tunnel boring machines (TBMs) have focused on application at the design stage. In the construction stage, however, the expected PR and its trends change during tunneling owing to TBM excavation skills and the gap between the investigated and actual geological conditions. Monitoring the PR during tunneling is crucial for rescheduling the excavation plan in real time. This study proposes a sequential prediction method applicable in the construction stage. Geological and TBM operating data were collected from the Gunpo cable tunnel in Korea and preprocessed through normalization and augmentation. The results show that sequential prediction for a unit prediction distance (UPD) of 1 ring achieves R² ≥ 0.79, whereas one-step prediction achieves R² ≤ 0.30. Among the modeling algorithms, a gradient boosted regression tree (GBRT) outperformed least squares-based linear regression in the sequential prediction method. For practical use, a simple equation relating R² and UPD is proposed. As UPD increases, R² decreases exponentially; in particular, the UPD at R² = 0.60 is calculated as 28 rings using the equation. Such a time interval will provide enough time for decision-making. The UPD can be adjusted depending on the project and the R² value targeted by an operator. Therefore, a calculation process for the equation relating R² and UPD is presented.

Application of Statistical Models for Default Probability of Loans in Mortgage Companies

  • Jung, Jin-Whan
    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.605-616 / 2000
  • Three primary interests frequently raised by mortgage companies are introduced and the corresponding statistical approaches for the default probability in mortgage companies are examined. Statistical models considered in this paper are time series, logistic regression, decision tree, neural network, and discrete time models. Usage of the models is illustrated using an artificially modified data set and the corresponding models are evaluated in appropriate manners.


Development of Discriminant Analysis System by Graphical User Interface of Visual Basic

  • Lee, Yong-Kyun;Shin, Young-Jae;Cha, Kyung-Joon
    • Journal of the Korean Data and Information Science Society / v.18 no.2 / pp.447-456 / 2007
  • Recently, multivariate statistical analysis has been used to extract meaningful information from various data. In this paper, we develop a multivariate statistical analysis system that combines Fisher discriminant analysis, logistic regression, neural network, and decision tree, using Visual Basic 6.0.
