• Title/Summary/Keyword: Boosted Tree

Ensemble of Fuzzy Decision Tree for Efficient Indoor Space Recognition

  • Kim, Kisang;Choi, Hyung-Il
    • Journal of the Korea Society of Computer and Information / v.22 no.4 / pp.33-39 / 2017
  • In this paper, we extend the classification process to an ensemble of fuzzy decision trees. For indoor space recognition, many studies use the Boosted Tree, which consists of AdaBoost and decision trees. The Boosted Tree extracts an optimal decision tree in stages. At each stage, it extracts a good decision tree by minimizing the weighted classification error. This decision tree makes a hard decision. In most cases, hard decisions produce errors when classifying samples near a dividing point. Therefore, we propose an ensemble of fuzzy decision trees, which offers flexibility to the Boosted Tree algorithm as well as high performance. In the experimental results, the accuracy of the proposed method improved by about 13% over the traditional one.
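
The staged procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration of generic AdaBoost over one-dimensional decision stumps, not the authors' implementation; the function names, the toy data, and the `soft_vote` helper (a tanh-based soft membership standing in for a "fuzzy" decision near the dividing point) are all assumptions made for the example.

```python
import math

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            # Hard decision: +polarity at or above the threshold, -polarity below.
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, n_rounds=5):
    """Build the ensemble in stages, minimizing the weighted error each stage."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        preds = [polarity if x >= threshold else -polarity for x in xs]
        # Misclassified samples gain weight; correct ones lose weight.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

def soft_vote(x, threshold, polarity, width=0.5):
    """Hypothetical soft decision: a vote in (-1, 1) that shrinks near the split."""
    return polarity * math.tanh((x - threshold) / width)

# Toy data (assumed): labels flip between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [-1, -1, -1, 1, 1, 1]
ensemble = adaboost(xs, ys, n_rounds=3)
print([predict(ensemble, x) for x in xs])
print(soft_vote(4.1, 4.0, 1), soft_vote(6.0, 4.0, 1))
```

A hard stump gives a sample at 4.1 the same full vote as one at 6.0, while the soft vote is much weaker close to the threshold; that difference is the kind of flexibility the abstract attributes to fuzzy decisions.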

Prediction of the number of public bicycle rental in Seoul using Boosted Decision Tree Regression Algorithm

  • KIM, Hyun-Jun;KIM, Hyun-Ki
    • Korean Journal of Artificial Intelligence / v.10 no.1 / pp.9-14 / 2022
  • The demand for public bicycles operated by the Seoul Metropolitan Government is increasing every year. The Seoul public bicycle project, which started with about 5,600 units, had grown to 37,500 units as of September 2021, and the number of members is also increasing every year. However, as the project grows, excessive budget spending and deficits are emerging as problems, and the costs of new bicycles, rental offices, and bicycle maintenance are blamed for the deficit. In this paper, the Azure Machine Learning Studio program and the Boosted Decision Tree Regression technique are used to predict the number of public bicycle rentals from environmental factors and time. The prediction results confirmed that demand for public bicycles was high in all seasons except winter and was highest at 6 p.m. In addition, this paper compares four additional regression algorithms with the Boosted Decision Tree Regression algorithm to measure algorithm performance. In order of accuracy, the results were: first, Boosted Decision Tree Regression (0.878802); second, Decision Forest Regression (0.838232); third, Poisson Regression (0.62699); and fourth, Linear Regression (0.618773). Based on these predictions, it is expected that more public bicycles will be placed at rental stations near public transportation to meet the growing demand during commuting hours, that more bicycles will be placed at rental stations in summer than in winter, and that the life of bicycles can be extended in winter.

A customer credit Prediction Researched to Improve Credit Stability based on Artificial Intelligence

  • MUN, Ji-Hui;JUNG, Sang Woo
    • Korean Journal of Artificial Intelligence / v.9 no.1 / pp.21-27 / 2021
  • Since the 1990s, Korea's credit card industry has steadily developed. As a result, various problems have arisen, such as careless customer information management and loans to low-credit customers. This, in turn, led to a high delinquency rate across the card industry and had a negative impact on the economy. Therefore, in this paper, based on Azure, we analyze and predict delinquency and delinquency periods of credit loans according to gender, car ownership, property, number of children, education level, marital status, and employment status, using linear regression analysis and the Boosted Decision Tree algorithm. These predictions can reduce the likelihood of reckless credit lending and credit card issuance, thereby reducing the number of bad creditors and the risk to banks. In addition, after segmenting the customer base according to the predicted results, the predictions can serve as a basis for reducing the risk of credit loans by developing a credit product suitable for each customer. The Azure results showed that, between Linear Regression and the Boosted Decision Tree algorithm, the Boosted Decision Tree algorithm made more accurate predictions. In the future, we intend to increase the accuracy of the analysis by assigning a number to each data item and predicting again.

A Study on a car Insurance purchase Prediction Using Two-Class Logistic Regression and Two-Class Boosted Decision Tree

  • AN, Su Hyun;YEO, Seong Hee;KANG, Minsoo
    • Korean Journal of Artificial Intelligence / v.9 no.1 / pp.9-14 / 2021
  • This paper builds a model that predicts whether customers will buy car insurance, based on primary health insurance customer data. Currently, automobiles are used for land transportation and daily living, and their scope of use and equipment is expanding. This rapid increase in automobiles has made automobile insurance an essential business target for insurance companies. Therefore, if car insurance sales are predicted and made using the information of existing health insurance customers, they can generate continuous profits for the insurance company. This paper therefore aims to analyze existing customer characteristics and implement a predictive model to target advertisements at customers interested in auto insurance. The goal of this study is to maximize the profits of insurance companies by devising communication strategies that optimize business models and profits. The study was conducted with the Microsoft Azure program, and an automobile insurance purchase prediction model was implemented using the Health Insurance Cross-sell Prediction data. Two-Class Logistic Regression and Two-Class Boosted Decision Tree are trained side by side so that the two models and their predictions can be compared. According to the results, at a threshold of 0.3 the AUC is 0.837 and the accuracy is 0.833, indicating high accuracy. The results therefore suggest that customers with health insurance could respond positively to auto insurance purchases.
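
As a rough illustration of the two metrics quoted in this abstract, the sketch below computes accuracy at a chosen decision threshold and a pairwise (rank-based) AUC from predicted probabilities. The scores and labels are made-up toy values, not the study's data, and the helper names are hypothetical; Azure's own evaluation module is not reproduced here.

```python
def accuracy_at_threshold(probs, labels, threshold):
    """Fraction of samples classified correctly when p >= threshold means class 1."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

def auc(probs, labels):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Toy scores and labels (assumed for illustration only).
probs = [0.10, 0.20, 0.35, 0.60, 0.80, 0.45]
labels = [0, 0, 1, 0, 1, 1]

print(accuracy_at_threshold(probs, labels, 0.3))
print(accuracy_at_threshold(probs, labels, 0.5))
print(auc(probs, labels))
```

Accuracy depends on the threshold, whereas AUC is computed from the ranking of scores alone, which is why an abstract can report a single AUC alongside an accuracy tied to one specific threshold.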

Sequential prediction of TBM penetration rate using a gradient boosted regression tree during tunneling

  • Lee, Hang-Lo;Song, Ki-Il;Qi, Chongchong;Kim, Kyoung-Yul
    • Geomechanics and Engineering / v.29 no.5 / pp.523-533 / 2022
  • Several prediction models for the penetration rate (PR) of tunnel boring machines (TBMs) have focused on application at the design stage. In the construction stage, however, the expected PR and its trends change during tunneling owing to TBM excavation skills and the gap between the investigated and actual geological conditions. Monitoring the PR during tunneling is crucial for rescheduling the excavation plan in real time. This study proposes a sequential prediction method applicable in the construction stage. Geological and TBM operating data were collected from the Gunpo cable tunnel in Korea and preprocessed through normalization and augmentation. The results show that sequential prediction for a unit prediction distance (UPD) of 1 ring achieves R²≥0.79, whereas one-step prediction achieves R²≤0.30. Among the modeling algorithms, a gradient boosted regression tree (GBRT) outperformed least squares-based linear regression in the sequential prediction method. For practical use, a simple equation relating R² and UPD is proposed. As UPD increases, R² decreases exponentially; in particular, the UPD at R²=0.60 is calculated as 28 rings using the equation. Such an interval provides enough time for decision-making. The UPD can be adjusted depending on the project and the R² value targeted by an operator. Therefore, a calculation process for the equation relating R² and UPD is also presented.

Prediction of Germination of Korean Red Pine (Pinus densiflora) Seed using FT NIR Spectroscopy and Binary Classification Machine Learning Methods (FT NIR 분광법 및 이진분류 머신러닝 방법을 이용한 소나무 종자 발아 예측)

  • Yong-Yul Kim;Ja-Jung Ku;Da-Eun Gu;Sim-Hee Han;Kyu-Suk Kang
    • Journal of Korean Society of Forest Science / v.112 no.2 / pp.145-156 / 2023
  • In this study, Fourier-transform near-infrared (FT-NIR) spectra of Korean red pine seeds stored at -18℃ and 4℃ for 18 years were analyzed. To develop seed-germination prediction models, the performance of seven machine learning methods, namely XGBoost, Boosted Tree, Bootstrap Forest, Neural Networks, Decision Tree, Support Vector Machine, and PLS-DA, was compared. The predictive performance, assessed by accuracy, misclassification rate, and area under the curve (0.9722, 0.0278, and 0.9735 for XGBoost, and 0.9653, 0.0347, and 0.9647 for Boosted Tree), was better for the XGBoost and Boosted Tree models than for the other models. The 54 wave-number variables of the two models were of high relative importance in seed-germination prediction and were grouped into six spectral ranges (811~1,088 nm, 1,137~1,273 nm, 1,336~1,453 nm, 1,666~1,671 nm, 1,879~2,045 nm, and 2,058~2,409 nm) for aromatic amino acids, cellulose, lignin, starch, fatty acids, and moisture, respectively. Using the NIR spectral data and the two machine learning models developed in this study gave >96% accuracy in predicting pine-seed germination after long-term storage, indicating that this approach could be useful for non-destructive viability testing of stored seed genetic resources.

A review of tree-based Bayesian methods

  • Linero, Antonio R.
    • Communications for Statistical Applications and Methods / v.24 no.6 / pp.543-559 / 2017
  • Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable for a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.

A Comparative Study of Phishing Websites Classification Based on Classifier Ensemble

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Korea Multimedia Society / v.21 no.5 / pp.617-625 / 2018
  • Phishing websites have become a crucial concern in cyber security applications. Phishing is performed by fraudulently deceiving users with the aim of obtaining their sensitive information, such as bank account information, credit card details, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. Area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used as a baseline indicator for evaluating significance among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.

A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Multimedia Information System / v.5 no.2 / pp.99-104 / 2018
  • Phishing websites have become a crucial concern in cyber security applications. Phishing is performed by fraudulently deceiving users with the aim of obtaining their sensitive information, such as bank account information, credit card details, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. Area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used as a baseline indicator for evaluating significance among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.

Predictive Analysis of Fire Risk Factors in Gyeonggi-do Using Machine Learning (머신러닝을 이용한 경기도 화재위험요인 예측분석)

  • Seo, Min Song;Castillo Osorio, Ever Enrique;Yoo, Hwan Hee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.39 no.6 / pp.351-361 / 2021
  • The seriousness of fire is rising because fire causes enormous damage to property and human life. Therefore, this study aims to predict various risk factors affecting fire by fire type. The predictive analysis of fire factors targeted Gyeonggi-do, which has the highest number of fires in the country. For the analysis, the machine learning methods SVM (Support Vector Machine), RF (Random Forest), and GBRT (Gradient Boosted Regression Tree) were used; the accuracy of each model was assessed through MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error) to identify a high-fit model, and based on this, predictive analysis of fire factors in Gyeonggi-do was conducted. In the comparative analysis of the three machine learning methods, the RF method showed MAE = 1.765 and RMSE = 1.876, and its verification and test results were very similar, with differences of 0.046 in MAE and 0.04 in RMSE, giving the best predictive results. The results of this study are expected to serve as useful data for fire safety management, allowing decision makers to identify the sequence of dangers related to the factors affecting the occurrence of fire.
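
For reference, the two error measures named in this abstract can be computed as in the short sketch below. The observed and predicted values are invented toy numbers, not the study's fire data, and the function names are only illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy observed and predicted values (assumed for illustration only).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

print(mae(y_true, y_pred))
print(rmse(y_true, y_pred))
```

Computing both measures on a verification split and on a test split, then comparing the gaps, is one way to check that a model's fit carries over to unseen data, which is the comparison the abstract reports for the RF model.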