• Title/Summary/Keyword: Light gradient boosting machine

A sensitivity analysis of machine learning models on fire-induced spalling of concrete: Revealing the impact of data manipulation on accuracy and explainability

  • Mohammad K. al-Bashiti;M.Z. Naser
    • Computers and Concrete / v.33 no.4 / pp.409-423 / 2024
  • Using an extensive database, a sensitivity analysis across fifteen machine learning (ML) classifiers was conducted to evaluate the impact of various data manipulation techniques, evaluation metrics, and explainability tools. The results of this sensitivity analysis reveal that the examined models can achieve an accuracy ranging from 72% to 93% in predicting the fire-induced spalling of concrete and identify the light gradient boosting machine, extreme gradient boosting, and random forest algorithms as the best-performing models. Among such models, the six key factors influencing spalling were maximum exposure temperature, heating rate, compressive strength of concrete, moisture content, silica fume content, and the quantity of polypropylene fiber. Our analysis also documents some conflicting results observed with the deep learning model. As such, this study highlights the necessity of selecting suitable models and carefully evaluating the presence of possible outcome biases.

Potential of multispectral imaging for maturity classification and recognition of oriental melon

  • Seongmin Lee;Kyoung-Chul Kim;Kangjin Lee;Jinhwan Ryu;Youngki Hong;Byeong-Hyo Cho
    • Korean Journal of Agricultural Science / v.50 no.3 / pp.485-496 / 2023
  • In this study, we aimed to apply multispectral imaging (713 - 920 nm, 10 bands) for maturity classification and recognition of oriental melons grown in hydroponic greenhouses. A total of 20 oriental melons were selected, and time series multispectral imaging was performed 7 - 9 times for each sample from April 21, 2023, to May 12, 2023. We used several approaches, such as Savitzky-Golay (SG), standard normal variate (SNV), and a combination of SG and SNV (SG + SNV), for pre-processing the multispectral data. As a result, the 713 - 759 nm bands were preprocessed with SG for the maturity classification of oriental melons. Additionally, a Light Gradient Boosting Machine (LightGBM) was used to train the recognition model for oriental melon. The R2 values of the recognition model were 0.92 and 0.91 for the training and validation sets, respectively, and the F-scores were 96.6 and 79.4% for the training and testing sets, respectively. Therefore, multispectral imaging in the range of 713 - 920 nm can be used to classify oriental melon maturity and recognize the fruits.
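
The SNV step named in this abstract can be illustrated with a minimal sketch. It assumes the textbook definition (each spectrum is centered by its own mean and scaled by its own standard deviation); the function name `snv` and the reflectance values are hypothetical and not taken from the study, which does not state which standard-deviation convention it used.

```python
from statistics import mean, pstdev

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mu = mean(spectrum)
    # Assumption: population standard deviation; some implementations use n - 1.
    sigma = pstdev(spectrum)
    if sigma == 0:
        raise ValueError("a flat spectrum cannot be SNV-scaled")
    return [(x - mu) / sigma for x in spectrum]

# Hypothetical reflectance values for 10 bands (illustrative only)
reflectance = [0.31, 0.33, 0.36, 0.40, 0.45, 0.47, 0.48, 0.50, 0.52, 0.53]
scaled = snv(reflectance)
print([round(v, 3) for v in scaled])
```

After scaling, each spectrum has zero mean and unit variance, which removes per-sample offset and scale differences before a classifier is trained.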

Using Mechanical Learning Analysis of Determinants of Housing Sales and Establishment of Forecasting Model (기계학습을 활용한 주택매도 결정요인 분석 및 예측모델 구축)

  • Kim, Eun-mi;Kim, Sang-Bong;Cho, Eun-seo
    • Journal of Cadastre & Land InformatiX / v.50 no.1 / pp.181-200 / 2020
  • This study used an OLS model to estimate the determinants affecting the tenure of a home and then compared its predictive power with that of SVM, Decision Tree, Random Forest, Gradient Boosting, XGBoost, and LightGBM models. It differs from preceding studies in that the Stacking model, one of the ensemble models, can be used as a base model to establish a more predictive model for identifying the volume of housing transactions in the housing market. The OLS analysis showed that sales profits, housing prices, the number of household members, and the type of residential housing (detached housing, apartments) affected the period of housing ownership; when predictability was compared using RMSE, the machine learning models showed higher predictability. Afterwards, each machine learning model was applied to data rebuilt with the influencing variables, and Random Forest showed the best predictive power. In addition, the most predictive models (Random Forest, Decision Tree, Gradient Boosting, and XGBoost) were applied as individual models, and the Stacking model was constructed using Linear, Ridge, and Lasso models as meta models. As a result, the RMSE of the Ridge model was the lowest at 0.5181, yielding the most predictive model.
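
The models in this abstract are ranked by RMSE, where a lower value means better predictions. A minimal sketch of that metric follows; the function name `rmse` and the sample values are illustrative assumptions, not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical observed and predicted values (illustrative only)
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(rmse(observed, predicted))
```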

A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting (설명 가능한 정기예금 가입 여부 예측을 위한 앙상블 학습 기반 분류 모델들의 비교 분석)

  • Shin, Zian;Moon, Jihoon;Rho, Seungmin
    • The Journal of Society for e-Business Studies / v.26 no.3 / pp.97-117 / 2021
  • Predicting term deposit subscriptions is one of the representative financial marketing tasks in banks, and banks can build a prediction model using various customer information. In order to improve the classification accuracy for term deposit subscriptions, many studies have been conducted based on machine learning techniques. However, even if these models can achieve satisfactory performance, utilizing them is not an easy task in the industry when their decision-making process is not adequately explained. To address this issue, this paper proposes an explainable scheme for term deposit subscription forecasting. For this, we first construct several classification models using decision tree-based ensemble learning methods, which yield excellent performance on tabular data, such as random forest, gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). We then analyze their classification performance in depth through 10-fold cross-validation. After that, we provide the rationale for interpreting the influence of customer information and the decision-making process by applying Shapley additive explanation (SHAP), an explainable artificial intelligence technique, to the best classification model. To verify the practicality and validity of our scheme, experiments were conducted with the bank marketing dataset provided by Kaggle; we applied SHAP to the GBM and LightGBM models, respectively, according to different dataset configurations and then performed their analysis and visualization for explainable term deposit subscriptions.
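
The 10-fold cross-validation mentioned in this abstract can be sketched as an index split. This is a simplified illustration with contiguous, unshuffled folds; the helper name `k_fold_indices` is hypothetical, and the study may well have shuffled or stratified its folds.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Each sample lands in exactly one test fold; a model is fit on `train`
# and scored on `test`, and the k scores are then averaged.
folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])
```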

Performance Analysis of Trading Strategy using Gradient Boosting Machine Learning and Genetic Algorithm

  • Jang, Phil-Sik
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.147-155 / 2022
  • In this study, we developed a system to dynamically balance a daily stock portfolio and performed trading simulations using gradient boosting and genetic algorithms. We collected various stock market data from stocks listed on the KOSPI and KOSDAQ markets, including investor-specific transaction data. Subsequently, we indexed the data as a preprocessing step, and used feature engineering to modify and generate variables for training. First, we experimentally compared the performance of three popular gradient boosting algorithms in terms of accuracy, precision, recall, and F1-score, including XGBoost, LightGBM, and CatBoost. Based on the results, in a second experiment, we used a LightGBM model trained on the collected data along with genetic algorithms to predict and select stocks with a high daily probability of profit. We also conducted simulations of trading during the period of the testing data to analyze the performance of the proposed approach compared with the KOSPI and KOSDAQ indices in terms of the CAGR (Compound Annual Growth Rate), MDD (Maximum Draw Down), Sharpe ratio, and volatility. The results showed that the proposed strategies outperformed those employed by the Korean stock market in terms of all performance metrics. Moreover, our proposed LightGBM model with a genetic algorithm exhibited competitive performance in predicting stock price movements.
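
The four portfolio metrics listed in this abstract can be sketched as follows. This assumes common textbook definitions, 252 trading periods per year, and a zero risk-free rate by default; the function names and the equity curve are hypothetical, and the paper's exact conventions are not given in the abstract.

```python
import math

def daily_returns(equity):
    """Simple period-over-period returns from an equity curve."""
    return [b / a - 1 for a, b in zip(equity, equity[1:])]

def cagr(equity, periods_per_year=252):
    """Compound annual growth rate of an equity curve."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1 / years) - 1

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of returns, scaled to one year."""
    avg = sum(returns) / len(returns)
    variance = sum((r - avg) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(variance * periods_per_year)

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized mean excess return divided by annualized volatility."""
    avg = sum(returns) / len(returns)
    excess = avg * periods_per_year - risk_free_rate
    return excess / annualized_volatility(returns, periods_per_year)

# Hypothetical equity curve (illustrative values only)
equity = [100.0, 120.0, 90.0, 110.0]
print(max_drawdown(equity), sharpe_ratio(daily_returns(equity)))
```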

Form-finding of lifting self-forming GFRP elastic gridshells based on machine learning interpretability methods

  • Soheila, Kookalani;Sandy, Nyunn;Sheng, Xiang
    • Structural Engineering and Mechanics / v.84 no.5 / pp.605-618 / 2022
  • Glass fiber reinforced polymer (GFRP) elastic gridshells consist of long continuous GFRP tubes that form elastic deformations. In this paper, a method for the form-finding of gridshell structures is presented based on interpretable machine learning (ML) approaches. A comparative study is conducted on several ML algorithms, including support vector regression (SVR), K-nearest neighbors (KNN), decision tree (DT), random forest (RF), AdaBoost, XGBoost, category boosting (CatBoost), and light gradient boosting machine (LightGBM). A numerical example is presented using a standard double-hump gridshell considering two characteristics of deformation as objective functions. The combination of the grid search approach and k-fold cross-validation (CV) is implemented for fine-tuning the parameters of ML models. The results of the comparative study indicate that the LightGBM model presents the highest prediction accuracy. Finally, interpretable ML approaches, including Shapley additive explanations (SHAP), partial dependence plot (PDP), and accumulated local effects (ALE), are applied to explain the predictions of the ML model since it is essential to understand the effect of various values of input parameters on objective functions. As a result of interpretability approaches, an optimum gridshell structure is obtained and new opportunities are verified for form-finding investigation of GFRP elastic gridshells during lifting construction.

Developing a regional fog prediction model using tree-based machine-learning techniques and automated visibility observations (시정계 자료와 기계학습 기법을 이용한 지역 안개예측 모형 개발)

  • Kim, Daeha
    • Journal of Korea Water Resources Association / v.54 no.12 / pp.1255-1263 / 2021
  • While it could become an alternative water resource, fog could undermine traffic safety and operational performance of infrastructures. To reduce such adverse impacts, it is necessary to have spatially continuous fog risk information. In this work, tree-based machine-learning models were developed in order to quantify fog risks with routine meteorological observations alone. The Extreme Gradient Boosting (XGB), Light Gradient Boosting (LGB), and Random Forests (RF) were chosen for the regional fog models using operational weather and visibility observations within the Jeollabuk-do province. Results showed that RF appeared to perform most robustly in categorizing fog and non-fog situations during the training and evaluation period of 2017-2019. While the LGB performed better in predicting fog occurrences than the others, its false alarm ratio was the highest (0.695) among the three models. The predictability of the three models considerably declined when applying them for an independent period of 2020, potentially due to the distinctively enhanced air quality in the year under the global lockdown. Nonetheless, even in 2020, the three models were all able to produce fog risk information consistent with the spatial variation of observed fog occurrences. This work suggests that the tree-based machine learning models could be used as tools to find locations with relatively high fog risks.
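
The false alarm ratio quoted in this abstract is a standard forecast-verification score. A minimal sketch follows, assuming the usual 2x2 contingency-table definition (false alarms divided by all forecast events); the function names and the observation and forecast series are hypothetical.

```python
def contingency(observed, predicted):
    """Count hits, false alarms, misses, and correct negatives for binary events."""
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o and p)
    false_alarms = sum(1 for o, p in pairs if not o and p)
    misses = sum(1 for o, p in pairs if o and not p)
    correct_negatives = sum(1 for o, p in pairs if not o and not p)
    return hits, false_alarms, misses, correct_negatives

def false_alarm_ratio(hits, false_alarms):
    """Fraction of forecast events that did not occur."""
    return false_alarms / (hits + false_alarms)

def probability_of_detection(hits, misses):
    """Fraction of observed events that were forecast."""
    return hits / (hits + misses)

# Hypothetical fog observations and forecasts (1 = fog, 0 = no fog)
observed = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 1, 0, 0, 0]
hits, false_alarms, misses, correct_negatives = contingency(observed, predicted)
print(false_alarm_ratio(hits, false_alarms), probability_of_detection(hits, misses))
```

A model can detect many events while also raising many false alarms, which is why the two scores are usually read together.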

Evaluating the Efficiency of Models for Predicting Seismic Building Damage (지진으로 인한 건물 손상 예측 모델의 효율성 분석)

  • Chae Song Hwa;Yujin Lim
    • The Transactions of the Korea Information Processing Society / v.13 no.5 / pp.217-220 / 2024
  • Predicting earthquake occurrences accurately is challenging, and preparing all buildings with seismic design for such random events is a difficult task. Analyzing building features to predict potential damage and reinforcing vulnerabilities based on this analysis can minimize damages even in buildings without seismic design. Therefore, research analyzing the efficiency of building damage prediction models is essential. In this paper, we compare the accuracy of earthquake damage prediction models using machine learning classification algorithms, including Random Forest, Extreme Gradient Boosting, LightGBM, and CatBoost, utilizing data from buildings damaged during the 2015 Nepal earthquake.

Store Sales Prediction Using Gradient Boosting Model (그래디언트 부스팅 모델을 활용한 상점 매출 예측)

  • Choi, Jaeyoung;Yang, Heeyoon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.2 / pp.171-177 / 2021
  • With the rapid development of machine learning, diverse applications have emerged not only in industrial fields but also in daily life. Applications of machine learning to financial data have also been of interest. Herein, we apply machine learning algorithms to store sales data and present future applications for fintech enterprises. We use several methods for handling missing data and apply gradient boosting machine learning algorithms (XGBoost, LightGBM, and CatBoost) to predict the future revenue of individual stores. As a result, we found that median imputation of missing data combined with the XGBoost algorithm gives the best accuracy. By employing the proposed method, fintech enterprises and customers can attain benefits. Stores can benefit by receiving financial assistance beforehand from fintech companies, while these corporations can benefit by offering financial support to these stores with low risk.
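
The median imputation named in this abstract can be sketched for a single numeric column. This is an assumption-laden illustration: missing entries are represented as `None`, and the function name `impute_median` and the revenue values are hypothetical rather than taken from the study.

```python
from statistics import median

def impute_median(values):
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical monthly revenue column with two missing entries
revenue = [120.0, None, 80.0, 100.0, None]
print(impute_median(revenue))
```

The median is less sensitive to extreme values than the mean, which is one common reason to prefer it for skewed data such as sales figures.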

Improved prediction of soil liquefaction susceptibility using ensemble learning algorithms

  • Satyam Tiwari;Sarat K. Das;Madhumita Mohanty;Prakhar
    • Geomechanics and Engineering / v.37 no.5 / pp.475-498 / 2024
  • The prediction of the susceptibility of soil to liquefaction using a limited set of parameters, particularly when dealing with highly unbalanced databases, is a challenging problem. The current study focuses on different ensemble learning classification algorithms using highly unbalanced databases of results from in-situ tests: standard penetration test (SPT), shear wave velocity (Vs) test, and cone penetration test (CPT). The input parameters for these datasets consist of earthquake intensity parameters, strong ground motion parameters, and in-situ soil testing parameters, with the liquefaction index serving as the binary output parameter. After a rigorous comparison with existing literature, extreme gradient boosting (XGBoost), bagging, and random forest (RF) emerge as the most efficient models for liquefaction instance classification across different datasets. Notably, for SPT and Vs-based models, XGBoost exhibits superior performance, followed by light gradient boosting machine (LightGBM) and bagging, while for CPT-based models, bagging ranks highest, followed by gradient boosting and random forest, with CPT-based models demonstrating lower Gmean(error), rendering them preferable for soil liquefaction susceptibility prediction. Key parameters influencing model performance include internal friction angle of soil (ϕ) and percentage of fines less than 75 µ (F75) for SPT and Vs data and normalized average cone tip resistance (qc) and peak horizontal ground acceleration (amax) for CPT data. It was also observed that the addition of Vs measurement to SPT data increased the efficiency of the prediction in comparison to only SPT data. Furthermore, to enhance usability, a graphical user interface (GUI) for seamless classification operations based on provided input parameters was proposed.
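
For unbalanced binary data such as the liquefaction databases described here, the geometric mean of sensitivity and specificity is a commonly used score. The sketch below assumes that standard definition; the abstract does not define its "Gmean(error)" quantity, so this may differ from the measure the authors report, and the function name and counts are hypothetical.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

# Hypothetical confusion-matrix counts for an unbalanced dataset:
# 10 positive cases and 90 negative cases
score = g_mean(tp=8, fn=2, tn=81, fp=9)
print(round(score, 4))
```

Unlike plain accuracy, this score drops sharply if a classifier ignores the minority class, because a sensitivity near zero pulls the product toward zero.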