• Title/Summary/Keyword: gradient boosting

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering / v.37 no.6 / pp.539-554 / 2024
  • This study develops a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methods. To improve model accuracy, the research applies outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models on seismic data from KiK-net. Notably, the light gradient boosting method (LGBM), gradient boosting, and decision tree models perform better when coupled with SMOTE, while the multiple linear regression (MLR) and support vector machine (SVM) models perform worse. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The combination of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, and LGBM with the local outlier factor yields the highest F1-score of 0.79. Consistently outperforming the other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the importance of outlier detection and data balancing for precise seismic soil classification and highlight the potential of machine learning for improving site classification accuracy.
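The data-balancing step mentioned in this abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random position on the segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the sample points are hypothetical and chosen for illustration only; the paper's own implementation and settings are not given in the abstract.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Made-up minority-class feature vectors (illustration only)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is an interpolation of two real minority samples, the new points stay inside the region already covered by that class.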

Who Gets Government SME R&D Subsidy? Application of Gradient Boosting Model (Gradient Boosting 모형을 이용한 중소기업 R&D 지원금 결정요인 분석)

  • Kang, Sung Won;Kang, HeeChan
    • The Journal of Society for e-Business Studies / v.25 no.4 / pp.77-109 / 2020
  • In this paper, we build a gradient boosting model to predict government SME R&D subsidies, select features of high importance, and measure the impact of each feature on the predicted subsidy using partial dependence plots (PDP) and SHAP values. Unlike previous empirical research, we focus on how the R&D subsidy distribution pattern affects the incentives of firms participating in subsidy competition. We used firm data constructed by KISTEP, which links government R&D subsidy records with financial statements provided by NICE, and applied a gradient boosting model to predict the R&D subsidy. We found that firms with higher R&D performance and larger R&D investment tend to receive higher R&D subsidies, but firms with higher operating profit or total asset turnover tend to receive lower R&D subsidies. Our results suggest that the current government R&D subsidy distribution pattern provides an incentive to improve R&D project performance, but not business performance.
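For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of gradient boosting for squared-error regression, assuming a single feature and decision stumps as weak learners. It is not the authors' model: the function names, the learning rate, the number of rounds, and the data are all hypothetical. Each round fits a stump to the current residuals and adds a shrunken copy of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # initial prediction: the target mean
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


# Made-up data with a step between x = 4 and x = 5 (illustration only)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = fit_gradient_boosting(xs, ys)
baseline_mse = sum((y - model[0]) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Comparing `boosted_mse` with `baseline_mse` shows how much the ensemble improves on simply predicting the mean.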

Prediction of the Movement Directions of Index and Stock Prices Using Extreme Gradient Boosting (익스트림 그라디언트 부스팅을 이용한 지수/주가 이동 방향 예측)

  • Kim, HyoungDo
    • The Journal of the Korea Contents Association / v.18 no.9 / pp.623-632 / 2018
  • Both investors and researchers pay close attention to predicting the direction of stock price movements, since accurate prediction plays an important role in strategic decisions on stock trading. Previous studies, taken together, show that different factors are considered depending on the stock market and the prediction period. This paper analyzes which data mining techniques perform better on some representative index and stock price datasets from the Korean stock market. In particular, the extreme gradient boosting technique, which has proved itself a front-runner in recent open competitions, is applied to the prediction problem. Its performance is compared with that of other data mining techniques reported to be good at predicting stock price movement directions, such as random forests, support vector machines, and artificial neural networks. Experiments with 12 years of index and price datasets show that the gradient boosting technique is the best at predicting movement directions 1 to 4 days ahead, with partial equivalence to the other techniques in a few cases.
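As an illustration of the prediction target described here, the sketch below builds binary movement-direction labels for horizons of 1 to 4 days. It assumes that "direction" means whether the price h days ahead is higher than today's; the abstract does not give the exact definition, and the function name and prices are hypothetical.

```python
def direction_labels(prices, horizon):
    """1 if the price `horizon` days ahead is higher than today's, else 0."""
    return [
        1 if prices[i + horizon] > prices[i] else 0
        for i in range(len(prices) - horizon)
    ]


# Made-up closing prices (illustration only)
closes = [100.0, 101.5, 101.0, 102.3, 101.8, 103.0]
labels_by_horizon = {h: direction_labels(closes, h) for h in range(1, 5)}
```

Each horizon yields its own label series, so a separate classifier can be trained and evaluated per horizon.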

Comparison of machine learning algorithms for Chl-a prediction in the middle of Nakdong River (focusing on water quality and quantity factors) (머신러닝 기법을 활용한 낙동강 중류 지역의 Chl-a 예측 알고리즘 비교 연구(수질인자 및 수량 중심으로))

  • Lee, Sang-Min;Park, Kyeong-Deok;Kim, Il-Kyu
    • Journal of Korean Society of Water and Wastewater / v.34 no.4 / pp.277-288 / 2020
  • In this study, we applied machine learning algorithms to predict chlorophyll-a (Chl-a), an indicator of algae, using water quality and quantity data from the middle Nakdong River area. First, we analyzed the correlation between Chl-a and the water quality and quantity data. We extracted ten factors of high importance from the water quality and quantity data for the two weirs, and the algorithms predicted how these ten factors affected Chl-a occurrence. We implemented four algorithms in Python: decision tree, random forest, elastic net, and gradient boosting. The root mean square error (RMSE) was used to evaluate the algorithms. Gradient boosting gave an RMSE of 10.55 for the Gangjeonggoryeong (GG) site and 11.43 for the Dalsung (DS) site, showing excellent results for both sites. The predictions of the four algorithms were also evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The AUC was 0.877 at the GG site and 0.951 at the DS site, so the algorithm's ability to interpret the data appeared excellent.
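The RMSE metric used for the comparison in this abstract is the square root of the mean squared difference between observed and predicted values. A minimal sketch follows; the function name and the numbers are made up for illustration and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up Chl-a observations and model predictions (illustration only)
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
score = rmse(observed, predicted)
```

RMSE is expressed in the same unit as the target variable, so a lower value means predictions sit closer to the observations on that scale.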

The study of foreign exchange trading revenue model using decision tree and gradient boosting (외환거래에서 의사결정나무와 그래디언트 부스팅을 이용한 수익 모형 연구)

  • Jung, Ji Hyeon;Min, Dae Kee
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.161-170 / 2013
  • FX (foreign exchange) is a form of exchange for the global decentralized trading of international currencies. In simple terms, forex is the simultaneous purchase and sale of currencies, or the exchange of one country's currency for another's. By comparing the gradient boosting method and the decision tree method, we can find consistent trading rules. Methods such as time series analysis, which are used to predict financial markets, have the advantage of supporting long-term forecasting models; on the other hand, they have difficulty reflecting rapidly changing short-term price fluctuations. Therefore, in this study, the gradient boosting method and the decision tree method are applied to short-term data to derive rules for the revenue structure of the FX market, and the stability and predictive performance of the model are evaluated.

Study on Fault Detection of a Gas Pressure Regulator Based on Machine Learning Algorithms

  • Seo, Chan-Yang;Suh, Young-Joo;Kim, Dong-Ju
    • Journal of the Korea Society of Computer and Information / v.25 no.4 / pp.19-27 / 2020
  • In this paper, we propose a machine learning method for diagnosing failures of a gas pressure regulator. When implementing a machine learning model to detect abnormal operation of a facility, it is common to install sensors to collect data. However, because failure of a gas pressure regulator can lead to fatal safety problems, installing an additional sensor on a gas pressure regulator is not simple. We therefore propose several machine learning approaches for diagnosing abnormal operation using only the flow rate and gas pressure data collected from the gas pressure regulator itself. Since fault data for a gas pressure regulator are scarce, an over-sampling method is applied so that the model is trained on all classes. The classification model was implemented using gradient boosting, 1D convolutional neural networks, and LSTM algorithms, and the gradient boosting model showed the best performance among the classification models, with 99.975% accuracy.

Darknet Traffic Detection and Classification Using Gradient Boosting Techniques (Gradient Boosting 기법을 활용한 다크넷 트래픽 탐지 및 분류)

  • Kim, Jihye;Lee, Soo Jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.371-379 / 2022
  • The darknet is built on anonymity and security, which leads it to be continuously abused for various crimes and illegal activities. It is therefore very important to detect and classify darknet traffic to prevent misuse and abuse of the darknet. This work proposes a novel approach that uses gradient boosting techniques for darknet traffic detection and classification. The XGBoost and LightGBM algorithms achieve a detection accuracy of 99.99% and a classification accuracy of over 99%, which is more than 3% higher in detection accuracy and over 13% higher in classification accuracy than previous research. In particular, LightGBM outperforms XGBoost in detecting and classifying darknet traffic, reducing the learning time by a factor of about 1.6 and the hyperparameter tuning time by a factor of more than 10.

Automated Verification of Livestock Manure Transfer Management System Handover Document using Gradient Boosting (Gradient Boosting을 이용한 가축분뇨 인계관리시스템 인계서 자동 검증)

  • Jonghwi Hwang;Hwakyung Kim;Jaehak Ryu;Taeho Kim;Yongtae Shin
    • Journal of Information Technology Services / v.22 no.4 / pp.97-110 / 2023
  • In this study, we propose a technique to automatically generate transfer documents using sensor data from livestock manure transfer systems. The research involves analyzing sensor data and applying machine learning techniques to derive optimized outcomes for livestock manure transfer documents. By comparing and contrasting with existing documents, we present a method for automatic document generation. Specifically, we propose the utilization of Gradient Boosting, a machine learning algorithm. The objective of this research is to enhance the efficiency of livestock manure and liquid byproduct management. Currently, stakeholders including producers, transporters, and processors manually input data into the livestock manure transfer management system during the disposal of manure and liquid byproducts. This manual process consumes additional labor, leads to data inconsistency, and complicates the management of distribution and treatment. Therefore, the aim of this study is to leverage data to automatically generate transfer documents, thereby increasing the efficiency of livestock manure and liquid byproduct management. By utilizing sensor data from livestock manure and liquid byproduct transport vehicles and employing machine learning algorithms, we establish a system that automates the validation of transfer documents, reducing the burden on producers, transporters, and processors. This efficient management system is anticipated to create a transparent environment for the distribution and treatment of livestock manure and liquid byproducts.

Performance Comparison of Neural Network and Gradient Boosting Machine for Dropout Prediction of University Students

  • Hyeon Gyu Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.8 / pp.49-58 / 2023
  • Student dropout not only causes financial loss to the university but also has negative impacts on individual students and on society. To address this issue, various studies have used machine learning to predict student dropout. This paper presents models implemented using a DNN (Deep Neural Network) and LGBM (Light Gradient Boosting Machine) to predict dropout of university students and compares their performance. The academic record and grade data collected from 20,050 students at A University, a small-to-medium-sized 4-year university in Seoul, were used for training. Of the 140 attributes in the collected data, only those with a correlation coefficient of 0.1 or higher with the attribute indicating dropout were extracted and used for training. Our experimental results showed that the F1-scores of DNN and LGBM were 0.798 and 0.826, respectively, indicating that LGBM provided 2.5% better prediction performance than DNN.
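The attribute-selection step in this abstract can be sketched as a simple correlation filter. The example below assumes the Pearson coefficient and a signed threshold of 0.1, as the abstract's wording suggests; the paper may instead use the absolute value or a different coefficient. The attribute names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat the correlation as zero
    return cov / math.sqrt(var_x * var_y)


def select_attributes(columns, target, threshold=0.1):
    """Keep attributes whose correlation with the target is >= threshold."""
    return [
        name
        for name, values in columns.items()
        if pearson(values, target) >= threshold
    ]


# Made-up student attributes and dropout flags (illustration only)
columns = {
    "failed_courses": [0, 3, 1, 4, 0, 2],
    "gpa": [3.8, 2.1, 3.2, 1.9, 3.9, 2.5],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
dropout = [0, 1, 0, 1, 0, 1]
selected = select_attributes(columns, dropout)
```

With a signed threshold, an attribute that is strongly but negatively correlated with dropout (such as `gpa` here) is discarded; filtering on the absolute value would keep it.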

Hybrid machine learning with moth-flame optimization methods for strength prediction of CFDST columns under compression

  • Quang-Viet Vu;Dai-Nhan Le;Thai-Hoan Pham;Wei Gao;Sawekchai Tangaramvong
    • Steel and Composite Structures / v.51 no.6 / pp.679-695 / 2024
  • This paper presents a novel technique that combines machine learning (ML) with moth-flame optimization (MFO) methods to predict the axial compressive strength (ACS) of concrete-filled double-skin steel tube (CFDST) columns. The proposed model is trained and tested with a dataset containing 125 tests of CFDST columns subjected to compressive loading. Five ML models are used in this work: extreme gradient boosting (XGBoost), gradient tree boosting (GBT), categorical gradient boosting (CAT), support vector machines (SVM), and decision tree (DT) algorithms. The MFO algorithm is applied to find optimal hyperparameters of these ML models and to determine the most effective model for predicting the ACS of CFDST columns. Results on several performance metrics reveal that the MFO-CAT model provides superior accuracy compared to the other models considered. The accuracy of the MFO-CAT model is validated by comparing its predictions with existing design codes and formulae. Moreover, the significance and contribution of each feature in the dataset are examined using the SHapley Additive exPlanations (SHAP) method. A comprehensive uncertainty quantification of the probabilistic characteristics of the ACS of CFDST columns is conducted for the first time to examine the models' responses to variations of input variables in stochastic environments. Finally, a web-based application is developed to predict the ACS of CFDST columns, enabling rapid practical use without requiring any programming or machine learning expertise.