• Title/Summary/Keyword: LGBM

Search Result 29, Processing Time 0.024 seconds

Predicting Deformation Behavior of Additively Manufactured Ti-6Al-4V Based on XGB and LGBM (XGB 및 LGBM을 활용한 Ti-6Al-4V 적층재의 변형 거동 예측)

  • Cheon, S.;Yu, J.;Kim, J.G.;Oh, J.S.;Nam, T.H.;Lee, T.
    • Transactions of Materials Processing
    • /
    • v.31 no.4
    • /
    • pp.173-178
    • /
    • 2022
  • The present study employed two different machine-learning approaches, the extreme gradient boosting (XGB) and light gradient boosting machine (LGBM), to predict a compressive deformation behavior of additively manufactured Ti-6Al-4V. Such approaches have rarely been verified in the field of metallurgy in contrast to artificial neural network and its variants. XGB and LGBM provided a good prediction for elongation to failure under an extrapolated condition of processing parameters. The predicting accuracy of these methods was better than that of response surface method. Furthermore, XGB and LGBM with optimum hyperparameters well predicted a deformation behavior of Ti-6Al-4V additively manufactured under the extrapolated condition. Although the predicting capability of two methods was comparable, LGBM was superior to XGB in light of six-fold higher rate of machine learning. It is also noted this work has verified the LGBM approach in solving the metallurgical problem for the first time.

Performance Comparison of Neural Network and Gradient Boosting Machine for Dropout Prediction of University Students

  • Hyeon Gyu Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.8
    • /
    • pp.49-58
    • /
    • 2023
  • Dropouts of students not only cause financial loss to the university, but also have negative impacts on individual students and society together. To resolve this issue, various studies have been conducted to predict student dropout using machine learning. This paper presents a model implemented using DNN (Deep Neural Network) and LGBM (Light Gradient Boosting Machine) to predict dropout of university students and compares their performance. The academic record and grade data collected from 20,050 students at A University, a small and medium-sized 4-year university in Seoul, were used for learning. Among the 140 attributes of the collected data, only the attributes with a correlation coefficient of 0.1 or higher with the attribute indicating dropout were extracted and used for learning. As learning algorithms, DNN (Deep Neural Network) and LightGBM (Light Gradient Boosting Machine) were used. Our experimental results showed that the F1-scores of DNN and LGBM were 0.798 and 0.826, respectively, indicating that LGBM provided 2.5% better prediction performance than DNN.

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering
    • /
    • v.37 no.6
    • /
    • pp.539-554
    • /
    • 2024
  • This study explores development of prediction model for seismic site classification through the integration of machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and, synthetic minority over-sampling technique (SMOTE) for data balance, and evaluates using seven machine learning models using seismic data from KiK-net. Notably, light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while Multiple linear regression (MLR) and Support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The ensemble of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, with LGBM and local outlier factor yielding the highest F1-score of 0.79. Consistently outperforming other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction, offering insights and highlighting the potential of machine learning in optimizing site classification accuracy.

Comparative Study of Data Preprocessing and ML&DL Model Combination for Daily Dam Inflow Prediction (댐 일유입량 예측을 위한 데이터 전처리와 머신러닝&딥러닝 모델 조합의 비교연구)

  • Youngsik Jo;Kwansue Jung
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2023.05a
    • /
    • pp.358-358
    • /
    • 2023
  • 본 연구에서는 그동안 수자원분야 강우유출 해석분야에 활용되었던 대표적인 머신러닝&딥러닝(ML&DL) 모델을 활용하여 모델의 하이퍼파라미터 튜닝뿐만 아니라 모델의 특성을 고려한 기상 및 수문데이터의 조합과 전처리(lag-time, 이동평균 등)를 통하여 데이터 특성과 ML&DL모델의 조합시나리오에 따른 일 유입량 예측성능을 비교 검토하는 연구를 수행하였다. 이를 위해 소양강댐 유역을 대상으로 1974년에서 2021년까지 축적된 기상 및 수문데이터를 활용하여 1) 강우, 2) 유입량, 3) 기상자료를 주요 영향변수(독립변수)로 고려하고, 이에 a) 지체시간(lag-time), b) 이동평균, c) 유입량의 성분분리조건을 적용하여 총 36가지 시나리오 조합을 ML&DL의 입력자료로 활용하였다. ML&DL 모델은 1) Linear Regression(LR), 2) Lasso, 3) Ridge, 4) SVR(Support Vector Regression), 5) Random Forest(RF), 6) LGBM(Light Gradient Boosting Model), 7) XGBoost의 7가지 ML방법과 8) LSTM(Long Short-Term Memory models), 9) TCN(Temporal Convolutional Network), 10) LSTM-TCN의 3가지 DL 방법, 총 10가지 ML&DL모델을 비교 검토하여 일유입량 예측을 위한 가장 적합한 데이터 조합 특성과 ML&DL모델을 성능평가와 함께 제시하였다. 학습된 모형의 유입량 예측 결과를 비교·분석한 결과, 소양강댐 유역에서는 딥러닝 중에서는 TCN모형이 가장 우수한 성능을 보였고(TCN>TCN-LSTM>LSTM), 트리기반 머신러닝중에서는 Random Forest와 LGBM이 우수한 성능을 보였으며(RF, LGBM>XGB), SVR도 LGBM수준의 우수한 성능을 나타내었다. LR, Lasso, Ridge 세가지 Regression모형은 상대적으로 낮은 성능을 보였다. 또한 소양강댐 댐유입량 예측에 대하여 강우, 유입량, 기상계열을 36가지로 조합한 결과, 입력자료에 lag-time이 적용된 강우계열의 조합 분석에서 세가지 Regression모델을 제외한 모든 모형에서 NSE(Nash-Sutcliffe Efficiency) 0.8이상(최대 0.867)의 성능을 보였으며, lag-time이 적용된 강우와 유입량계열을 조합했을 경우 NSE 0.85이상(최대 0.901)의 더 우수한 성능을 보였다.

  • PDF

Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP (SHAP 기반 NSL-KDD 네트워크 공격 분류의 주요 변수 분석)

  • Sang-duk Lee;Dae-gyu Kim;Chang Soo Kim
    • Journal of the Society of Disaster Information
    • /
    • v.19 no.4
    • /
    • pp.924-935
    • /
    • 2023
  • Purpose: The central aim of this study is to leverage machine learning techniques for the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables responsible for enhancing overall performance. Method: First, we classified 'R2L(Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect due to class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we use the SHapley Additive exPlanation (SHAP) for two classification models that showed high performance, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to check the importance of variables that affect classification for each model. Result: In the case of RF, the 'service' variable and in the case of LGBM, the 'dst_host_srv_count' variable were confirmed to be the most important variables. These pivotal variables serve as key factors capable of enhancing performance in the context of classification for each respective model. Conclusion: In conclusion, this paper successfully identifies the optimal models, RF and LGBM, for classifying 'R2L' and 'U2R' attacks, while elucidating the crucial variables associated with each selected model.

Comparison of Machine Learning-Based Greenhouse VPD Prediction Models (머신러닝 기반의 온실 VPD 예측 모델 비교)

  • Jang Kyeong Min;Lee Myeong Bae;Lim Jong Hyun;Oh Han Byeol;Shin Chang Sun;Park Jang Woo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.3
    • /
    • pp.125-132
    • /
    • 2023
  • In this study, we compared the performance of machine learning models for predicting Vapor Pressure Deficits (VPD) in greenhouses that affect pore function and photosynthesis as well as plant growth due to nutrient absorption of plants. For VPD prediction, the correlation between the environmental elements in and outside the greenhouse and the temporal elements of the time series data was confirmed, and how the highly correlated elements affect VPD was confirmed. Before analyzing the performance of the prediction model, the amount and interval of analysis time series data (1 day, 3 days, 7 days) and interval (20 minutes, 1 hour) were checked to adjust the amount and interval of data. Finally, four machine learning prediction models (XGB Regressor, LGBM Regressor, Random Forest Regressor, etc.) were applied to compare the prediction performance by model. As a result of the prediction of the model, when data of 1 day at 20 minute intervals were used, the highest prediction performance was 0.008 for MAE and 0.011 for RMSE in LGBM. In addition, it was confirmed that the factor that most influences VPD prediction after 20 minutes was VPD (VPD_y__71) from the past 20 minutes rather than environmental factors. Using the results of this study, it is possible to increase crop productivity through VPD prediction, condensation of greenhouses, and prevention of disease occurrence. In the future, it can be used not only in predicting environmental data of greenhouses, but also in various fields such as production prediction and smart farm control models.

Generation of Daily High-resolution Sea Surface Temperature for the Seas around the Korean Peninsula Using Multi-satellite Data and Artificial Intelligence (다종 위성자료와 인공지능 기법을 이용한 한반도 주변 해역의 고해상도 해수면온도 자료 생산)

  • Jung, Sihun;Choo, Minki;Im, Jungho;Cho, Dongjin
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.5_2
    • /
    • pp.707-723
    • /
    • 2022
  • Although satellite-based sea surface temperature (SST) is advantageous for monitoring large areas, spatiotemporal data gaps frequently occur due to various environmental or mechanical causes. Thus, it is crucial to fill in the gaps to maximize its usability. In this study, daily SST composite fields with a resolution of 4 km were produced through a two-step machine learning approach using polar-orbiting and geostationary satellite SST data. The first step was SST reconstruction based on Data Interpolate Convolutional AutoEncoder (DINCAE) using multi-satellite-derived SST data. The second step improved the reconstructed SST targeting in situ measurements based on light gradient boosting machine (LGBM) to finally produce daily SST composite fields. The DINCAE model was validated using random masks for 50 days, whereas the LGBM model was evaluated using leave-one-year-out cross-validation (LOYOCV). The SST reconstruction accuracy was high, resulting in R2 of 0.98, and a root-mean-square-error (RMSE) of 0.97℃. The accuracy increase by the second step was also high when compared to in situ measurements, resulting in an RMSE decrease of 0.21-0.29℃ and an MAE decrease of 0.17-0.24℃. The SST composite fields generated using all in situ data in this study were comparable with the existing data assimilated SST composite fields. In addition, the LGBM model in the second step greatly reduced the overfitting, which was reported as a limitation in the previous study that used random forest. The spatial distribution of the corrected SST was similar to those of existing high resolution SST composite fields, revealing that spatial details of oceanic phenomena such as fronts, eddies and SST gradients were well simulated. This research demonstrated the potential to produce high resolution seamless SST composite fields using multi-satellite data and artificial intelligence.

Classification models for chemotherapy recommendation using LGBM for the patients with colorectal cancer

  • Oh, Seo-Hyun;Baek, Jeong-Heum;Kang, Un-Gu
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.7
    • /
    • pp.9-17
    • /
    • 2021
  • In this study, we propose a part of the CDSS(Clinical Decision Support System) study, a system that can classify chemotherapy, one of the treatment methods for colorectal cancer patients. In the treatment of colorectal cancer, the selection of chemotherapy according to the patient's condition is very important because it is directly related to the patient's survival period. Therefore, in this study, chemotherapy was classified using a machine learning algorithm by creating a baseline model, a pathological model, and a combined model using both characteristics of the patient using the individual and pathological characteristics of colorectal cancer patients. As a result of comparing the prediction accuracy with Top-n Accuracy, ROC curve, and AUC, it was found that the combined model showed the best prediction accuracy, and that the LGBM algorithm had the best performance. In this study, a chemotherapy classification model suitable for the patient's condition was constructed by classifying the model by patient characteristics using a machine learning algorithm. Based on the results of this study in future studies, it will be helpful for CDSS research by creating a better performing chemotherapy classification model.

Water level forecasting for extended lead times using preprocessed data with variational mode decomposition: A case study in Bangladesh

  • Shabbir Ahmed Osmani;Roya Narimani;Hoyoung Cha;Changhyun Jun;Md Asaduzzaman Sayef
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2023.05a
    • /
    • pp.179-179
    • /
    • 2023
  • This study suggests a new approach of water level forecasting for extended lead times using original data preprocessing with variational mode decomposition (VMD). Here, two machine learning algorithms including light gradient boosting machine (LGBM) and random forest (RF) were considered to incorporate extended lead times (i.e., 5, 10, 15, 20, 25, 30, 40, and 50 days) forecasting of water levels. At first, the original data at two water level stations (i.e., SW173 and SW269 in Bangladesh) and their decomposed data from VMD were prepared on antecedent lag times to analyze in the datasets of different lead times. Mean absolute error (MAE), root mean squared error (RMSE), and mean squared error (MSE) were used to evaluate the performance of the machine learning models in water level forecasting. As results, it represents that the errors were minimized when the decomposed datasets were considered to predict water levels, rather than the use of original data standalone. It was also noted that LGBM produced lower MAE, RMSE, and MSE values than RF, indicating better performance. For instance, at the SW173 station, LGBM outperformed RF in both decomposed and original data with MAE values of 0.511 and 1.566, compared to RF's MAE values of 0.719 and 1.644, respectively, in a 30-day lead time. The models' performance decreased with increasing lead time, as per the study findings. In summary, preprocessing original data and utilizing machine learning models with decomposed techniques have shown promising results for water level forecasting in higher lead times. It is expected that the approach of this study can assist water management authorities in taking precautionary measures based on forecasted water levels, which is crucial for sustainable water resource utilization.

  • PDF

Retrieval of Hourly Aerosol Optical Depth Using Top-of-Atmosphere Reflectance from GOCI-II and Machine Learning over South Korea (GOCI-II 대기상한 반사도와 기계학습을 이용한 남한 지역 시간별 에어로졸 광학 두께 산출)

  • Seyoung Yang;Hyunyoung Choi;Jungho Im
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_3
    • /
    • pp.933-948
    • /
    • 2023
  • Atmospheric aerosols not only have adverse effects on human health but also exert direct and indirect impacts on the climate system. Consequently, it is imperative to comprehend the characteristics and spatiotemporal distribution of aerosols. Numerous research endeavors have been undertaken to monitor aerosols, predominantly through the retrieval of aerosol optical depth (AOD) via satellite-based observations. Nonetheless, this approach primarily relies on a look-up table-based inversion algorithm, characterized by computationally intensive operations and associated uncertainties. In this study, a novel high-resolution AOD direct retrieval algorithm, leveraging machine learning, was developed using top-of-atmosphere reflectance data derived from the Geostationary Ocean Color Imager-II (GOCI-II), in conjunction with their differences from the past 30-day minimum reflectance, and meteorological variables from numerical models. The Light Gradient Boosting Machine (LGBM) technique was harnessed, and the resultant estimates underwent rigorous validation encompassing random, temporal, and spatial N-fold cross-validation (CV) using ground-based observation data from Aerosol Robotic Network (AERONET) AOD. The three CV results consistently demonstrated robust performance, yielding R2=0.70-0.80, RMSE=0.08-0.09, and within the expected error (EE) of 75.2-85.1%. The Shapley Additive exPlanations(SHAP) analysis confirmed the substantial influence of reflectance-related variables on AOD estimation. A comprehensive examination of the spatiotemporal distribution of AOD in Seoul and Ulsan revealed that the developed LGBM model yielded results that are in close concordance with AERONET AOD over time, thereby confirming its suitability for AOD retrieval at high spatiotemporal resolution (i.e., hourly, 250 m). Furthermore, upon comparing data coverage, it was ascertained that the LGBM model enhanced data retrieval frequency by approximately 8.8% in comparison to the GOCI-II L2 AOD products, ameliorating issues associated with excessive masking over very illuminated surfaces that are often encountered in physics-based AOD retrieval processes.