• Title/Abstract/Keywords: XGB model

34 search results (processing time: 0.018 s)

ConvXGB: A new deep learning model for classification problems based on CNN and XGBoost

  • Thongsuwan, Setthanun;Jaiyen, Saichon;Padcharoen, Anantachai;Agarwal, Praveen
    • Nuclear Engineering and Technology / Vol. 53, No. 2 / pp. 522-531 / 2021
  • We describe a new deep learning model, Convolutional eXtreme Gradient Boosting (ConvXGB), for classification problems, based on convolutional neural nets and Chen et al.'s XGBoost. In addition to image data, ConvXGB supports general classification problems through a data preprocessing module. ConvXGB consists of several stacked convolutional layers that learn the features of the input automatically, followed by XGBoost in the last layer to predict the class labels. The ConvXGB model is simplified by reducing the number of parameters under appropriate conditions, since it is not necessary to re-adjust the weight values in a back propagation cycle. Experiments on several data sets from the UCI Repository, including images and general data sets, showed that our model handled the classification problems, for all the tested data sets, slightly better than CNN and XGBoost alone and was sometimes significantly better.

Enhancing machine learning-based anomaly detection for TBM penetration rate with imbalanced data manipulation

  • 권기범;황병현;박현태;오주영;최항석
    • 한국터널지하공간학회 논문집 / Vol. 26, No. 5 / pp. 519-532 / 2024
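  • In managing the risk of TBM (tunnel boring machine) tunnel projects, predicting the penetration rate is important, and machine learning-based studies on TBM penetration rate prediction have been carried out continuously for this purpose. However, the machine learning prediction models in existing studies are limited in accounting for the data imbalance between normal and abnormal penetration rates. In this study, the imbalanced data were handled with a data augmentation technique to improve machine learning-based anomaly detection of the TBM penetration rate. First, similar variables were removed through correlation analysis, and six input features were selected. Next, penetration rates in the bottom 10% and the top 10% were each assigned to abnormal classes, and rates in the remaining range to the normal class. An XGB model and an XGB-SMOTE model were built by applying XGB (extreme gradient boosting) to the original training data and to training data augmented with SMOTE (synthetic minority oversampling technique), respectively. Comparing class prediction performance, the XGB model predicted normal penetration rates well but performed relatively poorly on abnormal penetration rates. In contrast, the XGB-SMOTE model showed consistently strong predictive performance across all penetration rate classes. This is attributed to the SMOTE-based augmentation of abnormal penetration rate data, which improved how well the model learned the patterns between the ground conditions and the TBM operating factors that cause abnormal penetration rates. In conclusion, this study shows that handling imbalanced data with a data augmentation technique is effective for machine learning-based anomaly detection of the TBM penetration rate.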
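
The SMOTE step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote`, the neighbour count `k`, and the two-feature sample values are assumptions made for illustration. The idea is that each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical abnormal-class samples with two made-up features.
anomalies = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.3]]
new_points = smote(anomalies, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the augmented data stay inside the region already occupied by the abnormal class.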

Estimation of the mechanical properties of oil palm shell aggregate concrete by novel AO-XGB model

  • Yipeng Feng;Jiang Jie;Amir Toulabi
    • Steel and Composite Structures / Vol. 49, No. 6 / pp. 645-666 / 2023
  • Due to the steadily declining supply of natural coarse aggregates, the concrete industry has shifted to substitute coarse aggregates generated from byproducts and industrial waste. Oil palm shell (OPS) is a substantial waste product of palm oil production. When considering the use of OPS concrete (OPSC), building engineers must consider its uniaxial compressive strength (UCS). Obtaining UCS is expensive and time-consuming, so machine learning may help. This research established five hybrid AI algorithms to predict UCS, in which the Aquila optimizer (AO) is used to find the optimum model parameters. The models considered are artificial neural network (AO-ANN), adaptive neuro-fuzzy inference system (AO-ANFIS), support vector regression (AO-SVR), random forest (AO-RF), and extreme gradient boosting (AO-XGB). To this end, a dataset of OPS concrete specimens was compiled. The results show that all five models estimate UCS with acceptable accuracy, with a strong correlation between measured and estimated UCS. Overall, the proposed AO-XGB model outperformed the others in predicting the UCS of OPSC (R², RMSE, MAE, VAF, and A15-index of 0.9678, 1.4595, 1.1527, 97.6469, and 0.9077, respectively). The proposed model could be used in construction engineering to ensure sufficient mechanical workability of lightweight concrete and to permit its safe use in construction.
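
The five scores quoted in this abstract can be computed as in the sketch below. The abstract does not give the formulas, so the definitions here are assumptions based on common usage: VAF is taken as the variance accounted for, in percent, and the A15-index as the fraction of samples whose predicted-to-measured ratio falls within 0.85 to 1.15. The sample values are made up, and measured values are assumed to be non-zero.

```python
import math

def regression_metrics(measured, predicted):
    """Return R2, RMSE, MAE, VAF (percent) and A15-index (fraction)."""
    n = len(measured)
    errors = [m - p for m, p in zip(measured, predicted)]
    mean_m = sum(measured) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    mean_e = sum(errors) / n
    var_e = sum((e - mean_e) ** 2 for e in errors) / n
    var_m = ss_tot / n
    within_15 = sum(1 for m, p in zip(measured, predicted) if 0.85 <= p / m <= 1.15)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "VAF": (1 - var_e / var_m) * 100,  # assumed definition, in percent
        "A15": within_15 / n,              # assumed definition, as a fraction
    }

# Hypothetical measured and predicted strengths (MPa).
measured = [20.0, 25.0, 30.0, 35.0]
predicted = [21.0, 24.0, 33.0, 41.0]
scores = regression_metrics(measured, predicted)
```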

Prediction and Analysis of PM2.5 Concentration in Seoul Using Ensemble-based Model

  • 류민지;손상훈;김진수
    • 대한원격탐사학회지 / Vol. 38, No. 6_1 / pp. 1191-1205 / 2022
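  • Among air pollutants, which have complex and wide-ranging causes, particulate matter (PM) is classified by particle size. PM2.5 in particular is so small that, when inhaled, it can cause respiratory and cardiovascular disease. Guarding against this risk requires nationally led management together with monitoring and prediction that allow prevention in advance. This study aimed to predict PM2.5 in Seoul, where high-concentration fine dust events are frequent, using two ensemble models, random forest (RF) and extreme gradient boosting (XGB), with 15 meteorological factors from the local data assimilation and prediction system (LDAPS), aerosol optical depth (AOD), and four chemical factors as independent variables. The performance of the two models and the importance of the factors were evaluated, and a seasonal analysis of the models was also carried out. In terms of prediction accuracy, RF achieved R² = 0.85 and XGB achieved R² = 0.91, confirming that XGB is better suited than RF to PM2.5 prediction. The seasonal analysis indicated that the predictions compared well with the observations, which show high concentrations in spring. This study predicted PM2.5 in Seoul using a variety of factors and built a well-performing ensemble-based PM2.5 prediction model.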

Ensemble deep learning-based models to predict the resilient modulus of modified base materials subjected to wet-dry cycles

  • Mahzad Esmaeili-Falak;Reza Sarkhani Benemaran
    • Geomechanics and Engineering / Vol. 32, No. 6 / pp. 583-600 / 2023
  • The resilient modulus (MR) of pavement materials plays a significant role in pavement design by the mechanistic-empirical method. MR is determined by experimental tests that require time, money, and special equipment. This paper proposes a novel hybridized extreme gradient boosting (XGB) structure for forecasting the MR of modified base materials subjected to wet-dry cycles. The models, referred to as deep learning models, were built from various combinations of input variables. The input variables are the number of W-D cycles (WDC), the ratio of free lime to SAF (CSAFR), the ratio of maximum dry density to optimum moisture content (DMR), confining pressure (σ3), and deviatoric stress (σd). Two XGB structures were produced, in which the determinative variables were optimized by particle swarm optimization (PSO) and the black widow optimization algorithm (BWOA). According to the results and the Taylor diagram, model M1, which combines WDC, CSAFR, DMR, σ3, and σd, is the most suitable; for M1, BWOA-XGB gives R² and RMSE values of 0.9991 and 55.19 MPa, respectively. Notably, the lowest RMSE reported in the literature was 116.94 MPa, whereas the BWOA-XGB model in this study achieved a much lower RMSE of 55.198 MPa. Finally, the results indicate that the BWO algorithm is capable of determining the optimal values of the XGB determinative parameters for MR prediction.
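
To make the optimizer-plus-XGB idea in this abstract concrete, the following is a minimal particle swarm optimization sketch. It is an illustration under stated assumptions rather than the paper's procedure: the inertia and acceleration constants are conventional defaults, and `toy_rmse` is a hypothetical stand-in for the validation error that a real run would obtain by training an XGB model with the candidate parameters.

```python
import random

def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise objective over box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

# Hypothetical error surface over (learning_rate, max_depth); a real run
# would train and validate an XGB model here instead.
def toy_rmse(params):
    lr, depth = params
    return (lr - 0.1) ** 2 + 0.01 * (depth - 6) ** 2

best_params, best_score = pso(toy_rmse, bounds=[(0.01, 0.5), (2, 12)])
```

BWOA or any other metaheuristic would slot into the same place: only the rule that proposes new candidate parameters changes, while the objective (model error) stays the same.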

Development of ensemble machine learning model considering the characteristics of input variables and the interpretation of model performance using explainable artificial intelligence

  • 박정수
    • 상하수도학회지 / Vol. 36, No. 4 / pp. 239-248 / 2022
  • The prediction of algal bloom is an important field of study in algal bloom management, and chlorophyll-a concentration (Chl-a) is commonly used to represent the status of algal bloom. In recent years, advanced machine learning algorithms have been increasingly used for the prediction of algal bloom. In this study, XGBoost (XGB), an ensemble machine learning algorithm, was used to develop a model to predict Chl-a in a reservoir. Daily observations of water quality data and climate data were used for training and testing the model. In the first step of the study, the input variables were clustered into two groups (low and high value groups) based on the observed values of water temperature (TEMP), total organic carbon concentration (TOC), total nitrogen concentration (TN), and total phosphorus concentration (TP). For each of the four water quality items, two XGB models were developed using only the data in each clustered group (Model 1). The results were compared with the predictions of an XGB model developed using the entire data before clustering (Model 2). Model performance was evaluated using three indices, including the root mean squared error-observation standard deviation ratio (RSR). Model 1 improved performance for TEMP, TN, and TP, with RSR values of 0.503, 0.477, and 0.493, respectively, against an RSR of 0.521 for Model 2. On the other hand, Model 2 performed better than Model 1 for TOC, where the RSR of Model 1 was 0.532. Explainable artificial intelligence (XAI) is an ongoing field of research in machine learning. Shapley value analysis, a novel XAI algorithm, was also used for the quantitative interpretation of the performance of the XGB model developed in this study.
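
The RSR index used in this abstract can be sketched as follows, assuming the usual definition: the root mean squared error divided by the standard deviation of the observations (the sample-size terms cancel). The function name and the sample values are illustrative only. Lower values are better, and 0 means a perfect fit.

```python
import math

def rsr(observed, simulated):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    sq_err = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(sq_err) / math.sqrt(sq_dev)

# Hypothetical observed and simulated Chl-a values.
value = rsr([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0])
```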

Prediction of Dissolved Oxygen at Anyang-stream using XG-Boost and Artificial Neural Networks

  • Keun Young Lee;Bomchul Kim;Gwanghyun Jo
    • Journal of information and communication convergence engineering / Vol. 22, No. 2 / pp. 133-138 / 2024
  • Dissolved oxygen (DO) is an important factor in ecosystems. However, the analysis of DO is often complicated by the nonlinear behavior of the river system. Therefore, a convenient model-free algorithm for the DO variable is required. In this study, a data-driven algorithm for predicting DO, called ANN-XGB, was developed by combining XGBoost and an artificial neural network (ANN). To train the model, two years of ecosystem data were collected in Anyang, Seoul using the Troll 9500 model. One advantage of the proposed algorithm is its ability to capture abrupt changes in climate-related features that arise from sudden events. Moreover, our algorithm can provide a feature importance analysis owing to the use of XGBoost. The results obtained using the ANN-XGB algorithm were compared with those obtained using the ANN algorithm in the Results Section. The predictions made by ANN-XGB were mostly in closer agreement with the measured DO values in the river than those made by the ANN.

Damage identification in suspension bridges under earthquake excitation using practical advanced analysis and hybrid machine-learning models

  • Van-Thanh Pham;Duc-Kien Thai;Seung-Eock Kim
    • Steel and Composite Structures / Vol. 52, No. 6 / pp. 695-711 / 2024
  • Suspension bridges are critical to urban transportation, but those in earthquake-prone areas face unique challenges. In the event of a moderate or strong earthquake, conventional linear theory-based approaches for detecting bridge damage become inadequate. This study presents an efficient method for identifying damage in suspension bridges using time history nonlinear inelastic analysis. A practical advanced analysis program is employed to model cable-supported bridges with low computational cost, generating a dataset for four hybrid models: PSO-DT, PSO-RF, PSO-XGB, and PSO-CGB. These models combine decision tree (DT), random forest (RF), extreme gradient boosting (XGB), and categorical gradient boosting (CGB) with particle swarm optimization (PSO) to capture nonlinear correlations between displacement response and damage. Principal component analysis reduces dataset dimensions, and PSO selects the optimal model. A numerical case study of a suspension bridge under simulated earthquake conditions identifies PSO-XGB as the best model for predicting stiffness reduction. The results demonstrate the method's robustness for nonlinear damage detection in suspension bridges under earthquake excitation.

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction

  • 박정수
    • 한국물환경학회지 / Vol. 37, No. 5 / pp. 335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems, such as increased treatment cost, and consequently there have been various efforts to develop a model for predicting SSC. However, SSC is affected by both the natural and the anthropogenic environment, which makes it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm. Each group of data was then used separately to optimize XGB (Model 1). The model performance was compared with that of the XGB model using the entire data (Model 2). The models were evaluated by the root mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. The RSR values were 0.51 and 0.57 at the two monitoring stations for Model 2, respectively, while the model performance improved to RSR values of 0.46 and 0.55, respectively, for Model 1.
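
The clustering step in this abstract, splitting the data into low and high discharge ranges with k-means, can be sketched for the one-dimensional, two-cluster case. The function name, the initialisation at the minimum and maximum, and the discharge values are assumptions for illustration, and the input is assumed to contain at least two distinct values.

```python
def kmeans_1d_two_groups(values, n_iter=100):
    """Split values into low and high groups with 1-D k-means (k = 2)."""
    lo_c, hi_c = min(values), max(values)  # initial centroids
    for _ in range(n_iter):
        # Assign each value to the nearer centroid.
        low = [v for v in values if abs(v - lo_c) <= abs(v - hi_c)]
        high = [v for v in values if abs(v - lo_c) > abs(v - hi_c)]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo_c, hi_c):
            break  # assignments are stable
        lo_c, hi_c = new_lo, new_hi
    return low, high

# Hypothetical discharge observations Q.
discharge = [1.0, 1.5, 2.0, 2.5, 40.0, 45.0, 50.0]
low_q, high_q = kmeans_1d_two_groups(discharge)
```

In the Model 1 setup described above, a separate XGB model would then be fitted to the samples in each group, whereas Model 2 would use all samples together.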

A study on frost prediction model using machine learning

  • 김효정;김삼용
    • 응용통계연구 / Vol. 35, No. 4 / pp. 543-552 / 2022
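  • Frost consists of small ice crystals that form on the ground or on objects when the dew point of the air near the surface is below freezing and water vapor sublimates and condenses. Frost damages crops directly. When crops are exposed to low temperatures, their tissues freeze, cell membranes and chloroplasts harden and are destroyed, and dehydrated cells die. In July 2020, sudden sub-zero weather and frost struck the state of Minas Gerais in Brazil, the world's largest coffee producer, damaging about 30% of the region's coffee trees. Coffee prices rose sharply as a result, and because crops take three years to recover, severely affected farms will not be able to produce coffee until 2024. In this paper, to help prevent severe frost damage, we attempted to predict frost using frost occurrence data and weather observation data provided by the Korea Meteorological Administration. The models were built on the altitude of the observation site and on meteorological factors such as wind speed, temperature, humidity, precipitation, and cloudiness. XGB, SVM, Random Forest, and MLP models were trained with various hyperparameters on the training data, and the best configuration of each model was selected. Finally, the results were evaluated on the test data by accuracy (acc) and critical success index (CSI). XGB was the best model, with an acc of 90.4% and a CSI of 64.4%, followed by SVM with an acc of 89.7% and a CSI of 61.2%. Random Forest and MLP performed similarly, with an acc of about 89% and a CSI of about 60%.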
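
The two evaluation scores in this abstract can be sketched as below, assuming the standard definitions: accuracy is the share of all cases predicted correctly, and CSI is hits divided by the sum of hits, misses, and false alarms, so correctly predicted non-events do not count toward it. The labels are made up, with 1 meaning frost and 0 meaning no frost.

```python
def accuracy_and_csi(actual, predicted):
    """Return (accuracy, critical success index) for binary labels."""
    pairs = list(zip(actual, predicted))
    hits = sum(1 for a, p in pairs if a == 1 and p == 1)
    misses = sum(1 for a, p in pairs if a == 1 and p == 0)
    false_alarms = sum(1 for a, p in pairs if a == 0 and p == 1)
    correct_negatives = sum(1 for a, p in pairs if a == 0 and p == 0)
    acc = (hits + correct_negatives) / len(pairs)
    csi = hits / (hits + misses + false_alarms)
    return acc, csi

# Hypothetical frost observations and predictions for ten days.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
acc, csi = accuracy_and_csi(actual, predicted)
```

When frost days are rare, accuracy is dominated by the many correct no-frost predictions, which is consistent with the CSI values reported above being much lower than the accuracies.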