• Title/Abstract/Keywords: XGBoost Algorithm

Search results: 61 items (processing time: 0.018 s)

Estimating pile setup parameter using XGBoost-based optimized models

  • Xigang Du;Ximeng Ma;Chenxi Dong;Mehrdad Sattari Nikkhoo
    • Geomechanics and Engineering / Vol. 36, No. 3 / pp.259-276 / 2024
  • The undrained shear strength is widely acknowledged as a fundamental mechanical property of soil and is considered a critical engineering parameter. In recent years, researchers have employed various methodologies to evaluate the shear strength of soil under undrained conditions. These methods encompass both numerical analyses and empirical techniques, such as the cone penetration test (CPT), to gain insights into the properties and behavior of soil. However, several of these methods rely on correlation assumptions, which can lead to inconsistent accuracy and precision. The study involved the development of innovative methods using extreme gradient boosting (XGB) to predict the pile set-up component "A" based on two distinct data sets. The first data set includes average modified cone point bearing capacity (qt), average wall friction (fs), and effective vertical stress (σvo), while the second data set comprises plasticity index (PI), soil undrained shear cohesion (Su), and the over consolidation ratio (OCR). These data sets were utilized to develop XGBoost-based methods for predicting the pile set-up component "A". To optimize the internal hyperparameters of the XGBoost model, four optimization algorithms were employed: Particle Swarm Optimization (PSO), Social Spider Optimization (SSO), Arithmetic Optimization Algorithm (AOA), and Sine Cosine Optimization Algorithm (SCOA). The results from the first data set indicate that the XGBoost model optimized using the Arithmetic Optimization Algorithm (XGB-AOA) achieved the highest accuracy, with R² values of 0.9962 for the training part and 0.9807 for the testing part. The performance of the developed models was further evaluated using the RMSE, MAE, and VAF indices. The results revealed that the XGB-AOA model outperformed the other models in terms of accuracy, with RMSE, MAE, and VAF values of 0.0078, 0.0015, and 99.6189 for the training part and 0.0141, 0.0112, and 98.0394 for the testing part, respectively. These findings suggest that XGB-AOA is the most accurate model for predicting the pile set-up component.
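
The abstract above reports R², RMSE, MAE, and VAF without defining them. As a rough sketch only, the following shows how these four indices are commonly computed. The VAF formula used here, (1 − var(y − ŷ) / var(y)) × 100, is the usual textbook definition and is an assumption, since the abstract does not state the paper's exact formula. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, and VAF (%) for paired observations.

    VAF uses the common definition (1 - var(error) / var(y_true)) * 100,
    which is assumed here, not confirmed by the abstract.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_err = sum(errors) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("y_true must not be constant")
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    var_true = ss_tot / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "VAF": (1.0 - var_err / var_true) * 100.0,
    }

# Hypothetical measured and predicted values, for illustration only.
y_true = [0.20, 0.35, 0.50, 0.65]
y_pred = [0.22, 0.33, 0.52, 0.63]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Under this definition, VAF equals R² × 100 whenever the mean prediction error is zero, which may be why the two indices in the abstract track each other so closely.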

Prediction of Dissolved Oxygen at Anyang-stream using XG-Boost and Artificial Neural Networks

  • Keun Young Lee;Bomchul Kim;Gwanghyun Jo
    • Journal of information and communication convergence engineering / Vol. 22, No. 2 / pp.133-138 / 2024
  • Dissolved oxygen (DO) is an important factor in ecosystems. However, the analysis of DO is often complicated because of the nonlinear behavior of the river system. Therefore, a convenient model-free algorithm for the DO variable is required. In this study, a data-driven algorithm for predicting DO, called ANN-XGB, was developed by combining XGBoost and an artificial neural network (ANN). To train the model, two years of ecosystem data were collected in Anyang, Seoul using the Troll 9500 model. One advantage of the proposed algorithm is its ability to capture abrupt changes in climate-related features that arise from sudden events. Moreover, our algorithm can provide a feature importance analysis owing to the use of XGBoost. The results obtained using the ANN-XGB algorithm were compared with those obtained using the ANN algorithm in the Results Section. The predictions made by ANN-XGB were mostly in closer agreement with the measured DO values in the river than those made by the ANN.

Comparative Analysis of Machine Learning Techniques for IoT Anomaly Detection Using the NSL-KDD Dataset

  • Zaryn, Good;Waleed, Farag;Xin-Wen, Wu;Soundararajan, Ezekiel;Maria, Balega;Franklin, May;Alicia, Deak
    • International Journal of Computer Science & Network Security / Vol. 23, No. 1 / pp.46-52 / 2023
  • With billions of IoT (Internet of Things) devices populating various emerging applications across the world, detecting anomalies on these devices has become incredibly important. Advanced Intrusion Detection Systems (IDS) are trained to detect abnormal network traffic, and Machine Learning (ML) algorithms are used to create detection models. In this paper, the NSL-KDD dataset was adopted to comparatively study the performance and efficiency of IoT anomaly detection models. The dataset was developed for various research purposes and is especially useful for anomaly detection. This data was used with typical machine learning algorithms including eXtreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and Deep Convolutional Neural Networks (DCNN) to identify and classify any anomalies present within the IoT applications. Our research results show that the XGBoost algorithm outperformed both the SVM and DCNN algorithms, achieving the highest accuracy. In our research, each algorithm was assessed based on accuracy, precision, recall, and F1 score. Furthermore, we obtained interesting results on the execution time taken by each algorithm when running the anomaly detection. Specifically, the XGBoost algorithm was 425.53% faster than the SVM algorithm and 2,075.49% faster than the DCNN algorithm. According to our experimental testing, XGBoost is the most accurate and efficient method.

A Design and Implement of Efficient Agricultural Product Price Prediction Model

  • Im, Jung-Ju;Kim, Tae-Wan;Lim, Ji-Seoup;Kim, Jun-Ho;Yoo, Tae-Yong;Lee, Won Joo
    • 한국컴퓨터정보학회논문지 / Vol. 27, No. 5 / pp.29-36 / 2022
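  • This paper proposes an effective agricultural product price prediction model based on a dataset provided by DACON. The models are XGBoost and CatBoost, both algorithms of the Gradient Boosting family, which outperform the conventional Logistic Regression and Random Forest in average accuracy and execution time. Building on these advantages, we design machine learning models that predict the price 1, 2, and 4 weeks ahead from an agricultural product's previous prices. The XGBoost model uses the XGBoost Regressor library, a regression-style approach, and achieves its best performance through hyperparameter tuning. The CatBoost model is implemented using the CatBoost Regressor. The implemented models are validated using the API provided by DACON, and the performance of each model is evaluated. Because XGBoost applies its own overfitting regularization, it delivers excellent performance despite the small dataset, but its time-related performance, such as training time and prediction time, was found to be lower than that of LGBM.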

기후변화에 따른 과수작물 재배지 변화 예측 연구: 한라봉을 중심으로 (Research on predicting changes in crop cultivation areas due to climate change: Focusing on Hallabong)

  • 박혜은;이종태
    • 한국정보시스템학회지:정보시스템연구 / Vol. 33, No. 1 / pp.31-44 / 2024
  • Purpose: The purpose of this study is to use climate data to find the algorithm with the highest predictive ability for Hallabong production and to predict future Hallabong production in areas where Hallabong cultivation is expected to be possible. Design/methodology/approach: The research is conducted in two stages. The first stage identifies the algorithm with the highest predictive power among the XGBoost, Random Forest, SVM, and LSTM methodologies. In the second stage, the algorithm found in the first stage is applied to predict future Hallabong production in three regions where Hallabong production is expected to be possible. Findings: As in many prediction studies, XGBoost showed the highest predictive power. Among the areas where Hallabong production is expected to be possible, production was predicted to be highest in Hongcheon, Gangwon-do, which has the highest latitude.

XGBoost를 이용한 타지키스탄 일사량 예측 모델 (Modeling Solar Irradiance in Tajikistan with XGBoost Algorithm)

  • 노정두;나태유;강성승
    • 지질공학 / Vol. 33, No. 3 / pp.403-411 / 2023
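  • This study uses XGBoost to predict solar irradiance in Tajikistan in order to evaluate the potential of solar radiation energy as a renewable energy resource for the country. First, predicting Tajikistan's solar irradiance with training, validation, and test models showed that the seasonality of irradiance by time and season was clearly distinguishable in both the actual and predicted values. Second, the actual and predicted hourly irradiance on July 1 of each year from 2016 to 2019 were calculated. The maximum actual and predicted values were about 1,005 W/m² and 1,009 W/m² in 2016, 939 W/m² and 997 W/m² in 2017, 1,022 W/m² and 1,012 W/m² in 2018, and 1,055 W/m² and 1,019 W/m² in 2019, and the error between actual and predicted values was about 0.4~5.8%, indicating very similar results. Consequently, XGBoost is judged to be a very useful tool for predicting solar irradiance in Tajikistan and evaluating the potential of solar radiation energy.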
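
The abstract above quotes an error of about 0.4~5.8% between actual and predicted peak irradiance but does not say which value the error is normalized by. The sketch below recomputes the percentage error from the four pairs of peak values quoted in the abstract, under both possible choices of denominator. The helper name `relative_error_pct` is hypothetical, and the choice of denominator is an assumption, not something the abstract confirms.

```python
# Peak hourly irradiance on July 1 (W/m2) as quoted in the abstract:
# year -> (actual, predicted). The values are stated there as approximate.
peaks = {
    2016: (1005, 1009),
    2017: (939, 997),
    2018: (1022, 1012),
    2019: (1055, 1019),
}

def relative_error_pct(actual, predicted, reference):
    """Absolute difference as a percentage of the chosen reference value."""
    return abs(actual - predicted) / reference * 100.0

# Assumption 1: normalize by the predicted value.
vs_predicted = {y: relative_error_pct(a, p, p) for y, (a, p) in peaks.items()}
# Assumption 2: normalize by the actual (measured) value.
vs_actual = {y: relative_error_pct(a, p, a) for y, (a, p) in peaks.items()}

for year in peaks:
    print(year, round(vs_predicted[year], 2), round(vs_actual[year], 2))
```

With these rounded inputs, normalizing by the predicted value gives a range of roughly 0.4% to 5.8%, matching the quoted figures, while normalizing by the actual value gives roughly 0.4% to 6.2%.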

An advanced machine learning technique to predict compressive strength of green concrete incorporating waste foundry sand

  • Danial Jahed Armaghani;Haleh Rasekh;Panagiotis G. Asteris
    • Computers and Concrete / Vol. 33, No. 1 / pp.77-90 / 2024
  • Waste foundry sand (WFS) is a waste product that causes environmental hazards. WFS can be used as a partial replacement for cement or fine aggregates in concrete. A database comprising 234 compressive strength tests of concrete fabricated with WFS is used. To construct the machine learning-based prediction models, the water-to-cement ratio, WFS replacement percentage, WFS-to-cement content ratio, and fineness modulus of WFS were considered as the model's inputs, and the compressive strength of concrete was set as the model's output. A base extreme gradient boosting (XGBoost) model, together with two hybrid XGBoost models combined with the tunicate swarm algorithm (TSA) and the salp swarm algorithm (SSA), was applied. The role of TSA and SSA is to identify the optimum values of the XGBoost hyperparameters to obtain higher performance. The results of these hybrid techniques were compared with the results of the base XGBoost model in order to investigate and justify the implementation of optimisation algorithms. The results showed that the hybrid XGBoost models are faster and more accurate than the base XGBoost technique. The XGBoost-SSA model shows superior performance compared to previously published works in the literature, offering a reduced system error rate. Although the WFS-to-cement ratio is significant, the WFS replacement percentage has a smaller influence on the compressive strength of concrete. To improve the compressive strength of concrete fabricated with WFS, the simultaneous consideration of the water-to-cement ratio and fineness modulus of WFS is recommended.

Development of Big Data-based Cardiovascular Disease Prediction Analysis Algorithm

  • Kyung-A KIM;Dong-Hun HAN;Myung-Ae CHUNG
    • 한국인공지능학회지 / Vol. 11, No. 3 / pp.29-34 / 2023
  • With the recent rapid development of artificial intelligence technology, many studies are being conducted to predict the risk of heart disease in order to lower the worldwide mortality rate of cardiovascular disease. This study aims to develop "Lifestyle Improvement Contents (Digital Therapy)" for cardiovascular disease care, which present exercise or dietary improvement content to patients with cardiovascular disease in the form of a software app or web service on digital devices such as mobile phones and PCs, to help with management or treatment. For this purpose, we compared and analyzed cardiovascular disease prediction models using the machine learning algorithms LR, LDA, SVM, and XGBoost. The XGBoost model showed the best predictive performance, with an overall accuracy of around 80%. Overall, accuracy was 80.0%, the F1 score was 0.77~0.79, and ROC-AUC was 80%~84%. Therefore, the algorithm used in this study can serve as a reference model for verifying the validity and accuracy of cardiovascular disease prediction. The results ultimately suggest that it is possible to develop a cardiovascular disease prediction analysis algorithm that can take in accurate biometric data collected in future clinical trials, add lifestyle management (exercise, eating habits, etc.) elements, and verify the effect and efficacy on cardiovascular-related bio-signals and disease risk, and thereby to develop lifestyle improvement contents (Digital Therapy).

토지 보상비 추정 모델 개발 - 건설CALS데이터와 공공데이터 중심으로 (Development of Land Compensation Cost Estimation Model : The Use of the Construction CALS Data and Linked Open Data)

  • 이상규;김진욱;서명배
    • 한국컴퓨터정보학회:학술대회논문집 / Proceedings of the 한국컴퓨터정보학회 62nd Summer Conference (2020), Vol. 28, No. 2 / pp.375-378 / 2020
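  • To develop a land compensation cost estimation model, this study used internal data from the construction CALS (Continuous Acquisition & Life-cycle Support) system, external data such as individual and standard officially assessed land prices, and data generated from individual official price data to refine the developed estimation model. To analyze these three types of collected data, the decision-tree-based Xgboost algorithm, which is very useful for removing overfitting errors in existing linear or decision tree (Tree) based models, was adopted as the data analysis method for developing the land compensation cost estimation model. After hyperparameter tuning was applied to refine the Xgboost algorithm, the MAPE (Mean Absolute Percentage Error) range between the actual compensation costs and the developed estimation model was confirmed to be 19.5%.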
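
The abstract above summarizes model quality with MAPE. The following is a minimal sketch of how MAPE is usually computed, assuming the standard definition (the mean of |actual − estimate| / |actual|, expressed as a percentage). The function name `mape` and the cost figures are hypothetical and do not come from the study.

```python
def mape(actual, estimated):
    """Mean Absolute Percentage Error, in percent (standard definition)."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - e) / a) for a, e in zip(actual, estimated)]
    return sum(ratios) / len(ratios) * 100.0

# Hypothetical compensation costs (arbitrary currency units), illustration only.
actual_costs = [100.0, 200.0, 400.0]
estimated_costs = [90.0, 230.0, 380.0]
result = mape(actual_costs, estimated_costs)
print(round(result, 2))
```

Because each error is divided by the actual value, MAPE weights a given absolute error more heavily on low-cost parcels than on high-cost ones.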

Selecting Optimal Algorithms for Stroke Prediction: Machine Learning-Based Approach

  • Kyung Tae CHOI;Kyung-A KIM;Myung-Ae CHUNG;Min Soo KANG
    • 한국인공지능학회지 / Vol. 12, No. 2 / pp.1-7 / 2024
  • In this paper, we compare three models (logistic regression, Random Forest, and XGBoost) for predicting stroke occurrence using data from the Korea National Health and Nutrition Examination Survey (KNHANES). We evaluated these models using various metrics, focusing mainly on recall and F1 score to assess their performance. Initially, the logistic regression model showed a satisfactory recall score among the three models; however, it was excluded from further consideration because it did not meet the F1 score threshold, which was set at a minimum of 0.5. The F1 score is crucial as it considers both precision and recall, providing a balanced measure of a model's accuracy. Among the models that met the criteria, XGBoost showed the highest recall rate and showed excellent performance in stroke prediction. In particular, XGBoost shows strong performance not only in recall, but also in F1 score and AUC, so it should be considered the optimal algorithm for predicting stroke occurrence. This study determines that the performance of XGBoost is optimal in the field of stroke prediction.
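
The selection rule described in the abstract above (exclude models whose F1 score falls below 0.5, then prefer the highest recall) can be sketched as follows. Only the 0.5 threshold comes from the abstract. The precision and recall figures are invented for illustration and are not the paper's results, and the names `f1_score`, `candidates`, and `F1_THRESHOLD` are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical metrics, NOT the values reported in the paper.
candidates = {
    "logistic_regression": {"precision": 0.30, "recall": 0.80},
    "random_forest": {"precision": 0.60, "recall": 0.55},
    "xgboost": {"precision": 0.58, "recall": 0.70},
}

F1_THRESHOLD = 0.5  # minimum F1 score stated in the abstract

# Step 1: compute F1 for every model and drop those below the threshold.
scores = {name: f1_score(m["precision"], m["recall"]) for name, m in candidates.items()}
eligible = {name: m for name, m in candidates.items() if scores[name] >= F1_THRESHOLD}

# Step 2: among the remaining models, pick the one with the highest recall.
best = max(eligible, key=lambda name: eligible[name]["recall"])
print(scores, best)
```

In this made-up example the model with the highest recall is still excluded, because its low precision drags the F1 score under the threshold, which mirrors the reasoning the abstract gives for dropping logistic regression.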