• Title/Summary/Keyword: XGboost

A Study on the Prediction of Fuel Consumption of Bulk Ship Main Engine Using Explainable Artificial Intelligence (SHAP을 활용한 벌크선 메인엔진 연료 소모량 예측연구)

  • Hyun-Ju Kim;Min-Gyu Park;Ji-Hwan Lee
    • Journal of Navigation and Port Research / v.47 no.4 / pp.182-190 / 2023
  • This study proposes a predictive model using XGBoost and SHapley Additive exPlanation (SHAP) to estimate fuel consumption in bulk carriers. Previous studies have also utilized ship engine data and weather data. However, they lacked reliability in predicted results and explanations of variables used in the fuel consumption prediction model implementation. To address these limitations, this study developed a predictive model using XGBoost and SHAP. It provides research background, scope, relevant regulations, previous studies, and research methodology. Additionally, it explains the data cleaning method for bulk carriers and verifies results of the predictive model.

Indoor positioning system using Xgboosting (Xgboosting 기법을 이용한 실내 위치 측위 기법)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo;Kim, Dae-Jin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.492-494 / 2021
  • Decision trees are widely used for classification in machine learning. However, they are prone to overfitting, which can also cost speed and computing resources. Bagging and boosting are two techniques for addressing this problem. Bagging draws multiple samples and fits a model to each, while boosting fits models to the sampled data and adjusts weights to reduce overfitting. More recently, techniques such as XGBoost have been introduced to improve performance. In this paper, we collect Wi-Fi signal data for indoor positioning, apply both the existing method and XGBoost to it, and evaluate their performance.

Cost-Sensitive Learning for Cardio-Cerebrovascular Disease Risk Prediction (심혈관질환 위험 예측을 위한 비용민감 학습 모델)

  • Yu Na Lee;Kyung-Hee Lee;Wan-Sup Cho
    • The Journal of Bigdata / v.6 no.2 / pp.161-168 / 2021
  • In this study, we propose a cardiovascular disease prediction model using machine learning. First, a multidimensional analysis of the differences between the normal and patient groups is performed and the results are visualized. In particular, we propose a predictive model using cost-sensitive learning that can improve sensitivity when there is a high class imbalance between the normal and patient groups, as is typical of disease data. The predictive model is developed using CART and XGBoost, two representative machine learning methods, and their predictions and performance are compared on cardiovascular disease patient data. According to the study results, CART showed higher accuracy and specificity than XGBoost, and the accuracy was about 70% to 74%.
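The idea of cost-sensitive learning under class imbalance can be illustrated with a minimal sketch. The abstract does not say how costs were assigned, so everything here is an assumption: the helper names `positive_class_weight` and `expected_cost` are hypothetical, the toy labels are invented, and the negative-to-positive ratio is only a common heuristic for weighting the minority class (XGBoost exposes a similar knob as `scale_pos_weight`), not necessarily the authors' choice.

```python
from collections import Counter


def positive_class_weight(labels):
    """Ratio of negative (0) to positive (1) examples, a common heuristic
    for how much more a missed positive should cost than a false alarm."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive examples in labels")
    return counts[0] / counts[1]


def expected_cost(labels, predictions, fn_cost, fp_cost=1.0):
    """Average misclassification cost per example.
    A false negative (missed patient) costs fn_cost; a false positive costs fp_cost."""
    total = 0.0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 0:
            total += fn_cost
        elif y == 0 and p == 1:
            total += fp_cost
    return total / len(labels)


# Hypothetical toy data: 8 normal cases (0) and 2 patients (1).
labels = [0] * 8 + [1] * 2
weight = positive_class_weight(labels)

# Predictor A labels everyone normal; predictor B flags the last four cases.
always_normal = [0] * 10
flag_some = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

cost_a = expected_cost(labels, always_normal, fn_cost=weight)
cost_b = expected_cost(labels, flag_some, fn_cost=weight)
print(weight, cost_a, cost_b)
```

On this toy data both predictors reach 80% plain accuracy, yet the weighted cost prefers the one that catches the patients, which is the behaviour a sensitivity-oriented model is after.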

Selecting Optimal Algorithms for Stroke Prediction: Machine Learning-Based Approach

  • Kyung Tae CHOI;Kyung-A KIM;Myung-Ae CHUNG;Min Soo KANG
    • Korean Journal of Artificial Intelligence / v.12 no.2 / pp.1-7 / 2024
  • In this paper, we compare three models (logistic regression, Random Forest, and XGBoost) for predicting stroke occurrence using data from the Korea National Health and Nutrition Examination Survey (KNHANES). We evaluated these models using various metrics, focusing mainly on recall and F1 score to assess their performance. Initially, the logistic regression model showed a satisfactory recall score among the three models; however, it was excluded from further consideration because it did not meet the F1 score threshold, which was set at a minimum of 0.5. The F1 score is crucial as it considers both precision and recall, providing a balanced measure of a model's accuracy. Among the models that met the criteria, XGBoost showed the highest recall rate and showed excellent performance in stroke prediction. In particular, XGBoost shows strong performance not only in recall, but also in F1 score and AUC, so it should be considered the optimal algorithm for predicting stroke occurrence. This study determines that the performance of XGBoost is optimal in the field of stroke prediction.
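The selection rule described in this abstract (discard models whose F1 score is below 0.5, then prefer the highest recall) can be sketched as follows. The function names and the precision/recall figures are hypothetical placeholders, not results from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def select_model(candidates, f1_threshold=0.5):
    """Keep models with F1 >= f1_threshold, then return the name of the
    one with the highest recall (None if no model qualifies).
    `candidates` maps a model name to a (precision, recall) pair."""
    eligible = {
        name: (p, r)
        for name, (p, r) in candidates.items()
        if f1_score(p, r) >= f1_threshold
    }
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name][1])


# Hypothetical (precision, recall) values, for illustration only.
candidates = {
    "model_a": (0.20, 0.90),  # high recall but low precision, so F1 < 0.5
    "model_b": (0.55, 0.70),
    "model_c": (0.60, 0.75),
}
best = select_model(candidates)
print(best)
```

Here `model_a` has the best recall but is filtered out by the F1 threshold, mirroring how a high-recall model can still be excluded when its precision is too low.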

A study of predicting irradiation-induced transition temperature shift for RPV steels with XGBoost modeling

  • Xu, Chaoliang;Liu, Xiangbing;Wang, Hongke;Li, Yuanfei;Jia, Wenqing;Qian, Wangjie;Quan, Qiwei;Zhang, Huajian;Xue, Fei
    • Nuclear Engineering and Technology / v.53 no.8 / pp.2610-2615 / 2021
  • Predicting the irradiation-induced transition temperature shift of RPV steels is important for the long-term operation of nuclear power plants. Based on irradiation embrittlement data, a prediction model for the irradiation-induced transition temperature shift is developed with the machine learning method XGBoost. The residuals, the standard deviation, and predicted versus measured values are then analyzed to assess the accuracy of the model. Finally, the Cu content threshold and saturation values, temperature dependence, Ni/Cu dependence, and flux effect are examined to verify its reliability. The results show that the prediction model developed with XGBoost has high accuracy in predicting the irradiation embrittlement trend of RPV steel. The prediction results are consistent with the current understanding of the RPV embrittlement mechanism.

A Research on Accuracy Improvement of Diabetes Recognition Factors Based on XGBoost

  • Shin, Yongsub;Yun, Dai Yeol;Moon, Seok-Jae;Hwang, Chi-gon
    • International journal of advanced smart convergence / v.10 no.2 / pp.73-78 / 2021
  • The number of people visiting hospitals because of diabetes has been increasing. According to the Korean Diabetes Association, one in seven adults aged 30 years or older in Korea has diabetes, and the figure is expected to be higher if pre-diabetes (fasting blood sugar disorders) is included. In a previous study, the validity of triglyceride and cholesterol as factors associated with diabetes was confirmed and analyzed using Random Forest. Random Forest has the disadvantage that, as the amount of data increases, it uses more memory and becomes slower. Therefore, in this paper, we compared and analyzed Random Forest and XGBoost, focusing on improving learning speed and preventing wasted memory, two central concerns in machine learning. Using XGBoost, the problems of slowdown and wasted memory were solved, and the accuracy of the diabetes recognition factors was further increased.

Development of Land Compensation Cost Estimation Model : The Use of the Construction CALS Data and Linked Open Data (토지 보상비 추정 모델 개발 - 건설CALS데이터와 공공데이터 중심으로)

  • Lee, Sang-Gyu;Kim, Jin-Wook;Seo, Myeong-Bae
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.375-378 / 2020
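  • To develop a land compensation cost estimation model, this study used internal data from the construction CALS (Continuous Acquisition & Life-cycle Support) system, external data such as individual and standard-lot officially assessed land prices, and data generated from the individual officially assessed prices in order to refine the developed estimation model. To analyze these three types of collected data, the decision-tree-based XGBoost algorithm, which is very useful for removing the overfitting errors of existing linear or decision-tree-based models, was used as the data analysis methodology for developing the land compensation cost estimation model. After hyperparameter tuning was applied to refine the XGBoost algorithm, the MAPE (Mean Absolute Percentage Error) between the actual compensation costs and the estimates of the developed model was found to be in the range of 19.5%.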
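For reference, the standard definition of MAPE is given below. The symbols are assumptions introduced here, not taken from the paper: $A_i$ is the actual compensation cost of parcel $i$, $\hat{A}_i$ the model's estimate, and $n$ the number of parcels.

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{A_i - \hat{A}_i}{A_i} \right|
```

Under this definition, a MAPE of 19.5% would mean that the estimates deviate from the actual costs by 19.5% of the actual value on average.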

Detecting Errors in Dependency Treebank through XGBoost and Cross Validation (XGBoost와 교차 검증을 이용한 구문분석 말뭉치에서의 오류 탐지)

  • Choi, Min-Seok;Kim, Chang-Hyun;Cheon, Min-Ah;Park, Hyuk-Ro;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology / 2020.10a / pp.103-107 / 2020
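  • Dependency treebanks are widely used in natural language processing to identify the dependency relations in sentences. Such corpora are generally assumed to be error-free, but in practice they contain various errors, and these errors degrade performance. To mitigate this problem, this paper proposes a method that uses XGBoost and cross-validation to detect errors in an already constructed treebank. Because no training corpus annotated with errors exists, errors cannot be detected with an ordinary classifier, so this paper proposes detecting them by analyzing the classifier's results. To analyze performance, we compared the error distributions of the sample and the population; since there was almost no difference between them, the proposed method appears to be valid. We plan to apply it to semantic-role-annotated corpora in the future.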

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.3 / pp.531-539 / 2019
  • This paper proposes training features for malware family analysis and analyzes the multi-class classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use the Random Forest and XGBoost algorithms, both of which are based on decision trees. API, API-DLL, and DLL-CM features for malware detection and family classification are proposed by analyzing frequently used API and DLL information from malware and converting high-dimensional features to low-dimensional features. The proposed feature selection method provides the advantages of data dimension reduction and fast learning. In the performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy on the malware family dataset is 92.0% for XGBoost, and the false positive rate on the malware family dataset including benign samples is about 3.5% for both Random Forest and XGBoost.

Exploring the Factors Influencing Students' Career Maturity in Seoul City Middle School: A Machine Learning (머신러닝을 활용한 서울시 중학생 진로성숙도 예측 요인 탐색)

  • Park, Jung
    • The Journal of Bigdata / v.5 no.2 / pp.155-170 / 2020
  • The purpose of this study was to apply machine learning techniques (Decision Tree, Random Forest, XGBoost) to data from the 4th to 6th years of the Seoul Education Longitudinal Study to find the factors that predict the career maturity of middle school students in Seoul. To evaluate the results, the performance of each model was checked against several indicators. In addition, the model was analyzed using the XGBoostExplainer package, and R and R Studio were used for this study. The ranking of variable importance differed slightly between models, but 'Achievement goal awareness', 'Creativity', 'Self-concept', 'Relationship with parents and children', and 'Resilience' ranked high. The XGBoostExplainer package was also used to identify, for each panel, the factors that protect and those that impair career maturity, and 'Achievement goal awareness' was found to be the top-priority factor for predicting career maturity. Based on these results, it was suggested that a comparative study of machine learning and variable selection methods and a comparative study of each cohort of the Seoul Education Longitudinal Study should be conducted.