• Title/Summary/Keyword: XGboost

Development of Big Data-based Cardiovascular Disease Prediction Analysis Algorithm

  • Kyung-A KIM;Dong-Hun HAN;Myung-Ae CHUNG
    • Korean Journal of Artificial Intelligence / v.11 no.3 / pp.29-34 / 2023
  • Recently, with the rapid development of artificial intelligence technology, many studies have been conducted to predict the risk of heart disease in order to lower the mortality rate of cardiovascular diseases worldwide. This study aims to develop "Lifestyle Improvement Contents (Digital Therapy)" for cardiovascular disease care: exercise or dietary improvement content delivered to patients with cardiovascular disease as a software app or web service on digital devices such as mobile phones and PCs, to help with disease management or treatment. For this purpose, we compared and analyzed cardiovascular disease prediction models built with the LR, LDA, SVM, and XGBoost machine learning algorithms. The XGBoost model showed the best predictive performance, with an overall accuracy of about 80%: accuracy was 80.0%, the F1 score was 0.77 to 0.79, and ROC-AUC was 80% to 84%. Therefore, the algorithm used in this study can serve as a reference model for verifying the validity and accuracy of cardiovascular disease prediction. Ultimately, the results suggest that lifestyle improvement contents (Digital Therapy) can be developed by building a cardiovascular disease prediction analysis algorithm that takes accurate biometric data collected in future clinical trials as input, adds lifestyle management elements (exercise, eating habits, etc.), and verifies the effect and efficacy on cardiovascular-related bio-signals and disease risk.
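The three metrics reported above (accuracy, F1 score, ROC-AUC) can be computed from a labelled test set without any ML library. The sketch below is a plain-Python illustration under stated assumptions: labels are 0/1 with 1 meaning "disease present", hard predictions come from thresholding a model score at 0.5, and the toy numbers are invented, not taken from the study. ROC-AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data (assumption): 1 = disease present, scores are model probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Note that ROC-AUC uses the raw scores while accuracy and F1 depend on the chosen threshold, which is one reason studies often report all three.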

XGBoost Based Prediction Model for Virtual Metrology in Semiconductor Manufacturing Process (반도체 공정에서 가상계측 위한 XGBoost 기반 예측모델)

  • Hahn, Jung-Suk;Kim, Hyunggeun
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.477-480 / 2022
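  • As semiconductor performance improves, the circuits that carry signals have been scaled down from micrometers to nanometers, and linewidths are becoming ever narrower. This means that the defects to be detected are becoming smaller and the difference between normal and abnormal process states is shrinking, so the tolerances for process errors and process conditions have narrowed. Anomalies have therefore become harder to detect, and inspection processes with high precision and resolution are required. For this reason, new inspection and metrology steps capable of catching minute process changes have been added, increasing the TAT (Turn-around Time); the process time needed for a wafer to become a finished product has grown, which raises manufacturing costs. In this paper, we propose a virtual metrology model that distinguishes whether a wafer manufacturing result is good or defective, based not on wafer inspection and metrology data but on the various sensor and equipment data generated during the manufacturing process. Among the many machine learning algorithms, we built the prediction model with XGBoost, which offers various advantages, and carried out the study in the order of data preprocessing, feature selection, model design, and model evaluation. As a result, we succeeded in building a model with an accuracy of about 94% or higher, but further research, such as building models that incorporate domain knowledge of semiconductor processes, is needed to achieve even higher accuracy.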
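The abstract names data preprocessing and feature selection as the first two stages but does not say which methods were used. As a hypothetical illustration only, the sketch below drops incomplete or constant sensor columns and then ranks the remaining columns by absolute Pearson correlation with a good (0) / defective (1) label. The sensor names and values are invented, and the model design and evaluation stages (where XGBoost would be trained) are omitted.

```python
import math
from typing import Dict, List, Optional


def preprocess(features: Dict[str, List[Optional[float]]]) -> Dict[str, List[float]]:
    """Keep only sensor columns that are complete and not constant (assumed policy)."""
    clean: Dict[str, List[float]] = {}
    for name, column in features.items():
        if any(v is None for v in column):
            continue  # drop sensors with missing readings
        values = [float(v) for v in column]
        if max(values) == min(values):
            continue  # a constant column cannot separate good from defective wafers
        clean[name] = values
    return clean


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_features(features: Dict[str, List[float]], label: List[int], k: int) -> List[str]:
    """Return the k column names most correlated (in absolute value) with the label."""
    y = [float(v) for v in label]
    strength = {name: abs(pearson(col, y)) for name, col in features.items()}
    return sorted(strength, key=strength.get, reverse=True)[:k]


# Hypothetical sensor log for four wafers; label 0 = good, 1 = defective.
raw = {
    "chamber_temp": [1.0, 2.0, 3.0, 4.0],
    "rf_power": [5.0, 1.0, 5.0, 1.0],
    "pressure": [7.0, 7.0, 7.0, 7.0],
    "vibration": [1.0, None, 2.0, 3.0],
    "gas_flow": [0.1, 0.0, 0.9, 1.0],
}
label = [0, 0, 1, 1]
print(select_features(preprocess(raw), label, k=2))
```

A correlation filter is only one simple option; it ignores interactions between sensors, which tree models such as XGBoost can exploit.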

Prediction of Larix kaempferi Stand Growth in Gangwon, Korea, Using Machine Learning Algorithms

  • Hyo-Bin Ji;Jin-Woo Park;Jung-Kee Choi
    • Journal of Forest and Environmental Science / v.39 no.4 / pp.195-202 / 2023
  • In this study, we sought to compare and evaluate the accuracy and predictive performance of machine learning algorithms for estimating the growth of individual Larix kaempferi trees in Gangwon Province, Korea. We employed linear regression, random forest, XGBoost, and LightGBM algorithms to predict tree growth using monitoring data organized by thinning intensity. Furthermore, we compared and evaluated the goodness-of-fit of these models using the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE). The results revealed that XGBoost provided the highest goodness-of-fit, with an R² value of 0.62 across all thinning intensities, while also yielding the lowest MAE and RMSE, indicating the best model fit. When the XGBoost model was used to predict the growth volume of individual trees after 3 years, the agreement was very high, reaching approximately 97% for all stand sites across the different thinning intensities. Notably, in non-thinned plots, the predicted volumes were approximately 2.1 m³ lower than the actual volumes; however, the agreement remained high at approximately 99.5%. These findings will contribute to the development of growth prediction models for individual trees using machine learning algorithms.
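For reference, the three goodness-of-fit metrics used above follow their standard definitions: MAE is the mean absolute residual, RMSE the square root of the mean squared residual, and R² equals 1 − SS_res / SS_tot. A minimal plain-Python sketch, with invented example values rather than data from the study:

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented tree-volume values (m³), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mae(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

Lower MAE and RMSE and higher R² all indicate a better fit, which is why the abstract treats them together.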

Forest Vertical Structure Mapping from Bi-Seasonal Sentinel-2 Images and UAV-Derived DSM Using Random Forest, Support Vector Machine, and XGBoost

  • Young-Woong Yoon;Hyung-Sup Jung
    • Korean Journal of Remote Sensing / v.40 no.2 / pp.123-139 / 2024
  • Forest vertical structure is vital for understanding ecosystems and biodiversity, in addition to being fundamental forest information. Currently, forest vertical structure is predominantly assessed in situ, which is difficult to apply to inaccessible locations or large areas, costly, and labor-intensive. Therefore, mapping systems based on remote sensing data have been actively explored. Recently, machine learning techniques for analyzing and classifying images have been actively researched and applied to map the vertical structure of forests accurately. In this study, Sentinel-2 and digital surface model images were obtained on two dates approximately one month apart, and the spectral index and tree height maps were generated separately. The input data were then divided by acquisition date into cases 1 and 2, which were combined to form case 3. Using these data, forest vertical structure mapping models based on random forest, support vector machine, and extreme gradient boosting (XGBoost) were generated. Of the nine resulting models, the XGBoost model for case 3 performed best, with an average precision of 0.99 and an F1 score of 0.91. We confirmed that a forest vertical structure mapping model built from bi-seasonal data with an appropriate algorithm can achieve an accuracy of 90% or higher.

Limiting conditions prediction using machine learning for loss of condenser vacuum event

  • Dong-Hun Shin;Moon-Ghu Park;Hae-Yong Jeong;Jae-Yong Lee;Jung-Uk Sohn;Do-Yeon Kim
    • Nuclear Engineering and Technology / v.55 no.12 / pp.4607-4616 / 2023
  • We implement machine learning regression models to predict the peak pressures of the primary and secondary systems, a major safety concern in a Loss Of Condenser Vacuum (LOCV) accident. We selected the Multi-dimensional Analysis of Reactor Safety-KINS standard (MARS-KS) code to analyze the LOCV accident, and the reference plant is the Korean Optimized Power Reactor 1000MWe (OPR1000). eXtreme Gradient Boosting (XGBoost) is selected as the machine learning tool. The MARS-KS code is used to generate LOCV accident data, which are used to train the machine learning model. Hyperparameter optimization is performed using simulated annealing. Randomly generated combinations of initial conditions within the operating range are fed into the XGBoost model to predict the peak pressure. The initial conditions that produce the peak pressure are then run through MARS-KS to generate reference results, and the error between the predicted value and the code output is calculated. The uncertainty of the machine learning model is also quantified to verify the model accuracy. The machine learning model presented in this paper successfully identifies a combination of initial conditions that produces a more conservative peak pressure than the values calculated with existing methodologies.
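The abstract states that hyperparameters were tuned with simulated annealing but gives no search space, cooling schedule, or objective. The sketch below shows the generic technique under assumed settings: two hypothetical XGBoost-style hyperparameters (`max_depth`, `learning_rate`), a geometric cooling schedule, and a made-up `surrogate_loss` standing in for the validation error one would obtain by training on MARS-KS-generated data.

```python
import math
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def simulated_annealing(
    objective: Callable[[Params], float],
    initial: Params,
    neighbor: Callable[[Params, random.Random], Params],
    steps: int = 500,
    t0: float = 1.0,
    cooling: float = 0.99,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Minimize `objective`; worse moves are accepted with probability exp(-delta / T)."""
    rng = random.Random(seed)
    current, current_loss = initial, objective(initial)
    best, best_loss = current, current_loss
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        loss = objective(candidate)
        delta = loss - current_loss
        # Always accept improvements; occasionally accept worse moves to escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_loss = candidate, loss
            if loss < best_loss:
                best, best_loss = candidate, loss
        temperature *= cooling  # geometric cooling schedule (assumed)
    return best, best_loss


def neighbor(params: Params, rng: random.Random) -> Params:
    """Hypothetical move: nudge max_depth by -1/0/+1 and scale learning_rate log-normally."""
    depth = min(10, max(2, params["max_depth"] + rng.choice([-1, 0, 1])))
    lr = min(0.5, max(0.01, params["learning_rate"] * math.exp(rng.gauss(0.0, 0.2))))
    return {"max_depth": depth, "learning_rate": lr}


def surrogate_loss(params: Params) -> float:
    """Made-up stand-in for validation error; minimized at max_depth=6, learning_rate=0.1."""
    return 0.01 * (params["max_depth"] - 6) ** 2 + math.log(params["learning_rate"] / 0.1) ** 2


best, loss = simulated_annealing(surrogate_loss, {"max_depth": 3, "learning_rate": 0.3}, neighbor)
print(best, round(loss, 4))
```

In a real tuning run, each `objective` call would train and validate a model, so the number of annealing steps is usually limited by training cost.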

Machine Learning Based Model Development and Optimization for Predicting Radiation (방사선량률 예측을 위한 기계학습 기반 모델 개발 및 최적화 연구)

  • SiHyun Lee;HongYeon Lee;JungMin Yeom
    • Journal of Radiation Industry / v.17 no.4 / pp.551-557 / 2023
  • In recent years, radiation has become a socially important issue, increasing the need for accurate prediction of radiation levels. In this study, weather data and radiation dose rates were collected for about 6 months in Jangseong, Jeollanam-do, and the correlation between the hourly radiation dose rate (nSv h⁻¹) and temperature, humidity, cumulative precipitation, wind direction, wind speed, local air pressure, sea-level pressure, and solar radiation was analyzed. Machine learning models, namely Multiple Linear Regression (MLR), Random Forest (RF), XGBoost, and LightGBM, were then used to predict the hourly dose rate from only the important variables. Evaluated by RMSE (root mean squared error) and R² (coefficient of determination), the XGBoost model performed best among the models, with an RMSE of 22.92 and an R² of 0.73. After the hyperparameters of all models were optimized with grid search and variables measured inside the instrument were added, performance improved to an RMSE of 2.39 and an R² of 0.99 for both XGBoost and LightGBM.
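Grid search, as mentioned above, evaluates every combination in a predefined hyperparameter grid and keeps the best-scoring one; scikit-learn's `GridSearchCV` wraps the same idea with cross-validation. A library-free sketch follows, in which the grid values and the `fake_rmse` scoring function are hypothetical stand-ins for a model's validation RMSE, not the study's settings.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    score: Callable[[Dict[str, Any]], float],
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Evaluate every combination in the grid; return the one with the lowest score."""
    names = list(param_grid)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)  # e.g. validation RMSE; lower is better
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical grid and a made-up score function standing in for validation RMSE.
grid = {"max_depth": [3, 6, 9], "n_estimators": [100, 300]}


def fake_rmse(p: Dict[str, Any]) -> float:
    return abs(p["max_depth"] - 6) + abs(p["n_estimators"] - 300) / 100


print(grid_search(grid, fake_rmse))
```

The cost grows multiplicatively with each added hyperparameter (here 3 × 2 = 6 evaluations), which is why annealing or random search is sometimes preferred for larger spaces.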

Wind power forecasting based on time series and machine learning models (시계열 모형과 기계학습 모형을 이용한 풍력 발전량 예측 연구)

  • Park, Sujin;Lee, Jin-Young;Kim, Sahm
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.723-734 / 2021
  • Wind energy is one of the fastest-growing renewable energy sources, developed and invested in as a response to climate change. As renewable energy policies and power plant installations are promoted, the supply of wind power in Korea is gradually expanding, and efforts to predict demand accurately are growing. In this paper, the time series models ARIMA and ARIMAX and the machine learning models SVR, Random Forest, and XGBoost were compared for predicting wind power generation in the Jeonnam and Gyeongbuk regions. Mean absolute error (MAE) and mean absolute percentage error (MAPE) were used to compare the models' predictions. Using hourly raw data from January 1, 2018 to October 24, 2020, the models were trained to predict wind power generation for the 168 hours from October 25, 2020 to October 31, 2020. In the comparison of predictive power, the Random Forest model performed best for Jeonnam and the XGBoost model performed best for Gyeongbuk. In future research, we will try not only machine learning models but also wind power forecasting based on data mining techniques, which have recently been actively researched.
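MAE, MAPE, and the length of the test window can be checked with a short sketch. How the study handled hours with zero generation is not stated; the sketch assumes such hours are excluded from MAPE, because the percentage error is undefined when the actual value is zero. It also assumes the 168-hour window runs from 00:00 on October 25 up to, but not including, 00:00 on November 1.

```python
from datetime import datetime
from typing import Sequence


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error (%), skipping hours whose actual value is 0 (assumed)."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Assumed hourly test window: 2020-10-25 00:00 up to (not including) 2020-11-01 00:00.
HORIZON_HOURS = int((datetime(2020, 11, 1) - datetime(2020, 10, 25)).total_seconds() // 3600)
print(HORIZON_HOURS)  # 168, i.e. 7 days x 24 hours
```

Because MAPE divides by the actual value, low-output hours can dominate it, so reporting MAE alongside it, as the study does, gives a more balanced picture.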

A Study on the Prediction of Strawberry Production in Machine Learning Infrastructure (머신러닝 기반 시설재배 딸기 생산량 예측 연구)

  • Oh, HanByeol;Lim, JongHyun;Yang, SeungWeon;Cho, YongYun;Shin, ChangSun
    • Smart Media Journal / v.11 no.5 / pp.9-16 / 2022
  • Recently, agricultural sites are being automated into digital smart farms by applying technologies such as big data and the Internet of Things (IoT). These smart farms aim to increase production and improve crop quality by measuring the crop environment and collecting and processing the data. Production prediction is an important research topic in smart-farm digital agriculture, a form of high-tech agriculture, and managing the quality of growth information data requires analyzing environmental data with big data techniques as well as further standardized research. In this paper, environmental and production data collected from smart-farm strawberry farms were analyzed. Crop production prediction models based on regression analysis were built using Ridge Regression, LightGBM, and XGBoost. Among the three models, XGBoost was optimal, with an R² of 0.825 (82.5% explanatory power). The study confirmed a correlation between nutrient solution absorption and environmental data and obtained significant results for production prediction. In the future, studying nutrient solution absorption together with information on the crop growing environment and the composition of the nutrient solution is expected to contribute to preventing environmental pollution and reducing nutrient solution use through better nutrient solution management.

Prediction of Germination of Korean Red Pine (Pinus densiflora) Seed using FT NIR Spectroscopy and Binary Classification Machine Learning Methods (FT NIR 분광법 및 이진분류 머신러닝 방법을 이용한 소나무 종자 발아 예측)

  • Yong-Yul Kim;Ja-Jung Ku;Da-Eun Gu;Sim-Hee Han;Kyu-Suk Kang
    • Journal of Korean Society of Forest Science / v.112 no.2 / pp.145-156 / 2023
  • In this study, Fourier-transform near-infrared (FT-NIR) spectra of Korean red pine seeds stored at -18℃ and 4℃ for 18 years were analyzed. To develop seed-germination prediction models, the performance of seven machine learning methods, namely XGBoost, Boosted Tree, Bootstrap Forest, Neural Networks, Decision Tree, Support Vector Machine, and PLS-DA, was compared. The predictive performance, assessed by accuracy, misclassification rate, and area under the curve (0.9722, 0.0278, and 0.9735 for XGBoost, and 0.9653, 0.0347, and 0.9647 for Boosted Tree), was better for the XGBoost and Boosted Tree models than for the other models. The 54 wave-number variables of the two models were of high relative importance in seed-germination prediction and were grouped into six spectral ranges (811~1,088 nm, 1,137~1,273 nm, 1,336~1,453 nm, 1,666~1,671 nm, 1,879~2,045 nm, and 2,058~2,409 nm) corresponding to aromatic amino acids, cellulose, lignin, starch, fatty acids, and moisture, respectively. Using the NIR spectral data and the two machine learning models developed in this study gave >96% accuracy in predicting pine-seed germination after long-term storage, indicating that this approach could be useful for non-destructive viability testing of stored seed genetic resources.

Condition Estimation of Facility Elements Using XGBoost (XGBoost를 활용한 시설물의 부재 상태 예측)

  • Chang, Taeyeon;Yoon, Sihoo;Chi, Seokho;Im, Seokbeen
    • Korean Journal of Construction Engineering and Management / v.24 no.1 / pp.31-39 / 2023
  • To reduce facility management costs and safety concerns caused by aging facilities, it is important to estimate the future condition of facilities from facility management data and to use this predictive information in management decision-making. To this end, this study proposes a methodology for estimating the condition of facility elements using XGBoost. To validate the proposed methodology, this study constructed sample data for road bridges and developed a model that estimates the condition grades of major elements expected at the next inspection. The developed model showed satisfactory performance in estimating the condition grades of the deck, girder, and abutment/pier (average F1 score 0.869). In addition, a testbed providing data management and element condition estimation functions was established to demonstrate the practical applicability of the proposed methodology. It was confirmed that the facility management data and predictive information in this study can help managers make facility management decisions.