• Title/Abstract/Keywords: Gradient Boosting Algorithm

73 search results (processing time: 0.023 s)

대사증후군의 인지와 신체활동 실천에 영향을 미치는 요인: 데이터 마이닝 접근 (Factors influencing metabolic syndrome perception and exercising behaviors in Korean adults: Data mining approach)

  • 이수경;문미경
    • 한국산학기술학회논문지 / Vol. 18, No. 12 / pp.581-588 / 2017
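  • This study, conducted from July 2014 to December 2015, used XGBoost, a machine learning method, to predict which groups are aware of metabolic syndrome and engage in physical activity. Data from the 2009-2013 Community Health Survey were used, and 370,430 adults were included in the analysis. The dependent variable was divided into three stages according to awareness of metabolic syndrome and practice of physical activity: Stage 1 (no awareness, no physical activity), Stage 2 (awareness, no physical activity), and Stage 3 (awareness, physical activity). As predictors, 161 features were selected from the items collected in common across the five years of the Community Health Survey. The XGBoost algorithm was applied using the R program. The resulting accuracy was 0.735, and the ten most influential features were, in order, age, education level, experience of attempting weight control, EQ-5D mobility, checking nutrition labels, private health insurance enrollment, EQ-5D usual activities, exposure to anti-smoking advertisements, presence of pain, and experience of diabetes education from a public health institution. These results show that XGBoost is a useful tool for identifying factors that influence disease prevention and management using healthcare big data. The study may also help identify groups vulnerable to metabolic syndrome and support the development of educational programs for them.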

딥러닝과 앙상블 머신러닝 모형의 하천 탁도 예측 특성 비교 연구 (Comparative characteristic of ensemble machine learning and deep learning models for turbidity prediction in a river)

  • 박정수
    • 상하수도학회지 / Vol. 35, No. 1 / pp.83-91 / 2021
  • The increased turbidity in rivers during flood events has various effects on water environmental management, including drinking water supply systems. Thus, prediction of turbid water is essential for water environmental management. Recently, various advanced machine learning algorithms have been increasingly used in water environmental management. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are some of the most popular machine learning algorithms used for water environmental management, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and gated recurrent unit (GRU), a recurrent neural network algorithm, are used for model development to predict turbidity in a river. The observation frequencies of input data used for the model were 2, 4, 8, 24, 48, 120 and 168 h. The root-mean-square error-observations standard deviation ratio (RSR) of GRU and GBDT ranges over 0.182-0.766 and 0.400-0.683, respectively. Both models show similar prediction accuracy with RSR of 0.682 for GRU and 0.683 for GBDT. The GRU shows better prediction accuracy when the observation frequency is relatively short (i.e., 2, 4, and 8 h), whereas GBDT shows better prediction accuracy when the observation frequency is relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of input data should be considered to develop an appropriate model to predict turbidity.
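
The RSR metric named in the abstract above can be sketched as follows. This is a minimal illustration under the common definition (RMSE divided by the standard deviation of the observations); the function name `rsr` and the sample values are assumptions for illustration and are not taken from the paper.

```python
import math


def rsr(observed, predicted):
    """RMSE-observations standard deviation ratio.

    Computed as sqrt(sum((o - p)^2)) / sqrt(sum((o - mean(o))^2)),
    which equals RMSE divided by the standard deviation of the observations.
    0 means a perfect fit; 1 means no better than predicting the mean.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    squared_dev = sum((o - mean_obs) ** 2 for o in observed)
    if squared_dev == 0:
        raise ValueError("observations have zero variance; RSR is undefined")
    return math.sqrt(squared_error) / math.sqrt(squared_dev)


# Hypothetical turbidity observations and model outputs (illustrative only).
observed = [12.0, 15.5, 40.2, 88.0, 35.1]
predicted = [13.1, 14.0, 45.0, 80.5, 30.2]
print(f"RSR = {rsr(observed, predicted):.3f}")
```

Lower values indicate better agreement, which is how the GRU and GBDT ranges reported in the abstract should be read.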

머신러닝을 이용한 다공형 GDI 인젝터의 플래시 보일링 분무 예측 모델 개발 (Development of Flash Boiling Spray Prediction Model of Multi-hole GDI Injector Using Machine Learning)

  • 상몽소;신달호;;박수한
    • 한국분무공학회지 / Vol. 27, No. 2 / pp.57-65 / 2022
  • The purpose of this study is to use machine learning to build a model capable of predicting flash boiling spray characteristics. The flash boiling spray was visualized using shadowgraph visualization, and the spray images were then processed with MATLAB to obtain quantitative data on spray characteristics. The experimental conditions were used as input, and the spray characteristics were used as output to train the machine learning model. For the machine learning model, the XGB (extreme gradient boosting) algorithm was used. Finally, the performance of the machine learning model was evaluated using R² and RMSE (root mean square error). In order to have enough data to train the machine learning model, this study used 12 injectors with different design parameters and set various fuel temperatures and ambient pressures, resulting in about 12,000 data points. By comparing the performance of the model with different amounts of training data, it was found that the number of training samples must reach at least 7,000 before the model can show optimal performance. The model showed different prediction performance for different spray characteristics. Compared with the upstream spray angle and the downstream spray angle, the model had the best prediction performance for the spray tip penetration. In addition, the prediction performance of the model was relatively poor in the initial and final stages of injection. The model performance is expected to be further enhanced by optimizing the hyperparameters of the model.

스마트폰 과의존 판별을 위한 기계 학습 기법의 응용 (Application of Machine Learning Techniques for Problematic Smartphone Use)

  • 김우성;한준희
    • 아태비즈니스연구 / Vol. 13, No. 3 / pp.293-309 / 2022
  • Purpose - The purpose of this study is to explore the possibility of predicting the degree of smartphone overdependence based on mobile phone usage patterns. Design/methodology/approach - In this study, a survey conducted by the Korea Internet and Security Agency (KISA), called the "problematic smartphone use survey", was analyzed. The survey consists of 180 questions, and data were collected from 29,712 participants. Based on the data on smartphone usage patterns obtained through the questionnaire, the smartphone addiction level was predicted using machine learning techniques. k-NN, gradient boosting, XGBoost, CatBoost, AdaBoost and random forest algorithms were employed. Findings - First, while various factors together influence the smartphone overdependence level, the results show that all machine learning techniques perform well in predicting the smartphone overdependence level. In particular, we focus on the features that can be obtained from smartphone log data (without psychological factors). This means that our results can be a basis for diagnostic programs to detect problematic smartphone use. Second, the results show that information on users' age, marriage and smartphone usage patterns can be used as predictors to determine whether users are addicted to smartphones. Other demographic characteristics such as sex or region did not appear to significantly affect smartphone overdependence levels. Research implications or Originality - Although some studies predict the smartphone overdependence level using machine learning techniques, they only present algorithm performance based on survey data. In this study, based on the information gain measure, questions that have more influence on the smartphone overdependence level are presented, and the performance of algorithms according to the questions is compared.
Through the results of this study, it is shown that smartphone overdependence level can be predicted with less information if questions about smartphone use are given appropriately.
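
The information gain measure mentioned in the abstract above can be illustrated with a short sketch. It assumes the standard entropy-based definition for a categorical question and a categorical label; the question names and responses are hypothetical and do not come from the survey.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical survey answers (illustrative only).
overdependence = ["risk", "risk", "normal", "normal"]
daily_usage = ["high", "high", "low", "low"]   # separates the classes perfectly
region = ["urban", "rural", "urban", "rural"]  # carries no information here

for name, answers in [("daily_usage", daily_usage), ("region", region)]:
    print(name, round(information_gain(answers, overdependence), 3))
```

Ranking questions by this score is one way to pick a smaller set of informative questions, in the spirit of the comparison the abstract describes.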

Income prediction of apple and pear farmers in Chungnam area by automatic machine learning with H2O.AI

  • Hyundong, Jang;Sounghun, Kim
    • 농업과학연구 / Vol. 49, No. 3 / pp.619-627 / 2022
  • In Korea, apples and pears are among the most important agricultural products to farmers who seek to earn money as income. Generally, farmers make decisions at various stages to maximize their income but they do not always know exactly which option will be the best one. Many previous studies were conducted to solve this problem by predicting farmers' income structure, but researchers are still exploring better approaches. Currently, machine learning technology is gaining attention as one of the new approaches for farmers' income prediction. The machine learning technique is a methodology using an algorithm that can learn independently through data. As the level of computer science develops, the performance of machine learning techniques is also improving. The purpose of this study is to predict the income structure of apples and pears using the automatic machine learning solution H2O.AI and to present some implications for apple and pear farmers. The automatic machine learning solution H2O.AI can save time and effort compared to the conventional machine learning techniques such as scikit-learn, because it works automatically to find the best solution. As a result of this research, the following findings are obtained. First, apple farmers should increase their gross income to maximize their income, instead of reducing the cost of growing apples. In particular, apple farmers mainly have to increase production in order to obtain more gross income. As a second-best option, apple farmers should decrease labor and other costs. Second, pear farmers also should increase their gross income to maximize their income but they have to increase the price of pears rather than increasing the production of pears. As a second-best option, pear farmers can decrease labor and other costs.

Personalized Diabetes Risk Assessment Through Multifaceted Analysis (PD-RAMA): A Novel Machine Learning Approach to Early Detection and Management of Type 2 Diabetes

  • Gharbi Alshammari
    • International Journal of Computer Science & Network Security / Vol. 23, No. 8 / pp.17-25 / 2023
  • The alarming global prevalence of Type 2 Diabetes Mellitus (T2DM) has catalyzed an urgent need for robust, early diagnostic methodologies. This study unveils a pioneering approach to predicting T2DM, employing the Extreme Gradient Boosting (XGBoost) algorithm, renowned for its predictive accuracy and computational efficiency. The investigation harnesses a meticulously curated dataset of 4303 samples, extracted from a comprehensive Chinese research study, scrupulously aligned with the World Health Organization's indicators and standards. The dataset encapsulates a multifaceted spectrum of clinical, demographic, and lifestyle attributes. Through an intricate process of hyperparameter optimization, the XGBoost model exhibited an unparalleled best score, elucidating a distinctive combination of parameters such as a learning rate of 0.1, max depth of 3, 150 estimators, and specific colsample strategies. The model's validation accuracy of 0.957, coupled with a sensitivity of 0.9898 and specificity of 0.8897, underlines its robustness in classifying T2DM. A detailed analysis of the confusion matrix further substantiated the model's diagnostic prowess, with an F1-score of 0.9308, illustrating its balanced performance in true positive and negative classifications. The precision and recall metrics provided nuanced insights into the model's ability to minimize false predictions, thereby enhancing its clinical applicability. The research findings not only underline the remarkable efficacy of XGBoost in T2DM prediction but also contribute to the burgeoning field of machine learning applications in personalized healthcare. By elucidating a novel paradigm that accentuates the synergistic integration of multifaceted clinical parameters, this study fosters a promising avenue for precise early detection, risk stratification, and patient-centric intervention in diabetes care. 
The research serves as a beacon, inspiring further exploration and innovation in leveraging advanced analytical techniques for transformative impacts on predictive diagnostics and chronic disease management.
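
The metrics listed in the abstract above (accuracy, sensitivity, specificity, precision, F1) all derive from the four cells of a binary confusion matrix. The following sketch shows those relationships; the counts are hypothetical and are not the paper's confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (illustrative only).
for name, value in binary_metrics(tp=90, fp=20, tn=80, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Because sensitivity and specificity are computed on different classes, a model can score very high on one and lower on the other, as in the figures the abstract reports.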

Prediction of Stunting Among Under-5 Children in Rwanda Using Machine Learning Techniques

  • Similien Ndagijimana;Ignace Habimana Kabano;Emmanuel Masabo;Jean Marie Ntaganda
    • Journal of Preventive Medicine and Public Health / Vol. 56, No. 1 / pp.41-49 / 2023
  • Objectives: Rwanda reported a stunting rate of 33% in 2020, decreasing from 38% in 2015; however, stunting remains an issue. Globally, child deaths from malnutrition stand at 45%. The best options for the early detection and treatment of stunting should be made a community policy priority, and health services remain an issue. Hence, this research aimed to develop a model for predicting stunting in Rwandan children. Methods: The Rwanda Demographic and Health Survey 2019-2020 was used as secondary data. Stratified 10-fold cross-validation was used, and different machine learning classifiers were trained to predict stunting status. The prediction models were compared using different metrics, and the best model was chosen. Results: The best model was developed with the gradient boosting classifier algorithm, with a training accuracy of 80.49% based on the performance indicators of several models. Based on a confusion matrix, the test accuracy, sensitivity, specificity, and F1 were calculated, yielding the model's ability to classify stunting cases correctly at 79.33%, identify stunted children accurately at 72.51%, and categorize non-stunted children correctly at 94.49%, with an area under the curve of 0.89. The model found that the mother's height, television, the child's age, province, mother's education, birth weight, and childbirth size were the most important predictors of stunting status. Conclusions: Therefore, machine learning techniques may be used in Rwanda to construct an accurate model that can detect the early stages of stunting and offer the best predictive attributes to help prevent and control stunting in Rwandan children under five.

역직구 상품 추천 및 판매가 추정을 위한 머신러닝 모델 (Machine Learning Model for Recommending Products and Estimating Sales Prices of Reverse Direct Purchase)

  • 김규익;볘르드바에브 예르갈리;김수형;김진석
    • 산업경영시스템학회지 / Vol. 46, No. 2 / pp.176-182 / 2023
  • With about 80% of the global economy expected to shift to the global market by 2030, exports of reverse direct purchase products, in which foreign consumers purchase products from online shopping malls in Korea, are growing 55% annually. As of 2021, sales of reverse direct purchases in South Korea increased 50.6% from the previous year, surpassing 40 million. For domestic SMEs (small and medium-sized enterprises) to enter overseas markets, it is important to devise export strategies based on a variety of market analysis information. For domestic small and medium-sized sellers, however, entry barriers are high: they lack information on overseas markets and have difficulty selecting locally preferred products and determining competitive sales prices. This study develops an AI-based product recommendation and sales price estimation model that collects and analyzes global shopping malls and product trends, providing marketing information on promising products and appropriate sales prices to small and medium-sized sellers who have difficulty collecting global market information. The product recommendation model is based on the LTR (Learning To Rank) methodology. In a performance comparison using nDCG, the pairwise XGBoost-LambdaMART model was found to perform excellently. The sales price estimation model uses a regression algorithm. According to the R-squared value, the Light Gradient Boosting Machine performs best in this model.
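
The nDCG measure used above to compare ranking models can be sketched as follows. This is a minimal version assuming linear gain (the relevance grade itself) and a log2 position discount; some implementations use an exponential gain instead, and the relevance grades below are hypothetical.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of relevance grades in ranked order."""
    ranked = relevances if k is None else relevances[:k]
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of recommended products, in the order
# a ranking model returned them (illustrative only).
model_ranking = [3, 1, 2, 0, 0]
print(f"nDCG@5 = {ndcg(model_ranking, k=5):.3f}")
```

A value of 1.0 means the model placed items in the best possible order; lower values mean relevant items were ranked too low.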

머신러닝을 통한 잉크 필요량 예측 알고리즘 (Machine Learning Algorithm for Estimating Ink Usage)

  • 권세욱;현영주;태현철
    • 산업경영시스템학회지 / Vol. 46, No. 1 / pp.23-31 / 2023
  • Research and interest in sustainable printing are increasing in the packaging printing industry. Currently, predicting the amount of ink required for each job is based on the experience and intuition of field workers. If more ink is produced than necessary, the remainder cannot be reused and is discarded, adversely affecting the company's productivity and the environment. Machine learning models can now be used to address this problem. This study compares machine learning models for ink usage prediction. A simple linear regression model, Multiple Regression Analysis, cannot reflect the nonlinear relationship between the variables required for packaging printing, so there is a limit to how accurately it can predict the amount of ink needed. This study established various prediction models based on CART (Classification and Regression Tree), such as Decision Tree, Random Forest, Gradient Boosting Machine, and XGBoost. The accuracy of the models is determined by K-fold cross-validation. Error metrics such as root mean squared error, mean absolute error, and R-squared are employed to evaluate the correctness of the estimation models. Among these models, the XGBoost model has the highest prediction accuracy and can reduce wasted ink by 2,134 g per job. Thus, this study highlights machine learning's potential to help advance productivity and protect the environment.
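
The evaluation procedure named above (K-fold cross-validation scored with RMSE, MAE, and R-squared) can be sketched in plain Python. The fold splitter uses contiguous, unshuffled folds, and the "model" is a hypothetical mean-value baseline standing in for the tree-based models; all names and data here are assumptions for illustration.

```python
import math


def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def regression_errors(y_true, y_pred):
    """Return RMSE, MAE and R-squared for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


def cross_validate(X, y, fit, predict, k=5):
    """Average the error metrics over k folds for any fit/predict pair."""
    scores = []
    for train, test in kfold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        scores.append(regression_errors([y[i] for i in test], preds))
    return {m: sum(s[m] for s in scores) / len(scores) for m in scores[0]}


# Hypothetical baseline: always predict the training mean (not a tree model).
fit_mean = lambda X, y: sum(y) / len(y)
predict_mean = lambda model, X: [model] * len(X)

X = [[area] for area in range(10)]          # e.g. printed area, made up
y = [5.0 + 2.0 * row[0] for row in X]       # e.g. ink used in grams, made up
print(cross_validate(X, y, fit_mean, predict_mean, k=5))
```

Passing the model as a `fit`/`predict` pair keeps the evaluation loop independent of the learner, so the same loop could score each candidate model on identical folds.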

GOCI-II 대기상한 반사도와 기계학습을 이용한 남한 지역 시간별 에어로졸 광학 두께 산출 (Retrieval of Hourly Aerosol Optical Depth Using Top-of-Atmosphere Reflectance from GOCI-II and Machine Learning over South Korea)

  • 양세영;최현영;임정호
    • 대한원격탐사학회지 / Vol. 39, No. 5_3 / pp.933-948 / 2023
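  • Atmospheric aerosols not only harm human health but also affect the climate system directly and indirectly, so understanding their characteristics and spatiotemporal distribution is very important. To this end, various studies have monitored aerosols by retrieving Aerosol Optical Depth (AOD) from satellite observations. However, these retrievals mainly rely on inversion algorithms that use look-up tables, which require heavy computation and carry uncertainty. Therefore, this study developed a machine learning-based algorithm for direct retrieval of high-resolution AOD using the top-of-atmosphere (TOA) reflectance from the Geostationary Ocean Color Imager-II (GOCI-II), the difference between the minimum TOA reflectance over 30 days and the value at the observation time, and meteorological variables from a numerical model. The Light Gradient Boosting Machine (LGBM) technique was used, and the estimates were validated against ground-based Aerosol Robotic Network (AERONET) AOD through random, temporal, and spatial N-fold cross-validation. The three cross-validation schemes showed stable performance, with R2=0.70-0.80, RMSE=0.08-0.09, and 75.2-85.1% of estimates falling within the expected error (EE). In the Shapley Additive exPlanations (SHAP) analysis, reflectance-related variables occupied most of the top contribution ranks, confirming that the reflectance data contribute substantially to AOD estimation. An analysis of the spatial distribution of hourly AOD over Seoul and Ulsan showed that the developed LGBM model estimated AOD at levels similar to AERONET AOD over time. This confirmed that AOD can be retrieved at high spatiotemporal resolution (i.e., hourly, 250 m). In addition, in a comparison of retrieval coverage, the mean retrieval frequency of the LGBM model was about 8.8% higher than that of the GOCI-II L2 AOD product, confirming that the model mitigates the excessive masking of bright surfaces that occurred in the existing physical model-based AOD retrieval.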