• Title/Abstract/Keywords: gradient boosting

Search results: 221 (processing time: 0.028 seconds)

스마트폰 센서와 기계학습을 이용한 실내외 운동 활동의 인식 (Recognition of Indoor and Outdoor Exercising Activities using Smartphone Sensors and Machine Learning)

  • 김재경;주연호
    • 창의정보문화연구 / Vol. 7, No. 4 / pp.235-242 / 2021
  • Smartphones contain a variety of high-performance sensors, and research has been conducted on analyzing human activity using the data these sensors generate. Such human activity recognition can be applied in many areas, including lifestyle pattern analysis, exercise measurement, and hazardous situation detection. However, most prior studies focused on recognizing basic human behaviors or on methods that produce optimal recognition results with efficient battery use. In this paper, beyond basic behaviors, a total of ten indoor and outdoor exercise activities performed for health-management purposes were defined and recognized. To this end, accelerometer, gyroscope, and location sensor values were collected and preprocessed, and the recognition result was decided by voting, combining an SVM model with ensemble-based random forest and gradient boosting models, which offer stable performance. As a result, the defined activities could be recognized with high accuracy, and in particular, similar kinds of indoor and outdoor exercise activities could be distinguished.
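
The paper does not include code; the sketch below is only a minimal illustration of the kind of voting ensemble the abstract describes, built with scikit-learn on synthetic placeholder features (the data and all parameter values are assumptions, not taken from the study).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder for preprocessed accelerometer/gyroscope/location features and 10 activity labels.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                           n_classes=10, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# SVM plus two tree ensembles, combined by voting as in the abstract.
ensemble = VotingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # average class probabilities; "hard" would use majority vote
)
ensemble.fit(X_train, y_train)
print("test accuracy:", ensemble.score(X_test, y_test))
```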

스마트폰 과의존 판별을 위한 기계 학습 기법의 응용 (Application of Machine Learning Techniques for Problematic Smartphone Use)

  • 김우성;한준희
    • 아태비즈니스연구 / Vol. 13, No. 3 / pp.293-309 / 2022
  • Purpose - The purpose of this study is to explore the possibility of predicting the degree of smartphone overdependence based on mobile phone usage patterns. Design/methodology/approach - In this study, a survey conducted by the Korea Internet and Security Agency (KISA), the "problematic smartphone use survey," was analyzed. The survey consists of 180 questions, and data were collected from 29,712 participants. Based on the smartphone usage patterns obtained through the questionnaire, smartphone addiction levels were predicted using machine learning techniques: k-NN, gradient boosting, XGBoost, CatBoost, AdaBoost, and random forest algorithms were employed. Findings - First, while various factors together influence the smartphone overdependence level, the results show that all machine learning techniques perform well in predicting it. In particular, we focus on features that can be obtained from smartphone log data (without psychological factors), which means that our results can serve as a basis for diagnostic programs that detect problematic smartphone use. Second, the results show that information on users' age, marriage, and smartphone usage patterns can be used as predictors to determine whether users are addicted to smartphones; other demographic characteristics such as sex or region did not appear to significantly affect smartphone overdependence levels. Research implications or Originality - Some studies predict smartphone overdependence levels using machine learning techniques, but they only present algorithm performance based on survey data. In this study, based on the information gain measure, the questions that most influence the smartphone overdependence level are identified, and the performance of the algorithms is compared across these questions. The results show that the smartphone overdependence level can be predicted with less information if the questions about smartphone use are chosen appropriately.
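
As an illustration only (no code or data here comes from the paper), the following sketch ranks survey-style features by mutual information, a common information-gain measure, and compares several of the classifiers named in the abstract with scikit-learn; the synthetic data and the choice of keeping the top 10 features are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for survey responses (usage patterns, demographics) and an overdependence label.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=6, random_state=0)

# Rank features by mutual information (an information-gain measure).
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:10]            # keep the 10 most informative "questions"
print("top feature indices:", top)

models = {
    "k-NN": KNeighborsClassifier(),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X[:, top], y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```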

Cross-Technology Localization: Leveraging Commodity WiFi to Localize Non-WiFi Device

  • Zhang, Dian;Zhang, Rujun;Guo, Haizhou;Xiang, Peng;Guo, Xiaonan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 11 / pp.3950-3969 / 2021
  • Radio Frequency (RF)-based indoor localization technologies play significant roles in various Internet of Things (IoT) services (e.g., location-based services). Most such technologies require that all devices comply with a single specified technology (e.g., WiFi, ZigBee, or Bluetooth). However, this requirement limits their application in today's IoT context, where devices complying with different standards coexist in a shared environment. To bridge the gap, in this paper we propose a cross-technology localization approach that is able to localize target nodes using a different type of device. Specifically, the proposed framework reuses the existing WiFi infrastructure, without introducing additional cost, to localize non-WiFi devices (i.e., ZigBee). The key idea is to leverage the interference between devices that share the same operating frequency (e.g., 2.4 GHz). Such interference exhibits unique patterns that depend on the target device's location, so it can be leveraged for cross-technology localization. The proposed framework uses Principal Components Analysis (PCA) to extract salient features of the received WiFi signals, and leverages Dynamic Time Warping (DTW) and Gradient Boosting Regression Tree (GBRT) to improve the robustness of our system. We conduct experiments in real scenarios and investigate the impact of different factors. Experimental results show that the average localization accuracy of our prototype can reach 1.54 m, which demonstrates a promising direction for building cross-technology localization systems to fulfill the needs of the modern IoT context.
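
The following is a rough sketch, not the authors' implementation, of the PCA-plus-GBRT stage the abstract mentions: interference features are compressed with PCA and a gradient boosting regressor maps them to 2-D coordinates. The random data and all parameters are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
signals = rng.normal(size=(500, 60))        # placeholder WiFi interference features per sample
positions = rng.uniform(0, 10, size=(500, 2))   # placeholder target (x, y) positions in metres

X_train, X_test, y_train, y_test = train_test_split(signals, positions, random_state=0)

# PCA extracts salient components; a GBRT regresses each coordinate from them.
model = make_pipeline(
    PCA(n_components=10),
    MultiOutputRegressor(GradientBoostingRegressor(random_state=0)),
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
error = np.linalg.norm(pred - y_test, axis=1)   # Euclidean localization error per sample
print("mean localization error (m):", error.mean())
```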

Income prediction of apple and pear farmers in Chungnam area by automatic machine learning with H2O.AI

  • Hyundong, Jang;Sounghun, Kim
    • 농업과학연구 / Vol. 49, No. 3 / pp.619-627 / 2022
  • In Korea, apples and pears are among the most important agricultural products for farm income. Farmers make decisions at various stages to maximize their income, but they do not always know exactly which option will be the best one. Many previous studies attempted to solve this problem by predicting farmers' income structure, but researchers are still exploring better approaches. Currently, machine learning is gaining attention as a new approach to farmers' income prediction. Machine learning is a methodology based on algorithms that learn from data, and as computing capability develops, the performance of machine learning techniques keeps improving. The purpose of this study is to predict the income structure of apple and pear farms using the automatic machine learning solution H2O.AI and to present some implications for apple and pear farmers. H2O.AI can save time and effort compared to conventional machine learning workflows built with tools such as scikit-learn, because it searches for the best model automatically. As a result of this research, the following findings are obtained. First, apple farmers should increase their gross income to maximize their income, rather than reducing the cost of growing apples; in particular, they mainly have to increase production in order to obtain more gross income. As a second-best option, apple farmers should decrease labor and other costs. Second, pear farmers should also increase their gross income to maximize their income, but they have to increase the price of pears rather than production. As a second-best option, pear farmers can decrease labor and other costs.
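
For readers unfamiliar with H2O's automatic machine learning, here is a minimal sketch of the kind of workflow the abstract refers to; the synthetic columns, values, and run limits are illustrative assumptions and are not the study's data or settings.

```python
import h2o
import numpy as np
import pandas as pd
from h2o.automl import H2OAutoML

h2o.init()

# Hypothetical farm-level table; in practice this would be surveyed production/cost data.
rng = np.random.default_rng(0)
pdf = pd.DataFrame({
    "production": rng.normal(100, 20, 500),
    "price": rng.normal(50, 10, 500),
    "labor_cost": rng.normal(30, 5, 500),
    "other_cost": rng.normal(20, 5, 500),
})
pdf["income"] = pdf["production"] * pdf["price"] - pdf["labor_cost"] - pdf["other_cost"]

frame = h2o.H2OFrame(pdf)
train, test = frame.split_frame(ratios=[0.8], seed=1)

# AutoML searches over GBMs, XGBoost, random forests, GLMs, deep learning, and stacked ensembles.
aml = H2OAutoML(max_models=10, max_runtime_secs=300, seed=1)
aml.train(x=["production", "price", "labor_cost", "other_cost"], y="income", training_frame=train)

print(aml.leaderboard.head())                 # candidate models ranked by validation metric
print(aml.leader.model_performance(test))     # hold-out performance of the best model
```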

인공지능 기반 빈집 추정 및 주요 특성 분석 (Vacant House Prediction and Important Features Exploration through Artificial Intelligence: In Case of Gunsan)

  • 임규건;노종화;이현태;안재익
    • 한국IT서비스학회지 / Vol. 21, No. 3 / pp.63-72 / 2022
  • The extinction crisis of local cities, driven by the concentration of population in the capital region, directly increases the number of vacant houses in those cities. According to the Population and Housing Census, Gunsan-si showed a continuously increasing trend of vacant houses from 2015 to 2019. In particular, since Gunsan-si suffers from a doughnut effect and industrial decline, its vacant house problem is expected to worsen. This study aims to provide the foundation for a system that can predict and deal with buildings at high risk of becoming vacant, by implementing a data-driven machine learning model for vacant house prediction. Methodologically, this study analyzes three machine learning models that differ in their data components. The first model is trained on the building register, individually declared land values, house prices, and socioeconomic data; the second model is trained on the same data plus POI (Point of Interest) data; and the third model is trained on the same data as the second but excludes water usage and electricity usage data. As a result, the second model shows the best performance based on F1-score. Random Forest, Gradient Boosting Machine, XGBoost, and LightGBM, which are tree-ensemble methods, show the best performance overall. Additionally, model complexity can be reduced by eliminating independent variables whose absolute correlation coefficient with vacant house status is lower than 0.1. Finally, this study suggests XGBoost- and LightGBM-based models, which can handle missing values, as the final vacant house prediction models.
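
A simplified sketch of the two ideas highlighted in the abstract follows: dropping features whose absolute correlation with the vacant-house label is below 0.1, then training a LightGBM classifier that tolerates missing values. The synthetic table and column names are assumptions, not the study's data.

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical building-level table: attributes plus a 0/1 "vacant" label; NaNs are allowed.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 8)), columns=[f"feat_{i}" for i in range(8)])
df.iloc[::17, 3] = np.nan                               # simulate missing utility-usage values
df["vacant"] = (df["feat_0"] + rng.normal(size=1000) > 0.5).astype(int)

# Keep only features with |correlation| >= 0.1 against the vacant label.
corr = df.corr()["vacant"].drop("vacant").abs()
kept = corr[corr >= 0.1].index.tolist()

X_train, X_test, y_train, y_test = train_test_split(df[kept], df["vacant"],
                                                    stratify=df["vacant"], random_state=0)
model = LGBMClassifier(random_state=0)                  # handles NaN inputs natively
model.fit(X_train, y_train)
print("F1:", f1_score(y_test, model.predict(X_test)))
```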

댐 일유입량 예측을 위한 데이터 전처리와 머신러닝&딥러닝 모델 조합의 비교연구 (Comparative Study of Data Preprocessing and ML&DL Model Combination for Daily Dam Inflow Prediction)

  • 조영식;정관수
    • 한국수자원학회 학술대회논문집 / 2023년도 학술발표회 / pp.358-358 / 2023
  • In this study, representative machine learning and deep learning (ML&DL) models previously used for rainfall-runoff analysis in the water resources field were employed to compare daily dam inflow prediction performance across combination scenarios of data characteristics and ML&DL models, considering not only hyperparameter tuning but also combinations and preprocessing (lag time, moving average, etc.) of meteorological and hydrological data suited to each model's characteristics. For the Soyanggang Dam basin, meteorological and hydrological data accumulated from 1974 to 2021 were used; (1) rainfall, (2) inflow, and (3) meteorological data were considered as the main explanatory (independent) variables, and (a) lag time, (b) moving average, and (c) inflow component separation were applied, producing a total of 36 scenario combinations used as ML&DL inputs. Ten ML&DL models were compared: seven ML methods, (1) Linear Regression (LR), (2) Lasso, (3) Ridge, (4) Support Vector Regression (SVR), (5) Random Forest (RF), (6) Light Gradient Boosting Model (LGBM), and (7) XGBoost, and three DL methods, (8) Long Short-Term Memory (LSTM), (9) Temporal Convolutional Network (TCN), and (10) LSTM-TCN, and the most suitable data combination and ML&DL model for daily inflow prediction were presented along with their performance evaluation. Comparing the inflow predictions of the trained models, among the deep learning models TCN showed the best performance (TCN > TCN-LSTM > LSTM); among the tree-based machine learning models Random Forest and LGBM performed best (RF, LGBM > XGB); and SVR also showed performance at the level of LGBM. The three regression models, LR, Lasso, and Ridge, showed relatively low performance. In addition, across the 36 combinations of rainfall, inflow, and meteorological series for Soyanggang Dam inflow prediction, all models except the three regression models achieved NSE (Nash-Sutcliffe Efficiency) above 0.8 (up to 0.867) when lag-time-adjusted rainfall series were included in the input, and even better performance, with NSE above 0.85 (up to 0.901), when lag-time-adjusted rainfall and inflow series were combined.
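
As a purely illustrative sketch (the synthetic data, lag lengths, and window sizes below are assumptions, not the study's 36 scenarios), the following shows how lag-time and moving-average features can be built for daily inflow prediction and how NSE is computed for a tree-based model.

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor

# Hypothetical daily series: rainfall and dam inflow.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({"rain": rng.gamma(2.0, 3.0, n)})
df["inflow"] = 0.6 * df["rain"].rolling(3, min_periods=1).mean() + rng.normal(0, 1, n)

# Lag-time and moving-average features (one scenario out of many possible combinations).
for lag in (1, 2, 3):
    df[f"rain_lag{lag}"] = df["rain"].shift(lag)
    df[f"inflow_lag{lag}"] = df["inflow"].shift(lag)
df["rain_ma7"] = df["rain"].rolling(7).mean()
df = df.dropna()

X = df.drop(columns=["inflow"])
y = df["inflow"]
split = int(len(df) * 0.8)                     # time-ordered train/test split
model = LGBMRegressor(random_state=0).fit(X.iloc[:split], y.iloc[:split])
pred = model.predict(X.iloc[split:])
obs = y.iloc[split:].to_numpy()

nse = 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)   # Nash-Sutcliffe Efficiency
print("NSE:", round(nse, 3))
```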


AI기반 물공급 시스템내 동파위험 조기경보를 위한 AI모델 개발 연구 (Development of an AI-based Early Warning System for Water Meter Freeze-Burst Detection Using AI Models)

  • 이소령;장현준;이진욱;김성훈
    • 한국수자원학회 학술대회논문집 / 2023년도 학술발표회 / pp.511-511 / 2023
  • Due to climate change, freeze-bursts of water meters caused by low winter temperatures have been steadily worsening, leading to social problems such as meter replacement costs, leakage, secondary damage from frozen leakage, and water supply interruption. As a structural countermeasure, freeze-proof meters can be installed in individual households, but the cost is substantial; as a non-structural countermeasure, the Korea Meteorological Administration's freeze-risk map alert service can be used for proactive response, but because that service is based mainly on air temperature, it is limited in using the additional variables needed to predict meter freeze-bursts. Recently, the government and public sector carried out a pilot project installing IoT temperature sensors inside water meter boxes at more than 110 sites across 22 regions to monitor conditions inside the boxes. Nationwide prediction and diagnosis of meter conditions would require additional sensors, but further deployment has been slow due to installation costs. In this study, to prevent winter freeze-bursts, virtual sensors were built on top of the actual temperature sensors, and a nationwide freeze-risk map was constructed in a hybrid manner combining the two, based on freeze-risk criteria. For the virtual sensors, machine learning models were developed using latitude/longitude, elevation, shaded/sunny aspect, presence of insulation, and meteorological information (temperature, precipitation, wind speed, humidity) as independent variables and the actual sensor temperature as the dependent variable. To build models that accurately reflect regional characteristics, the sites were clustered with the K-means method using variables such as location and insulation, and three machine learning regression models were applied to each cluster. A review of the optimal number of clusters indicated that four was appropriate. The cluster characteristics showed patterns similar to regional divisions, and the Gradient Boosting regression model was found to be the most suitable for all clusters. Based on the developed model, four alert levels (good, caution, danger, and severe danger) were defined so that the results can be used in practice for a freeze-burst early warning service. The algorithm developed in this study has been incorporated into the national waterworks information system and is currently being tested, with continued validation planned. Direct and indirect benefits such as freeze-burst prevention, damage minimization, and water savings are expected.
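
A minimal sketch of the cluster-then-regress idea described above, K-means over site attributes followed by one Gradient Boosting regressor per cluster, is shown below; the synthetic site features stand in for the real sensor data, and only the number of clusters (4) is taken from the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical site features: lat, lon, elevation, insulation flag, air temp, wind, humidity, ...
X = rng.normal(size=(1200, 8))
y = 0.8 * X[:, 4] - 2.0 + rng.normal(0, 0.5, 1200)     # placeholder meter-box temperature

# Cluster sites (e.g., by location/insulation); the study found 4 clusters appropriate.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X[:, :4])

models, errors = {}, []
for c in np.unique(clusters):
    idx = clusters == c
    X_tr, X_te, y_tr, y_te = train_test_split(X[idx], y[idx], random_state=0)
    model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)   # one regressor per cluster
    models[c] = model
    errors.append(mean_absolute_error(y_te, model.predict(X_te)))
print("per-cluster MAE:", [round(e, 2) for e in errors])
```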


Water level forecasting for extended lead times using preprocessed data with variational mode decomposition: A case study in Bangladesh

  • Shabbir Ahmed Osmani;Roya Narimani;Hoyoung Cha;Changhyun Jun;Md Asaduzzaman Sayef
    • 한국수자원학회 학술대회논문집 / 2023년도 학술발표회 / pp.179-179 / 2023
  • This study suggests a new approach to water level forecasting for extended lead times, preprocessing the original data with variational mode decomposition (VMD). Two machine learning algorithms, light gradient boosting machine (LGBM) and random forest (RF), were considered for forecasting water levels at extended lead times (i.e., 5, 10, 15, 20, 25, 30, 40, and 50 days). First, the original data at two water level stations (SW173 and SW269 in Bangladesh) and their VMD-decomposed components were prepared with antecedent lag times to build datasets for the different lead times. Mean absolute error (MAE), root mean squared error (RMSE), and mean squared error (MSE) were used to evaluate the performance of the machine learning models in water level forecasting. The results show that errors were minimized when the decomposed datasets were used to predict water levels, rather than the original data alone. It was also noted that LGBM produced lower MAE, RMSE, and MSE values than RF, indicating better performance. For instance, at the SW173 station and a 30-day lead time, LGBM outperformed RF on both decomposed and original data, with MAE values of 0.511 and 1.566 compared to RF's 0.719 and 1.644, respectively. Model performance decreased with increasing lead time. In summary, preprocessing the original data and applying machine learning models to the decomposed series show promising results for water level forecasting at longer lead times. It is expected that the approach of this study can assist water management authorities in taking precautionary measures based on forecasted water levels, which is crucial for sustainable water resource utilization.
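
A rough sketch of a VMD-then-forecast pipeline follows, using the third-party vmdpy package for the decomposition (an assumption; the paper does not name its implementation) and LightGBM to predict the level 30 days ahead from lagged modes. All data and parameter values are placeholders, not the study's settings.

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from vmdpy import VMD  # pip install vmdpy

rng = np.random.default_rng(0)
level = np.sin(np.arange(3000) / 50) + 0.2 * rng.normal(size=3000)   # placeholder daily water level

# Decompose the series into K intrinsic modes: VMD(signal, alpha, tau, K, DC, init, tol).
modes, _, _ = VMD(level, 2000, 0.0, 5, 0, 1, 1e-7)

lead, lags = 30, 10
frame = pd.DataFrame(modes.T, columns=[f"mode_{k}" for k in range(modes.shape[0])])
feats = pd.concat([frame.shift(l).add_suffix(f"_lag{l}") for l in range(lags)], axis=1)
target = pd.Series(level).shift(-lead)                                # value `lead` days ahead
data = pd.concat([feats, target.rename("y")], axis=1).dropna()

split = int(len(data) * 0.8)                                          # time-ordered split
model = LGBMRegressor(random_state=0).fit(data.iloc[:split].drop(columns="y"),
                                          data.iloc[:split]["y"])
pred = model.predict(data.iloc[split:].drop(columns="y"))
obs = data.iloc[split:]["y"]
print("MAE:", mean_absolute_error(obs, pred),
      "RMSE:", mean_squared_error(obs, pred) ** 0.5)
```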


Personalized Diabetes Risk Assessment Through Multifaceted Analysis (PD- RAMA): A Novel Machine Learning Approach to Early Detection and Management of Type 2 Diabetes

  • Gharbi Alshammari
    • International Journal of Computer Science & Network Security / Vol. 23, No. 8 / pp.17-25 / 2023
  • The alarming global prevalence of Type 2 Diabetes Mellitus (T2DM) has catalyzed an urgent need for robust, early diagnostic methodologies. This study unveils a pioneering approach to predicting T2DM, employing the Extreme Gradient Boosting (XGBoost) algorithm, renowned for its predictive accuracy and computational efficiency. The investigation harnesses a meticulously curated dataset of 4303 samples, extracted from a comprehensive Chinese research study, scrupulously aligned with the World Health Organization's indicators and standards. The dataset encapsulates a multifaceted spectrum of clinical, demographic, and lifestyle attributes. Through an intricate process of hyperparameter optimization, the XGBoost model exhibited an unparalleled best score, elucidating a distinctive combination of parameters such as a learning rate of 0.1, max depth of 3, 150 estimators, and specific colsample strategies. The model's validation accuracy of 0.957, coupled with a sensitivity of 0.9898 and specificity of 0.8897, underlines its robustness in classifying T2DM. A detailed analysis of the confusion matrix further substantiated the model's diagnostic prowess, with an F1-score of 0.9308, illustrating its balanced performance in true positive and negative classifications. The precision and recall metrics provided nuanced insights into the model's ability to minimize false predictions, thereby enhancing its clinical applicability. The research findings not only underline the remarkable efficacy of XGBoost in T2DM prediction but also contribute to the burgeoning field of machine learning applications in personalized healthcare. By elucidating a novel paradigm that accentuates the synergistic integration of multifaceted clinical parameters, this study fosters a promising avenue for precise early detection, risk stratification, and patient-centric intervention in diabetes care. The research serves as a beacon, inspiring further exploration and innovation in leveraging advanced analytical techniques for transformative impacts on predictive diagnostics and chronic disease management.
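
To make the reported configuration concrete, the sketch below tunes an XGBoost classifier over a small grid that includes the values quoted in the abstract (learning rate 0.1, max depth 3, 150 estimators) and computes sensitivity and specificity. The synthetic data and the rest of the grid are illustrative assumptions, not the study's setup.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Placeholder for the clinical/demographic/lifestyle features and the T2DM label.
X, y = make_classification(n_samples=4303, n_features=15, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

grid = GridSearchCV(
    XGBClassifier(eval_metric="logloss", random_state=0),
    param_grid={"learning_rate": [0.05, 0.1], "max_depth": [3, 5],
                "n_estimators": [100, 150], "colsample_bytree": [0.8, 1.0]},
    scoring="accuracy", cv=5,
)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_)

pred = grid.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp),
      "F1:", f1_score(y_test, pred))
```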

Does the quality of orthodontic studies influence their Altmetric Attention Score?

  • Thamer Alsaif;Nikolaos Pandis;Martyn T. Cobourne;Jadbinder Seehra
    • 대한치과교정학회지 / Vol. 53, No. 5 / pp.328-335 / 2023
  • Objective: The aim of this study was to determine whether an association existed between study quality, other study characteristics, and Altmetric Attention Scores (AASs) in orthodontic studies. Methods: The Scopus database was searched to identify orthodontic studies published between January 1, 2017, and December 31, 2019. Articles that satisfied the eligibility criteria were included in this study. Study characteristics, including study quality, were extracted and entered into a pre-piloted data collection sheet. Descriptive statistics were calculated. On an exploratory basis, random forest and gradient boosting machine learning algorithms were used to examine the influence of article characteristics on AAS. Results: In total, 586 studies with an AAS were analyzed. Overall, the mean AAS of the sample was 5. Twitter was the most popular social media platform for publicizing studies, accounting for 53.7%. In terms of study quality, only 19.1% of the studies were rated as high quality, with 41.8% deemed moderate quality. The type of social media platform, number of citations, impact factor, and study type were among the most influential characteristics on AAS in both models. In contrast, study quality was one of the least influential characteristics on AAS. Conclusions: Social media platforms contributed the most to the AAS of orthodontic studies, whereas study quality had little impact on the AAS.
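
As a hedged illustration of the exploratory analysis mentioned above (none of the variables or data are from the paper), the sketch below fits a gradient boosting regressor to predict an AAS-like target and ranks article characteristics by permutation importance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical article-level predictors of an Altmetric-like score.
df = pd.DataFrame({
    "citations": rng.poisson(10, 600),
    "impact_factor": rng.uniform(0.5, 6.0, 600),
    "study_quality": rng.integers(1, 4, 600),      # 1 = low, 2 = moderate, 3 = high
    "platform_twitter": rng.integers(0, 2, 600),
})
aas = 0.3 * df["citations"] + 2.0 * df["platform_twitter"] + rng.normal(0, 1, 600)

X_train, X_test, y_train, y_test = train_test_split(df, aas, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Permutation importance ranks which characteristics most influence the predicted score.
imp = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
for name, score in sorted(zip(df.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```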