• Title/Summary/Keyword: GBM (gradient boosting machine)

Search Result 38, Processing Time 0.023 seconds

Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers (앙상블 학습 기반 국내 도서의 해외 판매 굿셀러 예측 및 굿셀러 리뷰 키워드 분석)

  • Do Young Kim;Na Yeon Kim;Hyon Hee Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.4 / pp.173-178 / 2023
  • As Korean literature spreads around the world, its position in the overseas publishing market has become important. As demand in the overseas publishing market continues to grow, it is essential to predict future book sales and to analyze the characteristics of books that have been highly favored by overseas readers in the past. In this study, we propose an ensemble learning-based prediction model and analyze the characteristics of good sellers, defined as books published overseas over the past 5 years with cumulative sales of more than 5,000 copies. We applied five ensemble learning models, i.e., XGBoost, Gradient Boosting, AdaBoost, LightGBM, and Random Forest, and compared them with other machine learning algorithms, i.e., Support Vector Machine, Logistic Regression, and Deep Learning. Our experimental results showed that the ensemble algorithms outperform the other approaches in handling imbalanced data. In particular, the LightGBM model obtained the best prediction performance, with an AUC of 99.86%. Among the features used for prediction, the most important is the author's number of overseas publications, and the second most important is publication in the countries with the largest publishing markets. The number of evaluation participants is also an important feature. In addition, text mining was performed on reviews of the four best-selling books among the good sellers. Many reviews showed interest in stories, characters, and writers, and because the keyword "translation" appears frequently in low-rated reviews, support for translation appears to be needed.
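Good sellers are a minority class, so plain accuracy can look high even for a useless model, which is presumably why the abstract reports AUC. The following is a minimal sketch of how AUC can be computed from classifier scores. The labels and scores are made-up illustration values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical imbalanced sample: 2 good sellers (label 1) among 10 books.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.70, 0.60, 0.90]

auc = roc_auc(labels, scores)

# A model that gives every book the same score is 80% accurate here when it
# always predicts "not a good seller", yet its AUC is only 0.5 (chance level).
constant_auc = roc_auc(labels, [0.0] * len(labels))

print(f"AUC = {auc:.4f}, constant-score AUC = {constant_auc:.1f}")
```

The pairwise form is quadratic in the sample size; it is used here only because it makes the ranking interpretation of AUC explicit.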

The Analysis of the Activity Patterns of Dog with Wearable Sensors Using Machine Learning

  • Hussain, Ali;Ali, Sikandar;Kim, Hee-Cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.141-143 / 2021
  • The activity patterns of animal species are difficult to access, and the behavior of freely moving individuals cannot be assessed by direct observation. Understanding the activity patterns of animals such as dogs and cats has therefore become a major challenge. One approach for monitoring these behaviors is the continuous collection of data by human observers. In this study, we assess the activity patterns of dogs using wearable sensor data from an accelerometer and a gyroscope. A wearable sensor-based system is suitable for this purpose and can monitor dogs in real time. The basic purpose of this study was to develop a system that can detect activities based on accelerometer and gyroscope signals. We therefore propose a method based on data collected from 10 dogs of nine different breeds, of different sizes and ages, and of both sexes. We applied six state-of-the-art classifiers: random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), XGBoost, k-nearest neighbors (KNN), and a decision tree classifier. Random forest showed good classification results. We achieved an accuracy of 86.73% in detecting the activities.

Machine Learning Based MMS Point Cloud Semantic Segmentation (머신러닝 기반 MMS Point Cloud 의미론적 분할)

  • Bae, Jaegu;Seo, Dongju;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.38 no.5_3 / pp.939-951 / 2022
  • The most important factor in designing autonomous driving systems is to recognize the exact location of the vehicle within the surrounding environment. To date, various sensors and navigation systems have been used for autonomous driving systems; however, all have limitations. Therefore, the need for high-definition (HD) maps that provide high-precision infrastructure information for safe and convenient autonomous driving is increasing. HD maps are drawn using three-dimensional point cloud data acquired through a mobile mapping system (MMS). However, this process requires manual work due to the large numbers of points and drawing layers, increasing the cost and effort associated with HD mapping. The objective of this study was to improve the efficiency of HD mapping by segmenting semantic information in an MMS point cloud into six classes: roads, curbs, sidewalks, medians, lanes, and other elements. Segmentation was performed using various machine learning techniques, including random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), and gradient boosting machine (GBM), and 11 variables, including geometry, color, intensity, and other road design features. MMS point cloud data for a 130-m section of a five-lane road near Minam Station in Busan were used to evaluate the segmentation models; the average F1 scores of the models were 95.43% for RF, 92.1% for SVM, 91.05% for GBM, and 82.63% for KNN. The RF model showed the best segmentation performance, with F1 scores of 99.3%, 95.5%, 94.5%, 93.5%, and 90.1% for roads, sidewalks, curbs, medians, and lanes, respectively. The variable importance results of the RF model showed high mean decrease accuracy and mean decrease Gini for the XY dist. and Z dist. variables related to road design, respectively. Thus, variables related to road design contributed significantly to the segmentation of semantic information. The results of this study demonstrate the applicability of segmentation of MMS point cloud data based on machine learning, and will help to reduce the cost and effort associated with HD mapping.
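The abstract reports one F1 score per class and an average F1 per model. As a hedged illustration of how such figures relate, the sketch below computes per-class F1 and their unweighted (macro) average from a confusion matrix. The three-class matrix is invented for illustration; the abstract does not say which averaging scheme the authors used, so macro averaging is an assumption, and `per_class_f1` is a hypothetical helper name.

```python
def per_class_f1(confusion):
    """Per-class F1 from a square confusion matrix.

    confusion[i][j] is the number of points of true class i predicted as class j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(confusion[k]) - tp                       # truly k, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical counts for three classes (e.g. road, curb, lane); not study data.
confusion = [
    [90, 5, 5],
    [4, 40, 6],
    [6, 5, 39],
]

f1_scores = per_class_f1(confusion)
macro_f1 = sum(f1_scores) / len(f1_scores)

for name, score in zip(["road", "curb", "lane"], f1_scores):
    print(f"{name}: F1 = {score:.2%}")
print(f"macro-average F1 = {macro_f1:.2%}")
```

Macro averaging weights every class equally, so a small class such as lanes pulls the average down as much as a large class such as roads would.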

Income prediction of apple and pear farmers in Chungnam area by automatic machine learning with H2O.AI

  • Hyundong, Jang;Sounghun, Kim
    • Korean Journal of Agricultural Science / v.49 no.3 / pp.619-627 / 2022
  • In Korea, apples and pears are among the most important agricultural products for farmers' income. Generally, farmers make decisions at various stages to maximize their income, but they do not always know which option will be best. Many previous studies have addressed this problem by predicting farmers' income structure, but researchers are still exploring better approaches. Currently, machine learning is gaining attention as a new approach to predicting farmers' income. Machine learning is a methodology that uses algorithms that can learn independently from data. As computer science advances, the performance of machine learning techniques is also improving. The purpose of this study is to predict the income structure for apples and pears using the automatic machine learning solution H2O.AI and to present some implications for apple and pear farmers. H2O.AI can save time and effort compared with conventional machine learning tools such as scikit-learn because it searches for the best solution automatically. The findings are as follows. First, apple farmers should increase their gross income to maximize their income, instead of reducing the cost of growing apples. In particular, apple farmers mainly have to increase production in order to obtain more gross income. As a second-best option, apple farmers should decrease labor and other costs. Second, pear farmers should also increase their gross income to maximize their income, but they have to increase the price of pears rather than the production of pears. As a second-best option, pear farmers can decrease labor and other costs.

Predicting the Pre-Harvest Sprouting Rate in Rice Using Machine Learning (기계학습을 이용한 벼 수발아율 예측)

  • Ban, Ho-Young;Jeong, Jae-Hyeok;Hwang, Woon-Ha;Lee, Hyeon-Seok;Yang, Seo-Yeong;Choi, Myong-Goo;Lee, Chung-Keun;Lee, Ji-U;Lee, Chae Young;Yun, Yeo-Tae;Han, Chae Min;Shin, Seo Ho;Lee, Seong-Tae
    • Korean Journal of Agricultural and Forest Meteorology / v.22 no.4 / pp.239-249 / 2020
  • Rice flour varieties have been developed to replace wheat, and consumption of rice flour has been encouraged. However, damage related to pre-harvest sprouting has occurred due to weather disasters during the ripening period. Thus, it is necessary to develop a pre-harvest sprouting rate prediction system to minimize damage from pre-harvest sprouting. Rice cultivation experiments were conducted from 2017 to 2019 with three rice flour varieties at six regions in Gangwon-do, Chungcheongbuk-do, and Gyeongsangbuk-do. The survey components were the heading date and pre-harvest sprouting at the harvest date. The weather data, namely daily mean temperature, relative humidity, and rainfall, were collected from the Automated Synoptic Observing System (ASOS) station with the same region name. Gradient Boosting Machine (GBM), a machine learning model, was used to predict the pre-harvest sprouting rate, and the training input variables were mean temperature, relative humidity, and total rainfall. In addition, experiments over the period from days after the heading date (DAH) to a subsequent date (DA2H) were conducted to establish the period related to pre-harvest sprouting. The data were divided into a training set and a validation set for calibrating the period related to pre-harvest sprouting, and a test set for validation. The results for the training and validation sets showed the highest score for a period of 22 DAH and 24 DA2H. The results for the test set tended to overpredict the pre-harvest sprouting rate in the range below 3.0%. However, they showed high prediction performance (R² = 0.76). Therefore, it is expected that the pre-harvest sprouting rate can be easily predicted from weather components for a specific period using machine learning.
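To make the GBM step concrete, here is a minimal sketch of gradient boosting for regression with squared-error loss, using one-split decision stumps as weak learners. It is not the authors' implementation: the six rows of (mean temperature, relative humidity, total rainfall) and the sprouting rates are invented, and the function names, the 50 rounds, and the learning rate of 0.1 are all assumptions chosen for illustration.

```python
def fit_stump(rows, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(rows, residuals) if row[j] <= thr]
            right = [r for row, r in zip(rows, residuals) if row[j] > thr]
            l_mean = sum(left) / len(left)
            r_mean = sum(right) / len(right)
            sse = (sum((r - l_mean) ** 2 for r in left)
                   + sum((r - r_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, l_mean, r_mean)
    if best is None:  # no feature varies: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    j, thr, l_mean, r_mean = stump
    return l_mean if row[j] <= thr else r_mean


def fit_gbm(rows, targets, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals (the negative gradient
    of squared-error loss) and adds a shrunken copy of it to the ensemble."""
    base = sum(targets) / len(targets)
    preds = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, rows)]
    return base, learning_rate, stumps


def gbm_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


def mse(targets, preds):
    return sum((y - p) ** 2 for y, p in zip(targets, preds)) / len(targets)


# Invented rows: (mean temperature in deg C, relative humidity in %, total rainfall in mm).
rows = [
    (22.0, 70.0, 10.0),
    (23.5, 75.0, 40.0),
    (24.0, 85.0, 120.0),
    (21.0, 65.0, 5.0),
    (25.0, 90.0, 200.0),
    (22.5, 80.0, 90.0),
]
targets = [1.0, 3.0, 12.0, 0.5, 20.0, 8.0]  # invented sprouting rates in %

model = fit_gbm(rows, targets)
fitted = [gbm_predict(model, row) for row in rows]
baseline_mse = mse(targets, [sum(targets) / len(targets)] * len(targets))
model_mse = mse(targets, fitted)
print(f"training MSE: mean-only {baseline_mse:.3f}, boosted {model_mse:.3f}")
```

The sketch only reports training error. A study like the one above would additionally hold out validation and test sets, as the abstract describes, because boosting can fit a small training set almost perfectly.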

DLDW: Deep Learning and Dynamic Weighing-based Method for Predicting COVID-19 Cases in Saudi Arabia

  • Albeshri, Aiiad
    • International Journal of Computer Science & Network Security / v.21 no.9 / pp.212-222 / 2021
  • Multiple waves of COVID-19 have highlighted one crucial aspect of this pandemic worldwide: the factors affecting the spread of COVID-19 infection evolve with regional and local practices and events. The introduction of vaccines in early 2021 was expected to significantly control and reduce cases. However, virus mutations and new variants have challenged these expectations. Several countries that contained the COVID-19 pandemic successfully in the first wave failed to do the same in the second and third waves. This work focuses on COVID-19 pandemic control and management in Saudi Arabia and aims to predict new cases using deep learning with various important factors. The proposed method is called the Deep Learning and Dynamic Weighing-based (DLDW) COVID-19 cases prediction method. Special consideration has been given to the evolving factors that are responsible for recent surges in the pandemic. For this purpose, two weights are assigned to each data instance, one based on feature importance and one a dynamic weight based on time. Older data are given lower weights and newer data higher weights. Feature selection identifies how the factors affecting the rate of new cases evolved over the period. The DLDW method produced 80.39% prediction accuracy, which is 6.54%, 9.15%, and 7.19% higher than that of three other classifiers, Deep Learning (DL), Random Forest (RF), and Gradient Boosting Machine (GBM), respectively. Further, our study implicitly concluded that, in Saudi Arabia, lockdowns, vaccination, and self-aware restricted mobility of residents are effective tools in controlling and managing the COVID-19 pandemic.
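The abstract does not give the weighting formula, so the following is only a sketch of one common way to give older data less weight: exponential decay with a half-life. The function name `time_decay_weights` and the 30-day half-life are assumptions, not the DLDW definition. The second part works out the baseline accuracies implied by the reported gaps, assuming the gaps are percentage points rather than relative improvements.

```python
def time_decay_weights(ages_in_days, half_life_days=30.0):
    """Assumed scheme: weight 1.0 for today's data, halving every half_life_days."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


# Four hypothetical observations, from today back to 90 days ago.
ages = [0, 30, 60, 90]
weights = time_decay_weights(ages)
print(weights)  # [1.0, 0.5, 0.25, 0.125]

# Implied baseline accuracies, assuming "higher" means percentage points.
dldw_accuracy = 80.39
reported_gaps = {"DL": 6.54, "RF": 9.15, "GBM": 7.19}
implied = {name: round(dldw_accuracy - gap, 2) for name, gap in reported_gaps.items()}
for name, accuracy in implied.items():
    print(f"{name}: {accuracy:.2f}%")
```

Weights of this kind are typically passed to a learner as per-sample weights, so that recent instances contribute more to the training loss.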

Forecasting of the COVID-19 pandemic situation of Korea

  • Goo, Taewan;Apio, Catherine;Heo, Gyujin;Lee, Doeun;Lee, Jong Hyeok;Lim, Jisun;Han, Kyulhee;Park, Taesung
    • Genomics & Informatics / v.19 no.1 / pp.11.1-11.8 / 2021
  • For the novel coronavirus disease 2019 (COVID-19), predictive modeling in the literature broadly uses susceptible-exposed-infected-recovered (SEIR)/SIR, agent-based, and curve-fitting models. Governments and legislative bodies rely on insights from prediction models to suggest new policies and to assess the effectiveness of enforced policies. Therefore, access to accurate outbreak prediction models is essential to obtain insights into the likely spread and consequences of infectious diseases. The objective of this study is to predict the future COVID-19 situation of Korea. Here, we employed 5 models for this analysis: SEIR, local linear regression (LLR), negative binomial (NB) regression, segmented Poisson, deep learning-based long short-term memory (LSTM) models, and tree-based gradient boosting machine (GBM). After prediction, model performance was compared using relative mean squared errors (RMSE) for two sets of training data (January 20, 2020-December 31, 2020 and January 20, 2020-January 31, 2021) and testing data (January 1, 2021-February 28, 2021 and February 1, 2021-February 28, 2021). Except for the segmented Poisson model, the models predicted a decline in daily confirmed cases in the country in the near future. Comparison of RMSE values showed that LLR, GBM, SEIR, NB, and LSTM, respectively, performed well in forecasting the pandemic situation of the country. A good understanding of the epidemic dynamics would greatly enhance the control and prevention of COVID-19 and other infectious diseases. Therefore, with daily confirmed cases increasing this year, these results could help in the pandemic response by informing decisions about planning, resource allocation, and social distancing policies.

Predicting As Contamination Risk in Red River Delta using Machine Learning Algorithms

  • Ottong, Zheina J.;Puspasari, Reta L.;Yoon, Daeung;Kim, Kyoung-Woong
    • Economic and Environmental Geology / v.55 no.2 / pp.127-135 / 2022
  • Excessive arsenic (As) in groundwater is a major health problem worldwide. In the Red River Delta in Vietnam, several million residents face a high risk of chronic As poisoning. Arsenic is released into groundwater by natural processes through the microbially driven reductive dissolution of Fe(III) oxides. Residents of the Red River Delta have extracted this groundwater through private tube wells for drinking and daily use because they are unaware of the contamination. This long-term consumption of As-contaminated groundwater could lead to various health problems. Therefore, a predictive model would be useful for exposing the contamination risk of wells in the Red River Delta. This study used four machine learning algorithms to predict the probability of As contamination at study sites in the Red River Delta, Vietnam. GBM was the best-performing model, with accuracy, precision, sensitivity, and specificity of 98.7%, 100%, 95.2%, and 100%, respectively. In addition, it yielded the highest AUCs, 92% and 96% for the PRC and ROC curves, respectively, with Eh and Fe as the most important variables. The partial dependence plots of As concentration on the model parameters showed that the probability of a high As level is related to shallow well depth and low Eh and SO₄, along with high PO₄³⁻ and NH₄⁺. These conditions trigger the reductive dissolution of iron phases, thus releasing As into groundwater.
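The four reported metrics all derive from a single binary confusion matrix. The sketch below shows the formulas, using counts that are purely hypothetical: the abstract does not state the test-set size, and these values (20 true positives, 1 false negative, 56 true negatives, 0 false positives) were chosen only because they happen to reproduce the reported percentages after rounding. `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),      # of wells flagged as high-As, share truly high
        "sensitivity": tp / (tp + fn),    # of truly high-As wells, share flagged
        "specificity": tn / (tn + fp),    # of truly low-As wells, share cleared
    }


# Hypothetical counts, not taken from the paper.
metrics = binary_metrics(tp=20, fp=0, tn=56, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these counts, a single missed contaminated well is the only error, which is why precision and specificity are both 100% while sensitivity is lower.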