• Title/Summary/Keyword: SHAP Analysis


Prediction of Agricultural Purchases Using Structured and Unstructured Data: Focusing on Paprika (정형 및 비정형 데이터를 이용한 농산물 구매량 예측: 파프리카를 중심으로)

  • Somakhamixay Oui;Kyung-Hee Lee;HyungChul Rah;Eun-Seon Choi;Wan-Sup Cho
    • The Journal of Bigdata / v.6 no.2 / pp.169-179 / 2021
  • Consumers' food consumption behavior is likely shaped not only by factors captured in structured data, such as consumer panel data, but also by factors reflected in unstructured data, such as mass media and social media content. In this study, a deep learning-based consumption prediction model is built and validated on a fused dataset that links structured and unstructured data related to food consumption. The results show that combining structured and unstructured data improved model accuracy, and that the unstructured data in particular improved predictability. Using the SHAP technique to assess variable importance, variables derived from blog and video data ranked at the top and were positively correlated with the amount of paprika purchased. The experimental results also confirmed that the machine learning model achieved higher accuracy than the deep learning model and could serve as an efficient alternative to conventional time series modeling.
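SHAP attributes a single prediction to the input features using Shapley values. As an illustration only, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets for a toy model, assuming the simplest convention in which an "absent" feature is replaced by a fixed baseline value. The model, its coefficients, and the feature names are hypothetical and not taken from the paper; the shap library uses background distributions and much faster algorithms (for example TreeSHAP for tree ensembles).

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x.

    Features outside a coalition are set to their baseline value
    (a single-reference simplification assumed for this sketch).
    Cost is exponential in the number of features, so toy sizes only.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(z):
    # Hypothetical purchase model over [blog_score, video_score, season_score]
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[2]


print(shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

For this toy model the values come out at approximately [3.5, 6.0, 1.5]: each linear term goes entirely to its own feature, while the interaction term z[0] * z[2] is split evenly between the two features involved. The values also sum to f(x) minus f(baseline), the "additive" property in the SHAP name.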

Development of a Resort's Cross-selling Prediction Model and Its Interpretation using SHAP (리조트 교차판매 예측모형 개발 및 SHAP을 이용한 해석)

  • Boram Kang;Hyunchul Ahn
    • The Journal of Bigdata / v.7 no.2 / pp.195-204 / 2022
  • The tourism industry is facing a crisis due to the recent COVID-19 pandemic, and improving profitability is vital to overcoming it. In situations such as COVID-19, selling additional products beyond guest rooms to visiting customers, thereby raising the per-customer unit price, would be a more efficient way to increase profits than an aggressive sales strategy aimed at raising room occupancy. Previous tourism studies have used machine learning techniques for demand forecasting, but few have addressed cross-selling prediction. Moreover, although a resort broadly belongs to the same accommodation industry as a hotel, no study has specialized in the resort industry, which operates on a membership system and provides facilities suited to lodging and cooking. Therefore, this study proposes a cross-selling prediction model built with various machine learning techniques on an actual resort company's accommodation data. In addition, by applying explainable artificial intelligence (XAI) techniques, we interpret which factors affect cross-selling and empirically confirm how they affect it.

Prediction of Stock Returns from News Article's Recommended Stocks Using XGBoost and LightGBM Models

  • Yoo-jin Hwang;Seung-yeon Son;Zoon-ky Lee
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.51-59 / 2024
  • This study examines the relationship between the release of news articles and individual stock returns. Investors draw on a variety of information sources to maximize stock returns when establishing investment strategies. News companies publish articles based on analysts' stock recommendation reports, which enhances the reliability of the information. Defining the release of a stock-recommendation news article as an event, we examine its economic impact and propose a binary classification model that predicts the stock return 10 days after the event. XGBoost and LightGBM models are applied, achieving accuracies of 75% and 71%, respectively. In addition, after categorizing the recommended stocks by listed market (KOSPI/KOSDAQ) and market capitalization (Big/Small), the study examines differences in model accuracy across the four resulting sub-datasets. Finally, SHAP (SHapley Additive exPlanations) analysis identifies the key variables in each model, reinforcing the interpretability of the models.
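The abstract frames prediction as binary classification of the return 10 days after the news event. The exact label definition (raw versus market-adjusted return, trading versus calendar days, any threshold) is not given here, so the sketch below simply assumes a label of 1 when the simple return over the next 10 price observations is positive, plus the plain accuracy metric used to compare XGBoost and LightGBM. The function names are hypothetical.

```python
def ten_day_label(prices, event_idx, horizon=10, threshold=0.0):
    """Label an event 1 if the simple return over `horizon` observations
    after the event exceeds `threshold`, else 0 (assumed definition)."""
    if event_idx + horizon >= len(prices):
        raise ValueError("not enough price history after the event")
    ret = prices[event_idx + horizon] / prices[event_idx] - 1.0
    return 1 if ret > threshold else 0


def accuracy(y_true, y_pred):
    """Share of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rising = [100.0 + i for i in range(11)]
print(ten_day_label(rising, 0))                 # 1
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))     # 0.75
```

A market-adjusted variant would subtract the index return over the same window before applying the threshold; which variant the study used would need to be checked against the paper.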

A Machine Learning-based Popularity Prediction Model for YouTube Mukbang Content (머신러닝 기반의 유튜브 먹방 콘텐츠 인기 예측 모델)

  • Beomgeun Seo;Hanjun Lee
    • Journal of Internet Computing and Services / v.24 no.6 / pp.49-55 / 2023
  • In this study, models for predicting the popularity of mukbang content on YouTube were proposed, and the factors influencing that popularity were identified through post-hoc analysis. To this end, information on 22,223 content items was collected from the top mukbang channels by subscriber count using APIs and Pretty Scale. Machine learning algorithms such as Random Forest, XGBoost, and LightGBM (LGBM) were used to build models predicting views and likes. SHAP analysis showed that subscriber count had the greatest impact in the view prediction model, whereas the creator's attractiveness emerged as the most important variable in the likes prediction model, confirming that the antecedents of views and of likes differ. This study is academically significant in that it conducts an empirical analysis of a large volume of online content. It also has practical significance, as it informs mukbang creators about viewers' content consumption trends and provides guidance for producing high-quality, marketable content.

Development of Tree Detection Methods for Estimating LULUCF Settlement Greenhouse Gas Inventories Using Vegetation Indices (식생지수를 활용한 LULUCF 정주지 온실가스 인벤토리 산정을 위한 수목탐지 방법 개발)

  • Joon-Woo Lee;Yu-Han Han;Jeong-Taek Lee;Jin-Hyuk Park;Geun-Han Kim
    • Korean Journal of Remote Sensing / v.39 no.6_3 / pp.1721-1730 / 2023
  • As awareness of global warming grows worldwide, the role of carbon sinks in settlements is increasingly emphasized for achieving carbon neutrality in urban areas. Managing these carbon sinks requires identifying their current status, which demands considerable manpower, time, and budget. Therefore, this study created a map predicting tree locations in Seoul using existing tree location information and Sentinel-2 satellite images. After constructing a tree presence/absence dataset, structured data were generated from 16 types of vegetation indices derived from the satellite images. An Extreme Gradient Boosting (XGBoost) model was trained on these data, and a tree prediction map was created. The relationship between the independent and dependent variables in the trained model was then investigated using Shapley values from SHapley Additive exPlanations (SHAP). Maps produced for local areas of Seoul were compared with sub-categorized land cover maps. The tree prediction model produced in this study correctly predicted even hard-to-detect street trees along main streets as trees.
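The structured features here are vegetation indices computed per pixel from Sentinel-2 bands. The abstract does not list the 16 indices, so the sketch below assumes two widely used normalized-difference indices as examples: NDVI from band B8 (near infrared) and band B4 (red), and GNDVI from B8 and band B3 (green). The helper names and reflectance values are hypothetical.

```python
import math


def normalized_difference(a, b):
    """(a - b) / (a + b); NaN where the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else math.nan


def pixel_features(b3_green, b4_red, b8_nir):
    """Example index features for one pixel from Sentinel-2 reflectances.

    Only two indices are shown; the study's full set of 16 is not
    specified in the abstract.
    """
    return {
        "NDVI": normalized_difference(b8_nir, b4_red),
        "GNDVI": normalized_difference(b8_nir, b3_green),
    }


# Hypothetical reflectances for a vegetated pixel
print(pixel_features(b3_green=0.08, b4_red=0.05, b8_nir=0.45))
```

Each pixel's dictionary would become one row of the presence/absence training table, with the tree label as the target for XGBoost.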

Development of AI-based Smart Agriculture Early Warning System

  • Hyun Sim;Hyunwook Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.67-77 / 2023
  • This study, conducted in a smart farm environment, develops a deep learning-based disease and pest detection model and applies it to an Intelligent Internet of Things (IoT) platform to explore new possibilities for implementing digital agricultural environments. The core of the research was integrating recent ImageNet-pretrained models such as RegNet and EfficientNet with techniques such as pseudo-labeling and dedicated preprocessing methods to detect various diseases and pests with high accuracy in complex agricultural environments. To this end, ensemble learning techniques were applied to maximize the model's accuracy and stability, and the model was evaluated with performance indicators including mean Average Precision (mAP), precision, recall, accuracy, and box loss. Additionally, the SHAP framework was used to gain a deeper understanding of the model's prediction criteria, making the decision-making process more transparent. This analysis provided insight into how the model weighs various variables when detecting diseases and pests.

Prediction of Disk Cutter Wear Considering Ground Conditions and TBM Operation Parameters (지반 조건과 TBM 운영 파라미터를 고려한 디스크 커터 마모 예측)

  • Yunseong Kang;Tae Young Ko
    • Tunnel and Underground Space / v.34 no.2 / pp.143-153 / 2024
  • The Tunnel Boring Machine (TBM) method is a tunnel excavation method that produces less noise and vibration than drilling and blasting and offers higher stability, and it is increasingly applied to tunnel projects worldwide. The disc cutter is an excavation tool mounted on the TBM cutterhead; because it constantly interacts with the ground at the tunnel face, wear is inevitable. This study quantitatively predicted disc cutter wear using geological conditions, TBM operational parameters, and machine learning algorithms. Among the input variables, Uniaxial Compressive Strength (UCS) data are considerably scarcer than the machine and wear data, so UCS was first estimated for the entire section from TBM machine data, and the Coefficient of Wearing rate (CW) was then predicted with the completed dataset. Among the CW prediction models compared, XGBoost showed the highest performance, and SHapley Additive exPlanations (SHAP) analysis was conducted to interpret this complex prediction model.
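The two-stage idea, first filling the sparse UCS column from dense machine data and then training the CW model on the completed table, can be sketched as below. The abstract does not say which model estimated UCS, so this sketch substitutes a one-predictor ordinary least-squares line purely for illustration; `machine_feature` stands in for whatever TBM machine variable(s) were actually used, and all names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_ucs(machine_feature, ucs):
    """Stage 1: fill missing UCS values (None) from a machine-data feature.

    Measured UCS values are kept; only the gaps are replaced by the
    fitted line's predictions.
    """
    known = [(m, u) for m, u in zip(machine_feature, ucs) if u is not None]
    slope, intercept = fit_line([m for m, _ in known], [u for _, u in known])
    return [u if u is not None else slope * m + intercept
            for m, u in zip(machine_feature, ucs)]


# Stage 2 (not shown) would train the CW model, e.g. XGBoost, on the completed UCS column.
print(impute_ucs([1.0, 2.0, 3.0, 4.0], [10.0, None, 30.0, 40.0]))
```

In practice the stage-1 model would itself need validation on held-out measured UCS points, since its errors propagate into the CW predictions.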

A generalized explainable approach to predict the hardened properties of self-compacting geopolymer concrete using machine learning techniques

  • Endow Ayar Mazumder;Sanjog Chhetri Sapkota;Sourav Das;Prasenjit Saha;Pijush Samui
    • Computers and Concrete / v.34 no.3 / pp.279-296 / 2024
  • In this study, ensemble machine learning (ML) models are employed to estimate the hardened properties of Self-Compacting Geopolymer Concrete (SCGC). The input variables include the SCGC mix constituents, such as the binder material, as well as the age of the specimen and the ratio of the alkaline solution. The output parameters examined include compressive strength, flexural strength, and split tensile strength. The models are trained and validated on a database of 396 records compiled from 132 unique mix trials performed in the laboratory. Diverse machine learning techniques, notably K-nearest neighbours (KNN), Random Forest, and Extreme Gradient Boosting (XGBoost), were used to construct the models, coupled with Bayesian optimisation (BO) for hyperparameter tuning, and nested cross-validation was applied to mitigate the risk of overfitting. The BO-XGBoost hybrid model achieved better predictive accuracy than the other models, with R² values of 0.9974, 0.9978, and 0.9937 and the lowest RMSE values of 0.8712, 0.0773, and 0.0799 for compressive strength, flexural strength, and split tensile strength, respectively. A SHAP dependence analysis was also conducted to ascertain the significance of each parameter; it showed that GGBS (ground granulated blast-furnace slag), fly ash, and specimen age exert a substantial influence on the predicted strengths of geopolymers.
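The model comparison rests on R² and RMSE. For reference, the standard definitions are R² = 1 - SS_res / SS_tot and RMSE = sqrt(mean squared error); the sketch below implements both in plain Python. The sample numbers are made up for illustration and are not the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred), r2(y_true, y_pred))
```

RMSE is in the units of the target, which is why the compressive-strength RMSE is much larger than the flexural and split tensile values even though all three R² values are similar.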

Explainable Artificial Intelligence (XAI) Surrogate Models for Chemical Process Design and Analysis (화학 공정 설계 및 분석을 위한 설명 가능한 인공지능 대안 모델)

  • Yuna Ko;Jonggeol Na
    • Korean Chemical Engineering Research / v.61 no.4 / pp.542-549 / 2023
  • With growing interest in surrogate modeling, research has continued on simulating nonlinear chemical processes with data-driven machine learning. However, the opaque nature of machine learning models limits their interpretability and hinders their practical application in industry. Therefore, this study analyzes chemical processes using Explainable Artificial Intelligence (XAI), which improves interpretability while maintaining model accuracy. Whereas conventional sensitivity analysis of chemical processes has been limited to calculating and ranking the sensitivity indices of variables, we propose a methodology that uses XAI not only to perform global and local sensitivity analysis but also to examine interactions among variables and thereby gain physical insight from the data. For the ammonia synthesis process targeted in the case study, the process variables were the temperature of the preheater feeding the first reactor and the split ratios of the cold shot to the three reactors. By integrating Matlab and Aspen Plus, we obtained data on ammonia production and the maximum temperatures of the three reactors while systematically varying the process variables. We then trained tree-based models and performed sensitivity analysis with SHAP, one of the XAI methods, on the most accurate model. The global sensitivity analysis showed that the preheater temperature had the greatest effect, and the local sensitivity analysis provided insights for defining process-variable ranges that improve productivity and prevent overheating. By constructing surrogate models for chemical processes and using XAI for sensitivity analysis, this work provides both quantitative and qualitative feedback for process optimization.
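The distinction between global and local sensitivity can be made concrete with a matrix of per-sample SHAP values (rows are samples, columns are process variables). A common convention, assumed here, takes the mean absolute SHAP value per variable as its global importance, while a single row is the local explanation for one operating point. The variable names mirror the abstract's process variables, but the SHAP numbers are invented toy values, not results from the paper.

```python
def global_importance(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value across all samples."""
    n = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def local_explanation(shap_matrix, feature_names, i):
    """Per-feature SHAP values for sample i (signed, so direction is kept)."""
    return dict(zip(feature_names, shap_matrix[i]))


names = ["preheater_T", "split_1", "split_2", "split_3"]  # hypothetical labels
toy_shap = [
    [5.0, -1.0, 0.5, 0.2],   # toy values for operating point 0
    [-4.0, 1.5, -0.5, 0.1],  # toy values for operating point 1
]
print(global_importance(toy_shap, names))
print(local_explanation(toy_shap, names, 1))
```

Taking absolute values before averaging matters: in the toy data the preheater contributions have opposite signs across the two samples and would nearly cancel under a plain mean, yet the variable clearly drives both predictions.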

Analysis of the impact of mathematics education research using explainable AI (설명가능한 인공지능을 활용한 수학교육 연구의 영향력 분석)

  • Oh, Se Jun
    • The Mathematical Education / v.62 no.3 / pp.435-455 / 2023
  • This study focused on developing an Explainable Artificial Intelligence (XAI) model to identify and analyze papers with significant impact in the field of mathematics education. To this end, meta-information from 29 domestic and international mathematics education journals was used to construct a comprehensive academic research network in mathematics education, built by integrating five sub-networks: 'paper and its citation network', 'paper and author network', 'paper and journal network', 'co-authorship network', and 'author and affiliation network'. A Random Forest machine learning model was employed to evaluate the impact of individual papers within this network, and SHAP, an XAI method, was used to analyze the reasons behind the model's assessment of impactful papers. Key features identified through XAI for determining impactful papers in mathematics education included 'paper network PageRank', 'changes in citations per paper', 'total citations', 'changes in the author's h-index', and 'citations per paper of the journal'. It became evident that papers, authors, and journals all play significant roles when evaluating individual papers. Comparing domestic and international mathematics education research revealed variations in these patterns; notably, 'co-authorship network PageRank' was especially important in domestic research. The XAI model proposed in this study serves as a tool for determining the impact of papers using AI and gives researchers strategic direction when writing papers. For instance, expanding the paper network, presenting at academic conferences, and activating the author network through co-authorship were identified as major factors that enhance a paper's impact.
Based on these findings, researchers can clearly understand how their work is perceived and evaluated in academia and identify the key factors influencing these evaluations. This study offers a novel approach to evaluating the impact of mathematics education papers, a process that has traditionally consumed significant time and resources, by using an explainable AI model. This approach not only presents a new paradigm applicable to evaluations in academic fields beyond mathematics education but is also expected to substantially enhance the efficiency and effectiveness of research activities.
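Two of the top features are PageRank scores on the paper and co-authorship networks. The study's exact PageRank settings are not stated, so the sketch below assumes the textbook power-iteration form with damping factor 0.85, where each edge points from a citing paper to a cited paper and the rank of papers that cite nothing (dangling nodes) is spread uniformly. The paper identifiers are hypothetical.

```python
def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over a directed edge list (citing -> cited)."""
    nodes = sorted({u for edge in edges for u in edge})
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Teleport term plus redistributed mass from dangling nodes
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Each node splits its rank evenly among the papers it cites
        for u in nodes:
            if out[u]:
                share = rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += damping * share
        diff = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical citation network: papers A, B, C all cite paper D
ranks = pagerank([("A", "D"), ("B", "D"), ("C", "D")])
print(max(ranks, key=ranks.get))  # D
```

Because rank flows along citations, a paper cited by other well-cited papers scores higher than one with the same raw citation count from obscure sources, which is why PageRank can carry information beyond 'total citations'.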