• Title/Summary/Keyword: SHAP analysis (SHAP 분석)

Optimizing Input Parameters of Paralichthys olivaceus Disease Classification based on SHAP Analysis (SHAP 분석 기반의 넙치 질병 분류 입력 파라미터 최적화)

  • Kyung-Won Cho;Ran Baik
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.6 / pp.1331-1336 / 2023
  • In text-based fish disease classification using machine learning, models tend to have too many input parameters, yet the parameters cannot be reduced arbitrarily without hurting performance. To solve this problem, this paper proposes a method that uses SHAP analysis to optimize input parameters specifically for Paralichthys olivaceus disease classification. The proposed method includes preprocessing disease information extracted from the halibut disease questionnaire, applying the SHAP analysis technique, and evaluating a machine learning model using AutoML. Through this, the performance of the AutoML input parameters is evaluated and the optimal input parameter combination is derived. The proposed method is expected to maintain existing performance while reducing the number of input parameters required, which will contribute to the efficiency and practicality of text-based Paralichthys olivaceus disease classification.
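The idea of ranking inputs by SHAP and keeping only the most influential ones can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the model `toy_model`, the data, and the helper names `shapley_values` and `rank_features` are all hypothetical, and exact Shapley values are computed by brute force (feasible only for a handful of features) instead of using the SHAP library or AutoML.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values for one row x (brute force over all coalitions).

    A coalition's value is the mean prediction when the features in the
    coalition come from x and the remaining ones come from a background row.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for b in background:
            mixed = [x[j] if j in coalition else b[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi

def rank_features(predict, rows, background):
    """Return feature indices sorted by mean |Shapley value|, plus the scores."""
    n = len(rows[0])
    totals = [0.0] * n
    for x in rows:
        for j, p in enumerate(shapley_values(predict, x, background)):
            totals[j] += abs(p)
    scores = [t / len(rows) for t in totals]
    order = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return order, scores

# Hypothetical toy model: feature 2 is ignored, so it should rank last.
def toy_model(row):
    return 3.0 * row[0] + 1.0 * row[1] + 0.0 * row[2]

data = [[0.0, 1.0, 5.0], [1.0, 0.0, 2.0], [2.0, 3.0, 9.0], [3.0, 2.0, 4.0]]
order, scores = rank_features(toy_model, data, data)
keep = order[:2]  # candidate reduced input set: the two most influential features
```

In practice the reduced set in `keep` would then be re-evaluated with the actual model to confirm that performance is maintained, which is the check the abstract describes.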

The Enhancement of intrusion detection reliability using Explainable Artificial Intelligence(XAI) (설명 가능한 인공지능(XAI)을 활용한 침입탐지 신뢰성 강화 방안)

  • Jung Il Ok;Choi Woo Bin;Kim Su Chul
    • Convergence Security Journal / v.22 no.3 / pp.101-110 / 2022
  • As artificial intelligence is used in more fields, attempts to solve intrusion detection problems with it are also increasing. However, the black-box nature of machine learning, which makes it impossible to explain or trace the reasons behind predicted results, presents difficulties for the security professionals who must use it. To solve this problem, research on explainable AI (XAI), which helps interpret and understand machine learning decisions, is increasing in various fields. In this paper, we therefore propose an explainable AI approach to enhance the reliability of machine learning-based intrusion detection predictions. First, the intrusion detection model is implemented with XGBoost, and the explanation of the model is implemented using SHAP. The approach then supports security experts' decision-making by comparing and analyzing the existing feature importance against the SHAP results. The experiment used the PKDD2007 dataset and analyzed the association between existing feature importance and SHAP values, verifying that SHAP-based explainable AI is a valid way to give security experts confidence in the predictions of intrusion detection models.

A Securities Company's Customer Churn Prediction Model and Causal Inference with SHAP Value (증권 금융 상품 거래 고객의 이탈 예측 및 원인 추론)

  • Na, Kwangtek;Lee, Jinyoung;Kim, Eunchan;Lee, Hyochan
    • The Journal of Bigdata / v.5 no.2 / pp.215-229 / 2020
  • Interest in machine learning is growing in all industries, but it is difficult to apply to real-world tasks because its results are hard to explain. This paper introduces a case of developing a customer churn prediction model for a securities company, along with research results on an attempt to build an explainable machine learning model and derive interpretability using the SHAP value methodology. In this study, a total of six customer churn models are compared and analyzed, and the cause of customer churn is inferred through classification and data analysis of SHAP values and the type of customer asset change. The results could serve as a basis for comprehensive judgment, for example by letting marketing managers use churn prediction results with inferable causes in actual customer marketing and by establishing a targeted marketing strategy for each customer.

Explainable Credit Default Prediction Using SHAP (SHAP을 이용한 설명 가능한 신용카드 연체 예측)

  • Minjoong Kim;Seungwoo Kim;Jihoon Moon
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.39-40 / 2024
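  • This study proposes a method that uses SHAP (SHapley Additive exPlanations) to strengthen the interpretability of a machine learning model that predicts the likelihood of delinquency among credit card users. By analyzing large-scale credit card data, it aims to clarify how customer age, gender, marital status, payment history, and other factors affect delinquency. Building on this study, financial institutions can lay the groundwork for more accurate risk management and for customized services for their customers.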

Exploration of Factors on Pre-service Science Teachers' Major Satisfaction and Academic Satisfaction Using Machine Learning and Explainable AI SHAP (머신러닝과 설명가능한 인공지능 SHAP을 활용한 사범대 과학교육 전공생의 전공만족도 및 학업만족도 영향요인 탐색)

  • Jibeom Seo;Nam-Hwa Kang
    • Journal of Science Education / v.47 no.1 / pp.37-51 / 2023
  • This study explored the factors influencing the major satisfaction and academic satisfaction of science education majors at the College of Education using two machine learning models (random forest and gradient boosting) and SHAP. The analysis showed that the gradient boosting model performed better than the random forest, but the difference was not large. Factors influencing major satisfaction include 'satisfaction with science teachers in high school corresponding to the subject of one's major', 'motivation for teaching job', and 'age'. SHAP values were used to identify the influence of each variable, with results derived both for the group as a whole and for individuals. The group-level and individual results can complement each other. Based on the research results, implications for ways to support pre-service science teachers' major and academic satisfaction were proposed.

Model Interpretation through LIME and SHAP Model Sharing (LIME과 SHAP 모델 공유에 의한 모델 해석)

  • Yong-Gil Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.2 / pp.177-184 / 2024
  • As data grows rapidly, all kinds of complex ensemble and deep learning algorithms are used to reach the highest accuracy. How these models predict, classify, recognize, and track unknown data is sometimes questionable. Answering this question has been, and will remain, a goal of intensive research and development in the data science community. A variety of factors, such as lack of data, imbalanced data, and biased data, can affect the decisions rendered by learning models. Many methods for such interpretation are gaining traction. LIME and SHAP, two state-of-the-art open-source explainability techniques, are now commonly used. However, their outputs can differ. In this context, this study introduces a technique that couples LIME and SHAP, and demonstrates how the decisions made by LightGBM and Keras models in classifying transactions as fraudulent can be analyzed on the IEEE CIS dataset.

Corporate Bankruptcy Prediction Model using Explainable AI-based Feature Selection (설명가능 AI 기반의 변수선정을 이용한 기업부실예측모형)

  • Gundoo Moon;Kyoung-jae Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.241-265 / 2023
  • A corporate insolvency prediction model serves as a vital tool for objectively monitoring the financial condition of companies. It enables timely warnings, facilitates responsive actions, and supports the formulation of effective management strategies to mitigate bankruptcy risks and enhance performance. Investors and financial institutions utilize default prediction models to minimize financial losses. As the interest in utilizing artificial intelligence (AI) technology for corporate insolvency prediction grows, extensive research has been conducted in this domain. However, there is an increasing demand for explainable AI models in corporate insolvency prediction, emphasizing interpretability and reliability. The SHAP (SHapley Additive exPlanations) technique has gained significant popularity and has demonstrated strong performance in various applications. Nonetheless, it has limitations such as computational cost, processing time, and scalability concerns based on the number of variables. This study introduces a novel approach to variable selection that reduces the number of variables by averaging SHAP values from bootstrapped data subsets instead of using the entire dataset. This technique aims to improve computational efficiency while maintaining excellent predictive performance. To obtain classification results, we aim to train random forest, XGBoost, and C5.0 models using carefully selected variables with high interpretability. The classification accuracy of the ensemble model, generated through soft voting as the goal of high-performance model design, is compared with the individual models. The study leverages data from 1,698 Korean light industrial companies and employs bootstrapping to create distinct data groups. Logistic Regression is employed to calculate SHAP values for each data group, and their averages are computed to derive the final SHAP values. The proposed model enhances interpretability and aims to achieve superior predictive performance.
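The bootstrap-averaging step described above can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' code: it uses the closed-form SHAP value of a linear (log-odds) model under feature independence, phi_j = w_j * (x_j - mean_j), with hypothetical already-fitted coefficients `weights` and synthetic data. Whether the paper refits the logistic regression on each data group is not stated in the abstract; here the coefficients are held fixed for simplicity, and the helper names are made up.

```python
import random

def linear_shap(weights, x, means):
    """SHAP values of a linear log-odds model, assuming independent features:
    phi_j = w_j * (x_j - mean_j)."""
    return [w * (xj - m) for w, xj, m in zip(weights, x, means)]

def mean_abs_shap(weights, rows):
    """Mean |SHAP| per feature over one data group (the group is its own background)."""
    n = len(weights)
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
    totals = [0.0] * n
    for r in rows:
        for j, p in enumerate(linear_shap(weights, r, means)):
            totals[j] += abs(p)
    return [t / len(rows) for t in totals]

def bootstrap_shap(weights, rows, n_groups, group_size, seed=0):
    """Average the per-group mean |SHAP| over bootstrap resamples of the data."""
    rng = random.Random(seed)
    n = len(weights)
    acc = [0.0] * n
    for _ in range(n_groups):
        # Draw one bootstrap group: sampling rows with replacement.
        group = [rng.choice(rows) for _ in range(group_size)]
        for j, s in enumerate(mean_abs_shap(weights, group)):
            acc[j] += s
    return [a / n_groups for a in acc]

# Hypothetical coefficients and synthetic data; feature 2 has zero weight.
weights = [2.0, -0.5, 0.0]
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]

scores = bootstrap_shap(weights, rows, n_groups=20, group_size=50)
selected = sorted(range(3), key=lambda j: scores[j], reverse=True)[:2]
```

Each group only needs SHAP values for `group_size` rows rather than the full dataset, which is the source of the computational saving the abstract refers to; the variables in `selected` would then feed the downstream classifiers.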

Impact of personal characteristics on learning performance in virtual reality-based construction safety training - Using machine learning and SHAP - (가상현실 기반 건설안전교육에서 개인특성이 학습성과에 미치는 영향 - 머신러닝과 SHAP을 활용하여 -)

  • Choi, Dajeong;Koo, Choongwan
    • Korean Journal of Construction Engineering and Management / v.24 no.6 / pp.3-11 / 2023
  • To address the high accident rate in the construction industry, there is growing interest in implementing virtual reality (VR)-based construction safety training. However, existing training approaches have often failed to consider learners' individual characteristics, resulting in inadequate training for some individuals. This study aimed to investigate the impact of personal characteristics on learning performance in VR-based construction safety training using machine learning and SHAP (SHapley Additive exPlanations). The study revealed that age exerted the greatest influence on learning performance, while work experience had the least impact. Furthermore, age exhibited a negative relationship with learning performance, indicating that VR-based construction safety training can be effective for younger individuals. On the other hand, academic degree, qualifications, and work experience exhibited a positive relationship. To enhance learning performance for individuals with a lower academic degree, it is necessary to provide content that is easier to understand. Because lower qualifications and work experience have minimal impact on learning performance, it is important to consider other learner characteristics in order to provide appropriate educational content. This study confirmed that personal characteristics can significantly affect learning performance in VR-based construction safety training, highlighting the potential for leveraging these findings to provide effective safety training for construction workers.

Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP (SHAP 기반 NSL-KDD 네트워크 공격 분류의 주요 변수 분석)

  • Sang-duk Lee;Dae-gyu Kim;Chang Soo Kim
    • Journal of the Society of Disaster Information / v.19 no.4 / pp.924-935 / 2023
  • Purpose: The central aim of this study is to leverage machine learning techniques for the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables responsible for enhancing overall performance. Method: First, we classified 'R2L (Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect due to class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we applied SHapley Additive exPlanations (SHAP) to the two classification models that showed high performance, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to check the importance of the variables that affect classification in each model. Result: The most important variable was 'service' for RF and 'dst_host_srv_count' for LGBM. These pivotal variables serve as key factors capable of enhancing classification performance for each respective model. Conclusion: This paper identifies the optimal models, RF and LGBM, for classifying 'R2L' and 'U2R' attacks, and elucidates the crucial variables associated with each selected model.

Credit Card Fraud Detection Based on SHAP Considering Time Sequences (시간대를 고려한 SHAP 기반의 신용카드 이상 거래 탐지)

  • Soyeon Yang;Yujin Lim
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.370-372 / 2023
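  • Credit card fraud causes enormous losses to the credit and assets of customers and companies. Financial companies have therefore introduced fraud detection systems (FDS), but because these systems continuously monitor for anomalous transactions, maintaining them is costly. This paper proposes a credit card fraud detection algorithm that saves computing resources while also improving performance. CTGAN was used to partially ease the imbalance between normal and fraudulent transactions, and SHAP, an XAI technique, was used to select meaningful features. Based on these, an LSTM Autoencoder was used to detect anomalous data. The results confirmed that the proposed algorithm outperforms traditional unsupervised learning techniques.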