• Title/Summary/Keyword: interpretability methods

An Efficient One Class Classifier Using Gaussian-based Hyper-Rectangle Generation (가우시안 기반 Hyper-Rectangle 생성을 이용한 효율적 단일 분류기)

  • Kim, Do Gyun;Choi, Jin Young;Ko, Jeonghan
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.2 / pp.56-64 / 2018
  • In recent years, imbalanced data has become one of the most important and frequent issues for quality control in industry. For example, defect rates have been drastically reduced thanks to highly developed technology and quality management, so only a few defective samples can be obtained from a production process. Quality classification must therefore be performed under the condition that one class (the defective dataset) is much smaller than the other (the good dataset). Traditional multi-class classification methods are not appropriate for such imbalanced datasets, since they classify data based on differences between one class and the others, and those differences can hardly be found in imbalanced datasets. One-class classification, which thoroughly learns the patterns of a target class, is more suitable because it focuses only on data in that class. So far, several one-class classification methods have been suggested, such as the one-class support vector machine, neural network, and decision tree. The one-class support vector machine and neural network can guarantee a good classification rate, and the decision tree can provide a set of rules that can be clearly interpreted. However, the classifiers obtained from the former two methods consist of complex mathematical functions and cannot be easily understood by users, while for the decision tree the criterion for rule generation is ambiguous. As an alternative, a new one-class classifier using hyper-rectangles was proposed, which classifies precisely compared to other methods and also generates rules that users can clearly understand. In this paper, we suggest an approach that addresses the limitations of those previous one-class classification algorithms. Specifically, the suggested approach produces an improved one-class classifier using hyper-rectangles generated with a Gaussian function. The performance of the suggested algorithm is verified by a numerical experiment on several datasets from the UCI Machine Learning Repository.
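
To make the idea of an interpretable hyper-rectangle classifier concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a single rectangle whose sides are the per-feature mean plus or minus `k` standard deviations of the target class, and the function names, the value of `k`, and the sample data are all hypothetical.

```python
from statistics import mean, stdev

def fit_hyper_rectangle(samples, k=3.0):
    """Return per-feature (lower, upper) bounds as mean +/- k * std of the target class."""
    columns = list(zip(*samples))  # transpose rows into feature columns
    return [(mean(c) - k * stdev(c), mean(c) + k * stdev(c)) for c in columns]

def is_target(bounds, x):
    """Accept a point only if every feature lies inside its interval."""
    return all(lo <= v <= hi for (lo, hi), v in zip(bounds, x))

# Hypothetical measurements of good products (two features per item).
good = [(10.1, 5.0), (9.9, 5.2), (10.0, 4.9), (10.2, 5.1), (9.8, 4.8)]
bounds = fit_hyper_rectangle(good, k=3.0)

print(is_target(bounds, (10.0, 5.0)))  # point near the class center
print(is_target(bounds, (12.5, 5.0)))  # first feature far from the class
```

Each pair in `bounds` reads directly as a rule of the form "lower <= feature <= upper", which is the kind of interpretability the abstract attributes to hyper-rectangle classifiers.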

Hourly Prediction of Particulate Matter (PM2.5) Concentration Using Time Series Data and Random Forest (시계열 데이터와 랜덤 포레스트를 활용한 시간당 초미세먼지 농도 예측)

  • Lee, Deukwoo;Lee, Soowon
    • KIPS Transactions on Software and Data Engineering / v.9 no.4 / pp.129-136 / 2020
  • PM2.5, a very fine airborne particulate matter even smaller than PM10, has become an environmental issue. Since PM2.5 can cause eye diseases or respiratory problems and can infiltrate even deep blood vessels in the brain, it is important to predict PM2.5 concentrations. However, prediction is difficult because there is not yet a clear explanation of how PM2.5 is created and transported. Thus, prediction methods are needed that not only predict PM2.5 accurately but also yield interpretable results. To predict hourly PM2.5 in Seoul, we propose a method using a random forest with an adjusted bootstrap number, trained on time series ground data preprocessed from different sources. With this method, the prediction model can be trained uniformly on hourly information and the results are interpretable. To evaluate prediction performance, we conducted comparative experiments. The proposed method outperformed the other models on all labels. It also showed the importance of the variables related to the creation of PM2.5 and the effect of China.

Explainable Artificial Intelligence (XAI) Surrogate Models for Chemical Process Design and Analysis (화학 공정 설계 및 분석을 위한 설명 가능한 인공지능 대안 모델)

  • Yuna Ko;Jonggeol Na
    • Korean Chemical Engineering Research / v.61 no.4 / pp.542-549 / 2023
  • With the growing interest in surrogate modeling, there has been continuous research aimed at simulating nonlinear chemical processes using data-driven machine learning. However, the opaque nature of machine learning models, which limits their interpretability, poses a challenge for their practical application in industry. Therefore, this study aims to analyze chemical processes using Explainable Artificial Intelligence (XAI), a concept that improves interpretability while ensuring model accuracy. While conventional sensitivity analysis of chemical processes has been limited to calculating and ranking the sensitivity indices of variables, we propose a methodology that utilizes XAI not only to perform global and local sensitivity analysis, but also to examine the interactions among variables to gain physical insights from the data. For the ammonia synthesis process, which is the target process of the case study, we set the temperature of the preheater leading to the first reactor and the split ratio of the cold shot to the three reactors as process variables. By integrating Matlab and Aspen Plus, we obtained data on ammonia production and the maximum temperatures of the three reactors while systematically varying the process variables. We then trained tree-based models and performed sensitivity analysis using the SHAP technique, one of the XAI methods, on the most accurate model. The global sensitivity analysis showed that the preheater temperature had the greatest effect, and the local sensitivity analysis provided insights for defining the ranges of process variables to improve productivity and prevent overheating. By constructing surrogate models for chemical processes and using XAI for sensitivity analysis, this work contributes to providing both quantitative and qualitative feedback for process optimization.
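
SHAP attributions are based on Shapley values. The sketch below computes exact Shapley values by brute-force enumeration for a tiny hypothetical model. It is not the SHAP library and not the study's tree-based surrogate: `toy_model`, its coefficients, and the zero baseline are assumptions chosen only to show how a prediction is split among input variables, including an interaction term.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for s in combinations(others, size):
                # Shapley weight for a coalition of this size.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi

# Hypothetical toy model with an interaction term (not a process simulator).
def toy_model(z):
    temperature, split_ratio = z
    return 2.0 * temperature + 1.0 * split_ratio + 0.5 * temperature * split_ratio

phi = shapley_values(toy_model, x=[1.0, 2.0], baseline=[0.0, 0.0])
print(phi)
```

The attributions sum to the difference between the prediction at `x` and at the baseline. Averaging absolute attributions over many points gives a global ranking, while the values at a single point give a local one. Enumeration is exponential in the number of features, so it is only practical for very small inputs.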

Analysis of Korean GDP by unobserved components model (비관측요인모형을 이용한 한국의 국내총생산 분석)

  • Seong, Byeong-Chan;Lee, Seung-Kyung
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.829-837 / 2011
  • Since Harvey (1989), many approaches for applying unobserved components (UC) models to both univariate and multivariate time series analysis have been developed. However, practitioners still tend to use traditional methods such as exponential smoothing or ARIMA models for modeling and predicting time series data. It is well known that the UC model combines the flexibility of ARIMA models and the easy interpretability of exponential smoothing models by using unobserved components such as trend, cycle, seasonal, and irregular components. This study reviews the UC model and compares its performance with that of the other models in modeling and predicting the real gross domestic product (GDP) of Korea. We conclude that the UC model is the optimal model on the basis of root mean squared error.

Development and Validation of the Communication Behavior Scale for Nurses Caring for People with Dementia (치매대상자를 돌보는 간호사의 의사소통행위 측정도구 개발 및 평가)

  • Lee, Jihye;Gang, Moonhee
    • Journal of Korean Academy of Nursing / v.49 no.1 / pp.1-13 / 2019
  • Purpose: The purpose of this study was to develop and validate the Communication Behavior Scale for nurses caring for people with Dementia (CBS-D). Methods: Based on communication accommodation theory, the initial items were generated through a literature review and interviews with 20 experts. Content and face validity of the initial items were assessed. Data from 486 nurses caring for people with dementia were analyzed using item analysis, exploratory and confirmatory factor analysis, criterion-related validity, and internal consistency. Results: The final scale consisted of 18 items and four factors (discourse response management, interpersonal control, emotional expression, and interpretability) that explained 57.6% of the variance. Confirmatory factor analysis indicated that the theoretical model with 18 items satisfied all goodness-of-fit parameters. Criterion-related validity was shown by the Global Interpersonal Communication Competence Scale (r=.506, p<.001). Cronbach's alpha for the total scale was .88. Conclusion: The CBS-D can be used to measure the communication behavior of nurses caring for people with dementia.
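
For readers unfamiliar with the internal consistency statistic reported above, the following sketch computes Cronbach's alpha from its standard formula. The response matrix is hypothetical and unrelated to the CBS-D data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per scale item."""
    k = len(responses[0])
    items = list(zip(*responses))  # transpose to item columns
    item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical 5-point answers from five respondents on three items.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Alpha approaches 1 when items vary together across respondents, which is why it is read as a measure of how consistently the items capture one construct.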

Aeroengine performance degradation prediction method considering operating conditions

  • Bangcheng Zhang;Shuo Gao;Zhong Zheng;Guanyu Hu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2314-2333 / 2023
  • Predicting the performance degradation of complex electromechanical systems is important. Among the existing performance degradation prediction models, the belief rule base (BRB) is a model that deals with quantitative data and qualitative information under uncertainty. However, when analyzing dynamic systems whose observable indicators change frequently with time and working conditions, the traditional BRB cannot adapt to frequent changes in working conditions, as in the prediction of aeroengine performance degradation under varying working conditions. To address this problem, this paper puts forward a new hidden belief rule base (HBRB) prediction method, in which the performance of the aeroengine is regarded as hidden behavior and operating conditions are used as observable indicators of the HBRB model to describe that hidden behavior, so that performance degradation can be predicted at different times and under different operating conditions. A case study on performance degradation prediction using turbofan aeroengine simulation experiments demonstrates the advantages of the HBRB model, and the results confirm the effectiveness and practicability of the method. The method is also compared with other advanced forecasting methods, and the results indicate that the model generates better predictions in terms of accuracy and interpretability.

Hourly electricity demand forecasting based on innovations state space exponential smoothing models (이노베이션 상태공간 지수평활 모형을 이용한 시간별 전력 수요의 예측)

  • Won, Dayoung;Seong, Byeongchan
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.581-594 / 2016
  • We introduce innovations state space exponential smoothing models (ISS-ESM) that can analyze time series with multiple seasonal patterns. In particular, to control the complex structure of the multiple patterns, the model equations use a matrix of seasonal updating parameters. This enables us to group the seasonal parameters according to their similarity, and the grouped parameters let us satisfy the principle of parsimony. Further, the ISS-ESM can potentially accommodate any number of seasonal patterns. The models are applied to predict electricity demand in Korea, which is observed on an hourly basis, and we compare their performance with that of traditional exponential smoothing methods. The ISS-ESM are found to be superior to the traditional methods in terms of prediction and the interpretability of seasonal patterns.
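
As a point of reference for the innovations form, the sketch below implements an additive exponential smoothing model with a single seasonal pattern, where the level and the seasonal state are both updated from the same one-step forecast error. It is a deliberate simplification, not the ISS-ESM of the paper: there is only one seasonal cycle, no parameter matrix or grouping, and the smoothing constants, initialization, and data are assumptions.

```python
def seasonal_innovations_forecasts(y, period, alpha=0.3, gamma=0.2):
    """One-step-ahead forecasts from an additive level-plus-seasonal model."""
    # Crude initial states (an assumption): first-cycle mean and deviations from it.
    level = sum(y[:period]) / period
    seasonal = [v - level for v in y[:period]]
    forecasts = []
    for t, obs in enumerate(y):
        s = seasonal[t % period]
        forecast = level + s
        error = obs - forecast                      # innovation e_t
        level += alpha * error                      # l_t = l_{t-1} + alpha * e_t
        seasonal[t % period] = s + gamma * error    # s_t = s_{t-m} + gamma * e_t
        forecasts.append(forecast)
    return forecasts

# Hypothetical demand series with a repeating 4-step pattern.
demand = [10, 20, 30, 20] * 3
forecasts = seasonal_innovations_forecasts(demand, period=4)
print(forecasts)
```

Because the seasonal states are stored explicitly, they can be inspected after fitting, which is the sense in which such models offer interpretable seasonal patterns.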

Properties of the Measures to Assess Oxaliplatin-induced Peripheral Neuropathy: A Literature Review (옥살리플라틴 유도 말초신경독성 측정도구의 고찰)

  • Chu, Sang Hui;Lee, Yoon Ju;Lee, Young Joo;Cleeland, Charles S.
    • Journal of Korean Academy of Nursing / v.45 no.6 / pp.783-801 / 2015
  • Purpose: The purpose of this study is to provide a comprehensive overview of the various measures available for assessment of oxaliplatin-induced peripheral neuropathy (OXLIPN) and to evaluate the measurement properties of each assessment tool. Methods: A systematic review was conducted to identify existing measures for OXLIPN found in the databases of PubMed, Cochrane Library, Embase, RISS and KoreaMed. The quality of the 24 identified tools was evaluated based on their measurement properties, including content validity, internal consistency, criterion validity, construct validity, reproducibility, responsiveness, floor-ceiling effects and interpretability. Results: Ten (41.7%) of the 24 tools were identified as specific measures for assessing OXLIPN. The most popular type of measure was the clinical grading system used by clinicians (58.3%), and only 29.2% of the measures were identified as patient-reported outcomes. The most frequently used tool was the National Cancer Institute-Common Toxicity Criteria (NCI-CTC), but the validity of the NCI-CTC has not been reported appropriately. Overall, the Neuropathic Pain Symptom Inventory (NPSI) received the best psychometric scores, followed by the Chemotherapy-induced Peripheral Neuropathy Assessment Tool (CIPNAT) and the Functional Assessment of Cancer Therapy/Gynaecologic Oncology Group-neurotoxicity-12 (FACT/GOG-Ntx-12). Conclusion: To select an appropriate measure, evidence should be accumulated through the clinical use of the tools. Therefore, practitioners and researchers are urged to report the relevant statistics required for validating the measures currently used to assess OXLIPN.

An Exploratory Study of Electrochemical Skin Conductance for the Deficiency Pattern Identification in Diabetic Patients (당뇨병 환자의 허증별 전기전도도 특성에 대한 탐색적 관찰 연구)

  • Kim, Kahye;Kim, Jihye;Kim, Jaeuk U.
    • The Journal of the Society of Korean Medicine Diagnostics / v.22 no.1 / pp.57-67 / 2018
  • Objectives: The objective of this study is to examine the interpretability of questionnaire-based pattern identification in terms of biosignals. For this purpose, we investigate the relationship between electrochemical skin conductance (ESC) and the Qi-Blood-Yin-Yang Deficiency Questionnaire (QBYY-Q) in diabetic patients. Methods: A total of 40 patients with diabetes mellitus answered the QBYY-Q, and their ESC was measured with a SUDOSCAN device (a diabetes screening device, France). To analyze the relationship between QBYY-Q and ESC, ANOVA and the Scheffe test were performed and Pearson correlation coefficients were obtained. Results: Of the 40 diabetic patients, 23 (57.5%) were male and 17 (42.5%) were female. According to the QBYY-Q, 9 patients were classified into the Qi deficiency pattern (QD), 9 into the Blood deficiency pattern (BD), 10 into the Yin deficiency pattern (YiD) and 12 into the Yang deficiency pattern (YaD). Demographic information (age, body mass index, duration of illness, etc.), vital signs (blood pressure, body temperature, etc.), fasting plasma glucose and glycated hemoglobin did not differ significantly among the deficiency patterns. The ESC of the right leg was significantly lower in the BD group than in the YiD group (p<0.022). ESC was negatively correlated with the BD questionnaire score (r=-0.343, p<0.05). Finally, ESC showed a positive correlation with hemoglobin and erythrocyte levels in all limbs (r=0.483, p<0.01). Conclusions: We showed that ESC could be used to classify deficiency patterns in diabetic patients. In particular, the ESC was significantly lower in the BD group and was negatively correlated with the BD scores. This implies the potential utility of the ESC for understanding BD in terms of modern biosignals.

The Data-based Prediction of Police Calls Using Machine Learning (기계학습을 활용한 데이터 기반 경찰신고건수 예측)

  • Choi, Jaehun
    • The Journal of Bigdata / v.3 no.2 / pp.101-112 / 2018
  • The purpose of the study is to predict the number of police calls using a neural network, which is a machine learning method, and negative binomial regression, based on data of 112 police calls received by the Chungnam Provincial Police Agency from June 2016 to May 2017. The following variables, which may affect police calls, were selected for developing the prediction model: time, holiday, the day before a holiday, season, temperature, precipitation, wind speed, jurisdictional area, population, the number of foreigners, single house rate and other house rate. Some variables show a positive correlation and others a negative one. The comparison of the methods can be summarized as follows. The neural network has a correlation coefficient of 0.7702 between predicted and actual values, with an RMSE of 2.557. Negative binomial regression, on the other hand, shows a correlation coefficient of 0.7158 with an RMSE of 2.831. The neural network has low interpretability but excellent predictability compared with negative binomial regression. Based on the prediction model, the police agency can allocate manpower optimally for given values of the selected variables.
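
The two comparison metrics quoted above, the correlation coefficient between predicted and actual values and the RMSE, can be computed as in the following sketch. The call counts are hypothetical and are not taken from the study.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical hourly call counts and model predictions.
actual = [3, 5, 2, 8, 6]
predicted = [4, 5, 3, 7, 6]

print(round(pearson_r(actual, predicted), 4), round(rmse(actual, predicted), 4))
```

A higher correlation and a lower RMSE both indicate closer agreement with the observed counts, which is how the abstract ranks the neural network above negative binomial regression.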