[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13088/jiis.2022.28.4.179

Resolving data imbalance through differentiated anomaly data processing based on verification data

Hwang, Chulhyun (Dept. of Big Data, Hanyang Women's University)

Publication Information

Journal of Intelligence and Information Systems / v.28, no.4, 2022, pp. 179-190

Abstract

Data imbalance is a phenomenon in which one category contains far more or far fewer samples than another. It is regarded as a major factor that degrades the performance of machine learning based on classification algorithms. To address the problem, various oversampling methods that amplify minority-class data have been proposed, of which SMOTE is the most representative. To maximize the amplification effect for minority-class data, further methods have emerged that remove noise from the data (SMOTE-IPF) or amplify only the samples near the class border (Borderline SMOTE). This paper proposes a method that improves classification performance by changing how anomaly data are processed in the traditional SMOTE method for amplifying minority-class data. In experiments, the proposed method consistently showed relatively high classification performance compared with the existing methods.
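As an illustration of the existing techniques the abstract names, the following is a minimal sketch of SMOTE-style interpolation and of the neighbour-based labelling used by Borderline SMOTE to separate noise, border, and safe minority samples. It is not the method proposed in the paper. The function names (`nearest`, `smote`, `borderline_label`), the parameters `k` and `m`, and the toy data are assumptions made for this example.

```python
import math
import random


def nearest(point, pool, k):
    """Return the k points in pool closest to point (Euclidean), skipping point itself."""
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: math.dist(point, p))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples, each on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(nearest(base, minority, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def borderline_label(point, minority, majority, m=5):
    """Label a minority sample by how many of its m nearest neighbours are majority."""
    pool = [(p, "min") for p in minority if p is not point]
    pool += [(p, "maj") for p in majority]
    pool.sort(key=lambda item: math.dist(point, item[0]))
    n_major = sum(1 for _, label in pool[:m] if label == "maj")
    if n_major == m:
        return "noise"   # surrounded only by majority samples
    if n_major >= m / 2:
        return "danger"  # near the class border
    return "safe"


# Hypothetical toy data: four clustered minority samples plus one isolated
# minority sample sitting inside the majority cluster.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1), (5.0, 5.0)]
majority = [(4.8, 5.1), (5.2, 4.9), (5.1, 5.2), (4.9, 4.8), (5.3, 5.3), (3.0, 3.0)]

new_samples = smote(minority, n_new=4)
labels = [borderline_label(p, minority, majority) for p in minority]
print(new_samples)
print(labels)
```

In this toy set the isolated minority sample at (5.0, 5.0) is labelled as noise because all of its nearest neighbours belong to the majority class. Plain SMOTE may still pick it as a base sample or as a neighbour, which is the kind of behaviour that noise-filtering and border-focused variants try to avoid.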

Keywords

Data Imbalance; Data Amplification; Anomaly Data; Borderline SMOTE

Citations & Related Records

Reference

1	Saez, J. A., Luengo, J., Stefanowski, J., & Herrera, F. (2015). SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Information Sciences, 291, 184-203.
2	Wu, G., & Chang, E. Y. (2003, August). Class-boundary alignment for imbalanced dataset learning. In ICML 2003 Workshop on Learning from Imbalanced Data Sets II, Washington, DC (pp. 49-56).
3	Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.
4	Cheng, K., Zhang, C., Yu, H., Yang, X., Zou, H., & Gao, S. (2019). Grouped SMOTE with noise filtering mechanism for classifying imbalanced data. IEEE Access, 7, 170668-170681.
5	Ghorbani, R., & Ghousi, R. (2020). Comparing different resampling methods in predicting students' performance using machine learning techniques. IEEE Access, 8, 67899-67911.
6	Cortez, P., & Silva, A. M. G. (2008). Using data mining to predict secondary school student performance.
7	Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR (pp. 226-231).
8	Gazzah, S., & Amara, N. E. B. (2008, September). New oversampling approaches based on polynomial fitting for imbalanced data sets. In 2008 The Eighth IAPR International Workshop on Document Analysis Systems (pp. 677-684). IEEE.
9	Krawczyk, B. (2016). Learning from imbalanced data: Open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221-232.
10	Lee, D., & Kim, N. (2022). Anomaly Detection Methodology Based on Multimodal Deep Learning. Journal of Intelligence and Information Systems, 28(2), 101-125.
11	Choi, N., & Kim, W. (2019). Anomaly Detection for User Action with Generative Adversarial Networks. Journal of Intelligence and Information Systems, 25(3), 43-62.
12	Fernandez, A., Garcia, S., Herrera, F., & Chawla, N. V. (2018). SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research, 61, 863-905.
13	Han, H., Wang, W. Y., & Mao, B. H. (2005, August). Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In International Conference on Intelligent Computing (pp. 878-887). Springer, Berlin, Heidelberg.
14	Nguyen, H. D., Tran, K. P., Thomassey, S., & Hamad, M. (2021). Forecasting and Anomaly Detection approaches using LSTM and LSTM Autoencoder techniques with the applications in supply chain management. International Journal of Information Management, 57, 102282.
15	Shin, B., Lee, J., Han, S., & Park, C.-S. (2021). A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder. Journal of Intelligence and Information Systems, 27(3), 57-73.
16	Ali, H., Salleh, M. N. M., Saedudin, R., Hussain, K., & Mushtaq, M. F. (2019). Imbalance class problems in data mining: A review. Indonesian Journal of Electrical Engineering and Computer Science, 14(3), 1560-1571.
17	Serradilla, O., Zugasti, E., Ramirez de Okariz, J., Rodriguez, J., & Zurutuza, U. (2021). Adaptable and explainable predictive maintenance: Semi-supervised deep learning for anomaly detection and diagnosis in press machine data. Applied Sciences, 11(16), 7376.
