http://dx.doi.org/10.13088/jiis.2022.28.4.179

Resolving data imbalance through differentiated anomaly data processing based on verification data  

Hwang, Chulhyun (Dept. of Big Data, Hanyang Women's University)
Publication Information
Journal of Intelligence and Information Systems / v.28, no.4, 2022, pp. 179-190
Abstract
Data imbalance is a condition in which one category contains far more or far fewer samples than another. It is widely regarded as a major cause of degraded performance in machine learning that relies on classification algorithms. To address it, various oversampling methods that amplify minority-class data have been proposed, of which SMOTE is the most representative. To maximize the benefit of amplifying minority-class data, variants have emerged that remove noise from the data (SMOTE-IPF) or reinforce only the samples near the class boundary (Borderline SMOTE). This paper proposes a method that improves classification performance by changing how anomaly data are handled in the traditional SMOTE procedure for amplifying minority-class data. In experiments, the proposed method consistently achieved higher classification performance than the existing methods.
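To make the oversampling idea concrete, the following is a minimal sketch of the basic SMOTE interpolation step only, not the method proposed in this paper (the abstract does not specify how anomaly data are treated). The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen for illustration; the sketch assumes numeric features and Euclidean distance.

```python
import numpy as np

def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the basic SMOTE idea; illustrative only)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances within the minority class.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of every sample.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[picked] - minority[base])

# Hypothetical toy minority class with four 2-D samples.
X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
```

Because each synthetic point lies on a segment between two existing minority samples, any noisy or anomalous minority sample is amplified along with the rest, which is the weakness that filtering variants such as SMOTE-IPF and boundary-focused variants such as Borderline SMOTE aim to address.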
Keywords
Data Imbalance; Data Amplification; Anomaly Data; Borderline SMOTE
  • Reference
1 Saez, J. A., Luengo, J., Stefanowski, J., & Herrera, F. (2015). SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Information Sciences, 291, 184-203.
2 Wu, G., & Chang, E. Y. (2003, August). Class-boundary alignment for imbalanced dataset learning. In ICML 2003 Workshop on Learning from Imbalanced Data Sets II, Washington, DC (pp. 49-56).
3 Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.
4 Cheng, K., Zhang, C., Yu, H., Yang, X., Zou, H., & Gao, S. (2019). Grouped SMOTE with noise filtering mechanism for classifying imbalanced data. IEEE Access, 7, 170668-170681.
5 Ghorbani, R., & Ghousi, R. (2020). Comparing different resampling methods in predicting students' performance using machine learning techniques. IEEE Access, 8, 67899-67911.
6 Cortez, P., & Silva, A. M. G. (2008). Using data mining to predict secondary school student performance.
7 Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, 226-231.
8 Gazzah, S., & Amara, N. E. B. (2008, September). New oversampling approaches based on polynomial fitting for imbalanced data sets. In 2008 The Eighth IAPR International Workshop on Document Analysis Systems (pp. 677-684). IEEE.
9 Krawczyk, B. (2016). Learning from imbalanced data: Open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221-232.
10 Lee, D., & Kim, N. (2022). Anomaly Detection Methodology Based on Multimodal Deep Learning. Journal of Intelligence and Information Systems, 28(2), 101-125.
11 Choi, N., & Kim, W. (2019). Anomaly Detection for User Action with Generative Adversarial Networks. Journal of Intelligence and Information Systems, 25(3), 43-62.
12 Fernandez, A., Garcia, S., Herrera, F., & Chawla, N. V. (2018). SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research, 61, 863-905.
13 Han, H., Wang, W. Y., & Mao, B. H. (2005, August). Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In International Conference on Intelligent Computing (pp. 878-887). Springer, Berlin, Heidelberg.
14 Nguyen, H. D., Tran, K. P., Thomassey, S., & Hamad, M. (2021). Forecasting and Anomaly Detection approaches using LSTM and LSTM Autoencoder techniques with the applications in supply chain management. International Journal of Information Management, 57, 102282.
15 Shin, B., Lee, J., Han, S., & Park, C.-S. (2021). A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder. Journal of Intelligence and Information Systems, 27(3), 57-73.
16 Ali, H., Salleh, M. N. M., Saedudin, R., Hussain, K., & Mushtaq, M. F. (2019). Imbalance class problems in data mining: A review. Indonesian Journal of Electrical Engineering and Computer Science, 14(3), 1560-1571.
17 Serradilla, O., Zugasti, E., Ramirez de Okariz, J., Rodriguez, J., & Zurutuza, U. (2021). Adaptable and explainable predictive maintenance: Semi-supervised deep learning for anomaly detection and diagnosis in press machine data. Applied Sciences, 11(16), 7376.