http://dx.doi.org/10.3745/JIPS.01.0065

Re-SSS: Rebalancing Imbalanced Data Using Safe Sample Screening

Shi, Hongbo (School of Information, Shanxi University of Finance and Economics)
Chen, Xin (School of Information, Shanxi University of Finance and Economics)
Guo, Min (School of Information, Shanxi University of Finance and Economics)

Publication Information

Journal of Information Processing Systems / v.17, no.1, 2021, pp. 89-106

Abstract

Different samples affect the learning of support vector machine (SVM) classifiers to different degrees. To rebalance an imbalanced dataset, it is therefore reasonable to remove non-informative samples and add informative ones. Safe sample screening can identify some of the non-informative samples while retaining the informative samples. This study develops a resampling algorithm, Rebalancing imbalanced data using Safe Sample Screening (Re-SSS), which consists of two components: selecting Informative Samples (Re-SSS-IS) and rebalancing via a Weighted SMOTE (Re-SSS-WSMOTE). Re-SSS-IS selects informative samples from the majority class and determines a suitable regularization parameter for the SVM, while Re-SSS-WSMOTE generates informative minority samples. Both Re-SSS-IS and Re-SSS-WSMOTE are based on safe sample screening. The experimental results show that Re-SSS can effectively improve classification performance on imbalanced classification problems.
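The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. The abstract does not give the safe screening rules, so the sketch swaps in a plain margin test (a sample is kept when y * f(x) is at most 1, the region where SVM support vectors can lie) for majority selection, and a SMOTE-style interpolation whose seed points are drawn in proportion to given weights for minority generation. The function names, the decision function, and the weights are all hypothetical.

```python
import math
import random


def select_informative(samples, labels, decision, margin=1.0):
    """Keep samples whose functional margin y * f(x) is at most `margin`.

    Assumption: a simple stand-in for safe sample screening. Samples far on
    the correct side of the boundary are treated as non-informative.
    """
    return [x for x, y in zip(samples, labels) if y * decision(x) <= margin]


def weighted_smote(minority, weights, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Seed points are drawn with probability proportional to `weights`
    (hypothetical informativeness scores, not the paper's weighting scheme).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        seed = rng.choices(minority, weights=weights, k=1)[0]
        # k nearest minority neighbours of the seed, excluding the seed itself
        neighbours = sorted(
            (p for p in minority if p is not seed),
            key=lambda p: math.dist(seed, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(s + gap * (m - s) for s, m in zip(seed, mate)))
    return synthetic


# Toy data: majority class labelled +1, minority class labelled -1.
majority = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (4.0, 4.0)]
minority = [(-1.0, -1.0), (-1.2, -0.8), (-0.5, -0.4)]


def toy_decision(x):
    """Hypothetical linear decision function f(x) = x1 + x2 - 0.5."""
    return x[0] + x[1] - 0.5


kept_majority = select_informative(majority, [1] * len(majority), toy_decision)
new_minority = weighted_smote(minority, weights=[1.0, 1.0, 2.0], n_new=5)
```

In this toy run the two majority points far from the boundary are discarded, and every synthetic point lies on a segment between two existing minority points. In practice the decision function would come from a trained SVM, and the weights from whatever informativeness measure the screening step provides.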

Keywords

Imbalanced Data; Safe Sample Screening; Re-SSS-IS; Re-SSS-WSMOTE
