Handling Method of Imbalance Data for Machine Learning : Focused on Sampling |
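Lee, Kyunam (Department of Big Data, Chungbuk National University)
Lim, Jongtae (Department of Information and Communication Engineering, Chungbuk National University)
Bok, Kyoungsoo (Department of SW Convergence, Wonkwang University)
Yoo, Jaesoo (Department of Information and Communication Engineering, Chungbuk National University) |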
1 | Rushi Longadge, Snehlata S. Dongre, and Latesh Malik, "Class imbalance problem in data mining review," International Journal of Computer Science and Network, Vol.2, No.1, pp.1-6, 2013. |
2 | Joffrey L. Leevy, Taghi M. Khoshgoftaar, Richard A. Bauder, and Naeem Seliya, "A survey on addressing high-class imbalance in big data," Journal of Big Data, Vol.5, No.1, pp.1-30, 2018. |
3 | Zhaohui Zheng, Xiaoyun Wu, and Rohini Srihari, "Feature selection for text categorization on imbalanced data," ACM SIGKDD Explorations Newsletter, Vol.6, No.1, pp.80-89, 2004. |
4 | Peng Cao, Dazhe Zhao, and Osmar R. Zaiane, "An optimized cost-sensitive SVM for imbalanced data learning," Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp.280-292, 2013. |
5 | Peng Cao, Dazhe Zhao, and Osmar R. Zaiane, "A PSO-based cost-sensitive neural network for imbalanced data classification," Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp.452-463, 2013. |
6 | Alberto Fernandez, Salvador Garcia, Maria Jose del Jesus, and Francisco Herrera, "A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets," Fuzzy Sets and Systems, Vol.159, No.18, pp.2378-2398, 2008. |
7 | S. Picek, A. Heuser, A. Jovic, S. Bhasin, and F. Regazzoni, "The curse of class imbalance and conflicting metrics with machine learning for side-channel evaluations," 2018. |
8 | Z. Chen, Q. Yan, H. Han, S. Wang, L. Peng, L. Wang, and B. Yang, "Machine learning based mobile malware detection using highly imbalanced network traffic," Information Sciences, Vol.433, pp.346-364, 2018. |
9 | I. Tomek, "An experiment with the edited nearest-neighbor rule," IEEE Transactions on Systems, Man, and Cybernetics, Vol.6, No.6, pp.448-452, 1976. |
10 | Dennis L. Wilson, "Asymptotic properties of nearest neighbor rules using edited data," IEEE Transactions on Systems, Man, and Cybernetics, Vol.2, No.3, pp.408-421, 1972. |
11 | I. Tomek, "Two Modifications of CNN," IEEE Transactions on Systems, Man, and Cybernetics, Vol.6, No.11, pp.769-772, 1976. |
12 | Miroslav Kubat and Stan Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," Proc. International Conference on Machine Learning, Vol.97, pp.179-186, 1997. |
13 | J. Laurikkala, "Improving identification of difficult small classes by balancing class distribution," Proc. Conference on Artificial Intelligence in Medicine in Europe - Artificial Intelligence in Medicine, pp.63-66, 2001. |
14 | Inderjeet Mani and I. Zhang, "kNN approach to unbalanced data distributions: a case study involving information extraction," Proc. Workshop on Learning from Imbalanced Datasets, Vol.126, 2003. |
15 | N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, Vol.16, No.1, pp.321-357, 2002. |
16 | H. He, Y. Bai, E. A. Garcia, and S. Li, "ADASYN: Adaptive synthetic sampling approach for imbalanced learning," Proc. IEEE International Joint Conference on Neural Networks, pp.1322-1328, 2008. |
17 | Arpit Singh and Anuradha Purohit, "A survey on methods for solving data imbalance problem for classification," International Journal of Computer Applications, Vol.127, No.15, pp.37-41, 2015. |
18 | Gustavo E. A. P. A. Batista, Ana L. C. Bazzan, and Maria Carolina Monard, "Balancing Training Data for Automated Annotation of Keywords: a Case Study," Proc. Workshop on Bioinformatics, 2003. |
19 | Shaza M. Abd Elrahman and Ajith Abraham, "A review of class imbalance problem," Journal of Network and Innovative Computing, Vol.1, pp.332-340, 2013. |
20 | Haibo He and Edwardo A. Garcia, "Learning from imbalanced data," IEEE Transactions on Knowledge & Data Engineering, Vol.21, No.9, pp.1263-1284, 2009. |
21 | https://sci2s.ugr.es/keel/imbalanced.php?order=ir#sub10, accessed 2019.8.18. |
22 | Gustavo E. A. P. A. Batista, Ronaldo C. Prati, and Maria Carolina Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, Vol.6, No.1, pp.20-29, 2004. |