Centroid and Nearest Neighbor based Class Imbalance Reduction with Relevant Feature Selection using Ant Colony Optimization for Software Defect Prediction

  • B., Kiran Kumar (Department of Information Technology, Kakatiya Institute of Technology & Science) ;
  • Gyani, Jayadev (Department of CS, College of Computer and Information Sciences, Majmaah University) ;
  • Y., Bhavani (Department of Information Technology, Kakatiya Institute of Technology & Science) ;
  • P., Ganesh Reddy (Department of Information Technology, Kakatiya Institute of Technology & Science) ;
  • T, Nagasai Anjani Kumar (Department of Information Technology, Kakatiya Institute of Technology & Science)
  • Received : 2022.10.05
  • Published : 2022.10.30

Abstract

Software defect prediction (SDP) is currently one of the most active research areas in software engineering. Early detection of defects lowers the cost of software and improves its reliability. Machine learning techniques are widely used to build SDP models based on programming measures. Most defect prediction models in the literature suffer from class imbalance and high dimensionality. In this paper, we propose the Centroid and Nearest Neighbor based Class Imbalance Reduction (CNNCIR) technique, which considers the distribution characteristics of the dataset to balance defective and non-defective records in imbalanced datasets. The proposed approach is compared with SMOTE (Synthetic Minority Oversampling Technique). The high-dimensionality problem is addressed by selecting relevant features with Ant Colony Optimization (ACO). We analyzed six open-source software defect datasets from the PROMISE repository using nine different classifiers and evaluated them with seven performance measures. The results show that the proposed CNNCIR method with ACO-based feature selection outperforms SMOTE in the majority of cases.
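For readers unfamiliar with the SMOTE baseline named above, the following is a minimal sketch of its core idea: each synthetic minority (defective) record is placed at a random point on the line segment between an existing minority record and one of its k nearest minority neighbors. This illustrates only the baseline, not the CNNCIR method, whose details are not given in this abstract. The function name `smote_oversample`, its parameters, and the toy data are assumptions made for illustration.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples via SMOTE-style interpolation.

    X_min : array of shape (n, d) holding only minority-class records.
    k     : number of nearest minority neighbours considered per record.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a record cannot have more than n - 1 neighbours

    # Pairwise Euclidean distances between minority records.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # exclude each record from its own neighbours
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base record and one of its k neighbours for every new sample.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate: x_new = x + u * (x_neighbour - x), with u drawn from [0, 1).
    u = rng.random((n_new, 1))
    return X_min[base] + u * (X_min[chosen] - X_min[base])

# Toy data (hypothetical): four defective records described by two metrics.
X_defective = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])
synthetic = smote_oversample(X_defective, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic record is interpolated between two existing minority records, the new points stay inside the region already occupied by the minority class. In practice a maintained implementation, such as the one in the imbalanced-learn library, would normally be preferred over a hand-written version.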

Acknowledgement

The authors would like to thank the Deanship of Scientific Research at Majmaah University for supporting this work under Project No. xxxx. The authors are also thankful to the anonymous reviewers for their useful comments.
