Cross-Project Pooling of Defects for Handling Class Imbalance

  • Catherine, J.M. (CTTE College for Women)
  • Djodilatchoumy, S. (Chellammal College for Women)
  • Received : 2022.10.05
  • Published : 2022.10.30

Abstract

Applying predictive analytics to software defect prediction has improved overall software quality and reduced maintenance costs. Many supervised and unsupervised learning algorithms have been applied to defect prediction on publicly available datasets, most of which suffer from an imbalance between the output classes, typically with far fewer defective than non-defective modules. We study the impact of class imbalance in defect datasets on the performance of defect prediction models and propose a Cross-Project Pooling (CPP) method for handling this imbalance. The methods are evaluated using the Matthews Correlation Coefficient (MCC), Recall, and Accuracy. The proposed sampling technique shows a significant improvement in the classifier's efficiency at predicting defects.
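The abstract does not say how CPP selects or combines data from other projects, so the following is a minimal, hypothetical sketch and not the authors' method. It assumes that CPP appends defective modules borrowed from other projects to the target project's training data until the defective and clean classes are the same size, and that those projects share the target's software metrics. The sketch also computes the three evaluation measures named in the abstract from a confusion matrix, treating "defective" as the positive class. The function names (`pool_defects`, `confusion`, `mcc`, `recall`, `accuracy`) and the toy data are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

# A labelled module: (metric vector, label), where 1 = defective and 0 = clean.
Row = Tuple[List[float], int]


def pool_defects(target: Sequence[Row],
                 other_projects: Sequence[Sequence[Row]],
                 seed: int = 0) -> List[Row]:
    """HYPOTHETICAL reading of cross-project pooling (not the authors' code).

    Borrows defective rows from other projects, which are assumed to share the
    target project's feature schema, and appends them to the target *training*
    set until the defective class matches the clean class or donors run out.
    Apply it to the training split only, never to the test split.
    """
    pooled = list(target)
    n_clean = sum(1 for _, y in pooled if y == 0)
    n_defective = len(pooled) - n_clean
    donors = [row for project in other_projects for row in project if row[1] == 1]
    random.Random(seed).shuffle(donors)  # reproducible random choice of donors
    needed = max(0, n_clean - n_defective)
    pooled.extend(donors[:needed])
    return pooled


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with 'defective' (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of truly defective modules that the model flags as defective."""
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all modules that the model labels correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


if __name__ == "__main__":
    # Made-up toy data: the target project has 8 clean and 2 defective modules.
    target = [([0.1 * i], 0) for i in range(8)] + [([5.0], 1), ([6.0], 1)]
    donors = [[([7.0], 1), ([0.2], 0), ([8.0], 1)], [([9.0], 1)] * 4]
    train = pool_defects(target, donors)
    print(len(train), sum(y for _, y in train))  # 16 8

    # Accuracy paradox: predicting "clean" for every module of an imbalanced
    # test set with 90 clean and 10 defective modules.
    y_true = [0] * 90 + [1] * 10
    y_pred = [0] * 100
    tp, tn, fp, fn = confusion(y_true, y_pred)
    print(accuracy(tp, tn, fp, fn), recall(tp, fn), mcc(tp, tn, fp, fn))  # 0.9 0.0 0.0
```

The second example shows why MCC and Recall are reported alongside Accuracy on imbalanced data. A model that labels every module as clean scores 0.9 Accuracy, yet it has zero Recall and an MCC of 0. In other words, its predictions carry no information about which modules are defective.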
