Severity-based Software Quality Prediction using Class Imbalanced Data

  • Hong, Euy-Seok (School of Information Technology, Sungshin Women's University)
  • Park, Mi-Kyeong (Dept. of Computer Science, Sungshin Women's University)
  • Received : 2016.04.06
  • Accepted : 2016.04.27
  • Published : 2016.04.29

Abstract

Most fault prediction models suffer from class imbalance because training data usually contain far more non-fault modules than fault modules. This imbalanced distribution makes it difficult for the models to learn the minority class. The imbalance is even more pronounced in severity-based fault prediction, because high-severity fault modules are a smaller subset of the fault modules. In this paper, we propose severity-based models that address these problems using three sampling methods: Resample, SpreadSubSample, and SMOTE. Empirical results show that the Resample method suffers from typical overfitting problems, and the SpreadSubSample method does not improve the prediction performance of the models. Unlike these two methods, SMOTE performs well in terms of AUC and FNR values. In particular, the J48 decision tree model using SMOTE outperforms the other prediction models.
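
As a rough illustration of two ideas named in the abstract, the following Python sketch shows SMOTE-style oversampling (synthetic minority samples are interpolated between a minority sample and one of its nearest minority neighbours) and the false negative rate (FNR). It is not the implementation used in the paper. The function names `smote_sketch` and `false_negative_rate`, the parameters `n_new`, `k`, and `seed`, and the toy data are assumptions made for illustration, and the random choice of base samples simplifies the original SMOTE procedure.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified, hypothetical sketch: each synthetic point lies on the line
    segment between a randomly chosen minority sample and one of its k
    nearest minority neighbours (squared Euclidean distance).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority samples, nearest first.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def false_negative_rate(y_true, y_pred, positive=1):
    """FNR = FN / (FN + TP): the share of positive (fault) modules missed."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return fn / (fn + tp) if (fn + tp) else 0.0


# Toy data (assumed): three high-severity fault modules with two metrics each.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sketch(minority, n_new=5, k=2)

# Toy predictions (assumed): two of the three fault modules are missed.
fnr = false_negative_rate([1, 1, 1, 0], [1, 0, 0, 0])
```

In a fault prediction setting, a lower FNR means fewer faulty modules are predicted as non-faulty, which is why it is a natural companion to AUC when the fault class is rare.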
