http://dx.doi.org/10.9708/jksci.2016.21.4.073

Severity-based Software Quality Prediction using Class Imbalanced Data  

Hong, Euy-Seok (School of Information Technology, Sungshin Women's University)
Park, Mi-Kyeong (Dept. of Computer Science, Sungshin Women's University)
Abstract
Most fault prediction models suffer from class imbalance because training data usually contains many more non-fault modules than fault modules. This imbalanced distribution makes it difficult for the models to learn the minority class. The imbalance is even more pronounced in severity-based fault prediction, because high-severity fault modules are a smaller subset of the fault modules. In this paper, we propose severity-based models that address these problems using three sampling methods: Resample, SpreadSubSample, and SMOTE. Empirical results show that the Resample method suffers from typical overfitting problems, and that the SpreadSubSample method does not improve the prediction performance of the models. Unlike these two methods, SMOTE performs well in terms of area under the ROC curve (AUC) and false negative rate (FNR). In particular, the J48 decision tree model using SMOTE outperforms the other prediction models.
Keywords
Data imbalance; Fault prediction; Severity; Sampling;
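The abstract names SMOTE and FNR without showing how they work, so the following is a minimal illustrative sketch in Python, not the authors' implementation. It assumes numeric module metrics stored as tuples; the function names `smote_like` and `fnr`, the neighbour count `k`, and the sample data are all hypothetical. The first function creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, which is the core idea of SMOTE. The second computes the false negative rate, FN / (FN + TP).

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def fnr(y_true, y_pred, positive=1):
    """False negative rate: fraction of truly positive items predicted negative."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return fn / (fn + tp) if (fn + tp) else 0.0


# Hypothetical high-severity fault modules described by two metrics each.
high_severity = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
extra = smote_like(high_severity, n_new=5)
print(len(extra))                          # 5
print(fnr([1, 1, 0, 0], [1, 0, 0, 0]))     # 0.5
```

In fault prediction a false negative is a faulty module predicted as fault-free, so a lower FNR means fewer faulty modules are missed.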