http://dx.doi.org/10.3837/tiis.2021.11.009

Semi-supervised Software Defect Prediction Model Based on Tri-training

Meng, Fanqi (School of Computer Science, Northeast Electric Power University)
Cheng, Wenying (School of Computer Science, Northeast Electric Power University)
Wang, Jingdong (School of Computer Science, Northeast Electric Power University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS), vol. 15, no. 11, 2021, pp. 4028-4042

Abstract

To address the difficulty of software defect prediction caused by a shortage of labelled defect samples and by class imbalance, a semi-supervised software defect prediction model is proposed that combines feature normalization, oversampling, and the Tri-training algorithm. First, feature normalization rescales the feature data so that excessively large or small feature values do not distort the model's classification performance. Second, oversampling expands the data to correct the class imbalance among the labelled samples. Finally, the Tri-training algorithm learns from the training samples and builds the defect prediction model. The novelty of the model lies in combining feature normalization, oversampling, and the Tri-training algorithm to address the shortage of labelled samples and the class imbalance problem at the same time. Simulation experiments on the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
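
The three stages named in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the abstract does not specify the normalization formula, the oversampling technique, or the base classifiers, so min-max scaling, plain random oversampling, and a toy 1-nearest-neighbour learner are assumed here purely for illustration. The tri-training loop is also simplified, omitting the error-rate checks of the full algorithm, and all function names and data values are hypothetical.

```python
import random
from collections import Counter

def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]

def random_oversample(X, y, rng):
    """Duplicate random minority-class rows until every class is equally large."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def predict_1nn(model, x):
    """Toy base learner (an assumption): label of the nearest stored row."""
    train_X, train_y = model
    dist = lambda r: sum((a - b) ** 2 for a, b in zip(r, x))
    return train_y[min(range(len(train_X)), key=lambda i: dist(train_X[i]))]

def bootstrap(X, y, rng):
    """Sample with replacement, redrawing until every class is represented."""
    while True:
        idx = [rng.randrange(len(X)) for _ in X]
        if {y[i] for i in idx} == set(y):
            return [X[i] for i in idx], [y[i] for i in idx]

def tri_train(X_lab, y_lab, X_unlab, rng, max_rounds=10):
    """Simplified tri-training: each learner is retrained on its bootstrap
    sample plus the unlabelled rows on which the other two learners agree."""
    boots = [bootstrap(X_lab, y_lab, rng) for _ in range(3)]
    models, previous = list(boots), None
    for _ in range(max_rounds):
        pseudo = []
        for i in range(3):
            j, k = (m for m in range(3) if m != i)
            votes = [(predict_1nn(models[j], x), predict_1nn(models[k], x))
                     for x in X_unlab]
            pseudo.append([(x, a) for x, (a, b) in zip(X_unlab, votes) if a == b])
        if pseudo == previous:  # pseudo-labels stopped changing
            break
        models = [(bx + [x for x, _ in p], by + [lab for _, lab in p])
                  for (bx, by), p in zip(boots, pseudo)]
        previous = pseudo
    return models

def predict(models, x):
    """Majority vote of the three learners."""
    return Counter(predict_1nn(m, x) for m in models).most_common(1)[0][0]

rng = random.Random(0)
# Hypothetical toy modules: [lines of code, cyclomatic complexity]; 1 = defective.
labelled = [[10, 1], [12, 2], [11, 1], [13, 2], [9, 1], [14, 2], [95, 9], [100, 10]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
unlabelled = [[11, 2], [98, 9], [12, 1], [97, 10]]
queries = [[10, 2], [99, 10]]

# All rows are normalized together for brevity.
scaled = min_max_normalize(labelled + unlabelled + queries)
X_lab, X_unlab, X_query = scaled[:8], scaled[8:12], scaled[12:]
X_bal, y_bal = random_oversample(X_lab, labels, rng)
models = tri_train(X_bal, y_bal, X_unlab, rng)
predictions = [predict(models, x) for x in X_query]
print(predictions)
```

In this sketch the labelled set is balanced before the bootstrap samples are drawn, so each of the three learners starts from both classes. In a real evaluation the scaling parameters would normally be fitted on the training data only and then applied to held-out modules.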

Keywords

Feature Normalization; Oversampling Techniques; Software Defect Prediction; Semi-supervised Learning; Unbalanced Classification
