http://dx.doi.org/10.3745/KTSDE.2021.10.9.339

Bayesian Optimization Framework for Improved Cross-Version Defect Prediction  

Choi, Jeongwhan (Department of Artificial Intelligence, Yonsei University)
Ryu, Duksan (Department of Software Engineering, Jeonbuk National University)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol.10, No.9, pp.339-348, 2021
Abstract
Recent software defect prediction research has actively studied defect prediction across projects (cross-project) and across versions of the same project (cross-version). So far, cross-version defect prediction studies have assumed a WP (within-project) setting. However, previous work does not take into account that, in the CV (cross-version) environment, the distribution difference between project versions matters. In this study, we propose an automated Bayesian optimization framework that considers distribution differences between versions. The framework automatically selects whether to perform transfer learning according to the difference in distribution, and it optimizes three aspects together: the distribution difference between versions, transfer learning, and the hyperparameters of the classifier. Our experiments confirmed that automatically selecting whether to perform transfer learning based on the distribution difference is effective. Moreover, the results show that the optimization framework improves prediction performance and, as a result, can reduce software inspection effort. We expect the framework to support practical quality assurance activities for new versions of projects in a cross-version environment.
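The abstract does not specify how the "transfer or not" decision is computed, so the following is only a minimal, hypothetical sketch of that one step, not the authors' framework and not a Bayesian optimizer. It assumes that each version is a list of modules with numeric metrics, that the distribution difference is the average per-metric two-sample Kolmogorov-Smirnov (KS) statistic, that the decision is a simple threshold (the value 0.3 is made up), and that the "transfer" step is per-version z-score standardization. All of these choices, and every function name below, are illustrative assumptions.

```python
"""Hypothetical sketch: decide whether to apply a transfer step between two
project versions based on how different their metric distributions are.
The KS-based score, the 0.3 threshold, and z-score "transfer" are assumptions."""
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = modules, columns = software metrics


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap, i, j = 0.0, 0, 0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for v in sorted(set(a_sorted) | set(b_sorted)):
        while i < len(a_sorted) and a_sorted[i] <= v:
            i += 1
        while j < len(b_sorted) and b_sorted[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(a_sorted) - j / len(b_sorted)))
    return gap


def distribution_difference(old: Matrix, new: Matrix) -> float:
    """Average per-metric KS statistic between the old and the new version."""
    n_metrics = len(old[0])
    return mean(
        ks_statistic([r[k] for r in old], [r[k] for r in new])
        for k in range(n_metrics)
    )


def standardize(rows: Matrix) -> Matrix:
    """Z-score each metric with this version's own mean and std (simple transfer step)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero on constant metrics
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def prepare_versions(
    old: Matrix, new: Matrix, threshold: float = 0.3
) -> Tuple[Matrix, Matrix, bool]:
    """Return (train, test, transferred); transfer only if versions differ enough."""
    if distribution_difference(old, new) > threshold:
        return standardize(old), standardize(new), True
    return old, new, False


# Toy usage with made-up metrics (e.g., LOC and complexity) for four modules.
v1 = [[100, 3], [250, 5], [80, 2], [400, 9]]
v2 = [[1000, 30], [2500, 50], [800, 20], [4000, 90]]
_, _, transferred = prepare_versions(v1, v2)
print(transferred)  # prints True: every v2 value exceeds every v1 value
```

Keeping the decision as an explicit threshold makes it easy to inspect; in an optimization framework like the one described, such a threshold (or the choice of transfer technique) could plausibly be exposed as one more parameter to tune alongside the classifier's hyperparameters, but that is an assumption rather than a detail given in the abstract.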
Keywords
Software Defect Prediction; Bayesian Optimization; Transfer Learning; Cross-Version Defect Prediction