Bayesian Optimization Framework for Improved Cross-Version Defect Prediction

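  • Jeongwhan Choi (최정환), Department of Artificial Intelligence, Yonsei University
  • Duksan Ryu (류덕산), Department of Software Engineering, Jeonbuk National University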
  • Received : 2021.05.15
  • Accepted : 2021.07.08
  • Published : 2021.09.30

Abstract

Recent software defect prediction research actively studies prediction across versions of the same project as well as across different projects. Existing cross-version defect prediction studies treat the task as a WP (Within-Project) setting. However, no prior work in the CV (Cross-Version) setting considers the importance of the distribution difference between project versions. In this study, we propose an automated Bayesian optimization framework that takes the distribution difference between versions into account and automatically selects whether to perform transfer learning according to that difference. The framework jointly optimizes the distribution difference between versions, transfer learning, and the hyper-parameters of the classifier. Our experiments confirm that automatically selecting whether to perform transfer learning based on the distribution difference is effective. They also show that the optimization framework improves prediction performance and, as a result, can reduce software inspection effort. We expect the framework to support practical quality assurance activities for new versions in a cross-version project environment.
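
The idea of selecting transfer learning according to the distribution difference can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `distribution_gap` and `should_transfer`, the standardized mean difference used as the gap measure, the toy metric values, and the fixed threshold of 0.5 are all assumptions made for illustration. In a framework like the one described, such a threshold would presumably be one of the values the optimizer searches over rather than a constant.

```python
from statistics import mean, pstdev


def distribution_gap(source, target):
    """Average standardized difference between per-feature means.

    Hypothetical stand-in for a distribution-difference measure;
    the abstract does not specify which measure the framework uses.
    """
    n_features = len(source[0])
    gaps = []
    for j in range(n_features):
        src_col = [row[j] for row in source]
        tgt_col = [row[j] for row in target]
        # Pooled standard deviation; fall back to 1.0 for constant features.
        pooled_std = pstdev(src_col + tgt_col) or 1.0
        gaps.append(abs(mean(src_col) - mean(tgt_col)) / pooled_std)
    return mean(gaps)


def should_transfer(source, target, threshold):
    """Apply transfer learning only when the versions differ enough."""
    return distribution_gap(source, target) > threshold


# Toy software-metric rows (assumed values): [lines_of_code / 100, complexity]
old_version = [[10.0, 2.0], [12.0, 3.0], [11.0, 2.5]]
similar_version = [[10.5, 2.2], [11.5, 2.8], [11.0, 2.4]]
shifted_version = [[30.0, 9.0], [32.0, 10.0], [31.0, 9.5]]

print(should_transfer(old_version, similar_version, threshold=0.5))
print(should_transfer(old_version, shifted_version, threshold=0.5))
```

With these toy values the gate leaves the similar version untouched and flags the shifted version for transfer learning, which is the decision the abstract describes as being made automatically.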

Acknowledgement

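This research was supported by the Nuclear Safety Research Program through the Korea Foundation Of Nuclear Safety (KoFONS), funded by the Nuclear Safety and Security Commission (No. 2105030), and by the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Science and ICT) (NRF-2019R1G1A1005047).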
