Effective Harmony Search-Based Optimization of Cost-Sensitive Boosting for Improving the Performance of Cross-Project Defect Prediction

  • Received : 2017.10.17
  • Accepted : 2017.12.09
  • Published : 2018.03.31

Abstract

Software Defect Prediction (SDP) is a field of study that identifies defective modules. When local data are insufficient, a company can use Cross-Project Defect Prediction (CPDP), which builds a classifier from datasets collected from other companies. Most machine learning algorithms used for SDP have one or more parameters whose values significantly affect prediction performance. The objective of this study is to propose a parameter selection technique that enhances the performance of CPDP. Using the Harmony Search (HS) algorithm, our approach tunes the parameters of cost-sensitive boosting, a method for tackling the class imbalance that makes prediction difficult. Parameter ranges and constraint rules between parameters are defined according to distributional characteristics and applied to HS. The proposed approach is compared with three CPDP methods and a Within-Project Defect Prediction (WPDP) method on fifteen target projects. The experimental results indicate that the proposed model outperforms the other CPDP methods in the context of class imbalance. Unlike previous studies, which showed a high probability of false alarm (PF) or a low probability of detection (PD), our approach achieves acceptably high PD and low PF while maintaining high overall performance. Its performance is also comparable to that of WPDP.
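
The parameter search described above can be illustrated with a minimal Harmony Search sketch. It is not the authors' implementation: the objective, the bounds, the constraint rule, and every name and default value (`harmony_search`, `hms`, `hmcr`, `par`, `bandwidth`) are assumptions chosen for illustration. In the study the objective would be the CPDP performance of a cost-sensitive boosting model, which is replaced here by a toy function so the example stays self-contained.

```python
import random


def harmony_search(objective, bounds, is_feasible, hms=10, hmcr=0.9,
                   par=0.3, bandwidth=0.05, iterations=2000, seed=0):
    """Minimize `objective` over box-bounded real parameters.

    hms: harmony memory size; hmcr: harmony memory considering rate;
    par: pitch adjusting rate; bandwidth: adjustment step as a fraction
    of each parameter's range. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)

    def random_harmony():
        # Rejection-sample until the constraint rule is satisfied.
        while True:
            harmony = [rng.uniform(lo, hi) for lo, hi in bounds]
            if is_feasible(harmony):
                return harmony

    memory = [random_harmony() for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value ...
                value = rng.choice(memory)[i]
                if rng.random() < par:
                    # ... optionally nudged by a small pitch adjustment.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random selection from the whole parameter range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))
        if not is_feasible(new):
            continue  # discard harmonies that break the constraint rule
        score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:
            memory[worst], scores[worst] = new, score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical stand-ins for the tuning problem: two parameters in [0, 1]
# with the made-up constraint rule "first parameter <= second parameter".
def toy_objective(params):
    return (params[0] - 0.3) ** 2 + (params[1] - 0.7) ** 2


def toy_constraint(params):
    return params[0] <= params[1]


toy_bounds = [(0.0, 1.0), (0.0, 1.0)]
best_params, best_score = harmony_search(toy_objective, toy_bounds,
                                         toy_constraint)
print(best_params, best_score)
```

Infeasible harmonies are simply discarded in this sketch. Repairing them or penalizing them in the objective are common alternatives, and the abstract does not say which strategy the study uses.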
