Effective Harmony Search-Based Optimization of Cost-Sensitive Boosting for Improving the Performance of Cross-Project Defect Prediction

Ryu, Duksan; Baik, Jongmoon

doi:10.3745/KTSDE.2018.7.3.77

KIPS Transactions on Software and Data Engineering (정보처리학회논문지:소프트웨어 및 데이터공학)

Volume 7, Issue 3 / Pages 77-90 / 2018 / 2287-5905 (pISSN) / 2734-0503 (eISSN)

Korea Information Processing Society (한국정보처리학회)


교차 프로젝트 결함 예측 성능 향상을 위한 효과적인 하모니 검색 기반 비용 민감 부스팅 최적화

Ryu, Duksan (KAIST, School of Computing) ;
Baik, Jongmoon (KAIST, School of Computing)

류덕산 ;
백종문

Received : 2017.10.17
Accepted : 2017.12.09
Published : 2018.03.31

https://doi.org/10.3745/KTSDE.2018.7.3.77


Abstract

Software Defect Prediction (SDP) is a field of study that aims to identify defective modules. A company with insufficient local data can use Cross-Project Defect Prediction (CPDP), which builds a classifier from datasets collected from other companies. Most machine learning algorithms used for SDP have one or more parameters whose values significantly affect prediction performance. The objective of this study is to propose a parameter selection technique that enhances the performance of CPDP. Using the Harmony Search (HS) algorithm, our approach tunes the parameters of cost-sensitive boosting, a method for tackling the class imbalance that makes prediction difficult. Parameter ranges and constraint rules between parameters are defined according to distributional characteristics and applied to HS. The proposed approach is compared with three CPDP methods and a Within-Project Defect Prediction (WPDP) method over fifteen target projects. The experimental results indicate that the proposed model outperforms the other CPDP methods in the context of class imbalance. Unlike previous studies, which showed a high probability of false alarm (PF) or a low probability of detection (PD), our approach provides acceptably high PD and low PF while maintaining high overall performance. It also performs comparably to WPDP.
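The abstract does not give the boosting parameters, their ranges, or the constraint rules, so the following is only a minimal sketch of how a basic Harmony Search loop can tune a bounded parameter vector under a constraint rule. Everything specific here is an assumption made for illustration: the two parameter names (`cost_defective`, `cost_clean`), their ranges, the rule that the defective-class cost is at least the clean-class cost, the HS settings, and the toy objective, which stands in for a real validation measure of a cost-sensitive boosting classifier.

```python
import random


def harmony_search(objective, bounds, is_feasible, hms=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=500, seed=0):
    """Minimize `objective` inside `bounds`, discarding infeasible harmonies."""
    rng = random.Random(seed)

    def random_harmony():
        # Resample until the constraint rule is satisfied.
        while True:
            harmony = [rng.uniform(lo, hi) for lo, hi in bounds]
            if is_feasible(harmony):
                return harmony

    # Harmony memory: (score, harmony) pairs.
    memory = []
    for _ in range(hms):
        harmony = random_harmony()
        memory.append((objective(harmony), harmony))

    for _ in range(iterations):
        new = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a stored harmony.
                value = rng.choice(memory)[1][i]
                if rng.random() < par:
                    # Pitch adjustment: small step scaled to the range width.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random selection from the whole range.
                value = rng.uniform(lo, hi)
            new.append(min(max(value, lo), hi))
        if not is_feasible(new):
            continue  # constraint rule violated: drop this harmony
        score = objective(new)
        worst = max(range(hms), key=lambda k: memory[k][0])
        if score < memory[worst][0]:
            memory[worst] = (score, new)

    return min(memory, key=lambda pair: pair[0])


# Hypothetical parameters: [cost_defective, cost_clean], both in [1, 10].
bounds = [(1.0, 10.0), (1.0, 10.0)]


def is_feasible(params):
    # Assumed rule: missing a defect costs at least as much as a false alarm.
    cost_defective, cost_clean = params
    return cost_defective >= cost_clean


def toy_objective(params):
    # Placeholder for "1 - overall performance" measured on validation data.
    cost_defective, cost_clean = params
    return (cost_defective - 4.0) ** 2 + (cost_clean - 1.5) ** 2


best_score, best_params = harmony_search(toy_objective, bounds, is_feasible)
print(best_score, best_params)
```

In an actual CPDP setting, `toy_objective` would be replaced by a function that trains the classifier with the candidate parameters and returns a loss derived from PD and PF; infeasible harmonies are simply discarded here, which is one of several possible ways to apply constraint rules.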



