http://dx.doi.org/10.3745/KTSDE.2018.7.3.77

Effective Harmony Search-Based Optimization of Cost-Sensitive Boosting for Improving the Performance of Cross-Project Defect Prediction  

Ryu, Duksan (KAIST, School of Computing)
Baik, Jongmoon (KAIST, School of Computing)
Publication Information
KIPS Transactions on Software and Data Engineering / v.7, no.3, 2018, pp. 77-90
Abstract
Software Defect Prediction (SDP) is a field of study that identifies defective modules. When local data are insufficient, a company can use Cross-Project Defect Prediction (CPDP), which builds a classifier from datasets collected from other companies. Most machine learning algorithms for SDP have more than one parameter, and the values chosen for them significantly affect prediction performance. The objective of this study is to propose a parameter selection technique that enhances the performance of CPDP. Using the Harmony Search (HS) algorithm, our approach tunes the parameters of cost-sensitive boosting, a method for tackling the class imbalance that makes prediction difficult. Parameter ranges and constraint rules between parameters are defined according to distributional characteristics and applied to HS. The proposed approach is compared with three CPDP methods and a Within-Project Defect Prediction (WPDP) method over fifteen target projects. The experimental results indicate that the proposed model outperforms the other CPDP methods in the context of class imbalance. Unlike previous studies, which show a high probability of false alarm (PF) or a low probability of detection (PD), our approach provides acceptably high PD and low PF while maintaining high overall performance. Its performance is also similar to that of WPDP.
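The idea of running Harmony Search over bounded parameters with a constraint rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two parameters (`cost_defective`, `cost_clean`), their ranges, the constraint that the defective-class cost is at least the clean-class cost, and the toy objective are all assumptions made for illustration. In a real setting the objective would train and evaluate a cost-sensitive boosting classifier instead.

```python
import random

def harmony_search(objective, bounds, is_feasible, hms=10, hmcr=0.9, par=0.3,
                   bw_frac=0.05, iterations=2000, seed=0):
    """Minimise `objective` inside box `bounds`, keeping only feasible harmonies."""
    rng = random.Random(seed)

    def random_harmony():
        # Rejection sampling: draw uniformly until the constraint rule holds.
        while True:
            h = [rng.uniform(lo, hi) for lo, hi in bounds]
            if is_feasible(h):
                return h

    # Harmony memory: hms candidate parameter vectors and their scores.
    memory = [random_harmony() for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for variable j.
                value = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment: small bounded perturbation.
                    value += rng.uniform(-1.0, 1.0) * bw_frac * (hi - lo)
                    value = min(hi, max(lo, value))
            else:
                # Random selection from the full parameter range.
                value = rng.uniform(lo, hi)
            new.append(value)
        if not is_feasible(new):
            continue  # discard harmonies that violate the constraint rule
        score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:
            memory[worst], scores[worst] = new, score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

# Hypothetical parameters: [cost_defective, cost_clean], both in [0, 1].
bounds = [(0.0, 1.0), (0.0, 1.0)]

def is_feasible(h):
    # Assumed constraint rule: misclassifying a defective module costs at
    # least as much as misclassifying a clean one.
    return h[0] >= h[1]

def toy_objective(h):
    # Stand-in for "1 - overall performance" of a trained classifier.
    return (h[0] - 0.8) ** 2 + (h[1] - 0.3) ** 2

best, score = harmony_search(toy_objective, bounds, is_feasible)
print(best, score)
```

The harmony memory considering rate (`hmcr`), pitch adjusting rate (`par`), and bandwidth (`bw_frac`) control the balance between reusing good stored values and exploring the range; the defaults here are common illustrative choices, not values reported in the study.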
Keywords
Cost-Sensitive Boosting; Cross-Project Defect Prediction; Harmony Search; Search-Based Software Engineering; Transfer Learning