http://dx.doi.org/10.3837/tiis.2019.03.026

Applying Topic Modeling and Similarity for Predicting Bug Severity in Cross Projects

Yang, Geunseok (Department of Computer Science, University of Seoul)
Min, Kyeongsic (Department of Computer Science, University of Seoul)
Lee, Jung-Won (Department of Electrical and Computer Engineering, Ajou University)
Lee, Byungjeong (Department of Computer Science, University of Seoul)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.13, no.3, 2019, pp. 1583-1598

Abstract

Software has grown in complexity and is now applied in many industrial fields, so software bugs are unavoidable. Various bug severity prediction methodologies have been proposed, but their performance still needs improvement. In this study, we propose a novel technique for bug severity prediction across projects such as Eclipse, Mozilla, WireShark, and Xamarin, using topic modeling and a similarity measure (KL-divergence). First, we construct topic models from the bug repositories of the projects using Latent Dirichlet Allocation (LDA). Then, given a new bug report, we find the topics in each project that contain the largest number of similar bug reports. Next, we extract the bug reports belonging to the selected topics and feed them to a Naïve Bayes Multinomial (NBM) algorithm. Finally, we predict the severity of the new bug report. To evaluate our approach and to examine the difference between cross-project and single-project prediction, we compare it with the Naïve Bayes Multinomial approach; the Lamkanfi methodology, a well-known bug severity prediction approach; and an emotional similarity-based bug severity prediction approach. Our approach outperforms the compared methods.
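The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the LDA topic-word distributions have already been computed, simplifies topic selection to the single topic with the lowest KL-divergence from the new report, and uses toy data. The function names (`word_distribution`, `closest_topic`, `train_nbm`, `predict_nbm`), the smoothing constants, and the direction of the divergence are all assumptions made for illustration.

```python
import math
from collections import Counter

def word_distribution(tokens, vocabulary, smoothing=1e-3):
    """Smoothed relative word frequencies of a report over a fixed vocabulary."""
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values()) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same keys, all values > 0."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def closest_topic(report_tokens, topic_word_probs):
    """Pick the topic whose word distribution diverges least from the report."""
    vocabulary = sorted(next(iter(topic_word_probs.values())))
    p = word_distribution(report_tokens, vocabulary)
    return min(topic_word_probs,
               key=lambda t: kl_divergence(p, topic_word_probs[t]))

def train_nbm(labelled_reports, alpha=1.0):
    """Fit multinomial Naive Bayes on a list of (tokens, severity) pairs."""
    class_counts, word_counts, vocabulary = Counter(), {}, set()
    for tokens, severity in labelled_reports:
        class_counts[severity] += 1
        word_counts.setdefault(severity, Counter()).update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary, alpha

def predict_nbm(model, tokens):
    """Return the severity label with the highest log posterior."""
    class_counts, word_counts, vocabulary, alpha = model
    n_reports = sum(class_counts.values())

    def log_posterior(label):
        total = sum(word_counts[label].values()) + alpha * len(vocabulary)
        score = math.log(class_counts[label] / n_reports)
        for t in tokens:
            if t in vocabulary:  # words unseen in training are ignored
                score += math.log((word_counts[label][t] + alpha) / total)
        return score

    return max(class_counts, key=log_posterior)

# Toy stand-ins for LDA output and for the historical reports in each topic.
topics = {
    "crash": {"crash": 0.5, "null": 0.3, "typo": 0.1, "label": 0.1},
    "ui": {"crash": 0.1, "null": 0.1, "typo": 0.4, "label": 0.4},
}
reports_by_topic = {
    "crash": [(["crash", "null", "startup"], "critical"),
              (["crash", "freeze"], "critical"),
              (["null", "warning"], "minor")],
    "ui": [(["typo", "label"], "trivial"),
           (["label", "misaligned"], "minor")],
}

new_report = ["crash", "null", "crash"]
topic = closest_topic(new_report, topics)
model = train_nbm(reports_by_topic[topic])
severity = predict_nbm(model, new_report)
print(topic, severity)
```

Training the classifier only on reports from the selected topic is what lets reports from different projects share one model: the topic, rather than the project, defines the training set.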

Keywords

Bug Severity Prediction; Cross Projects; Topic Modeling; KL-Divergence; Bug Report;

References

1	C. Z. Yang, C. C. Hou, W. C. Kao, and X. Chen, "An Empirical Study on Improving Severity Prediction of Defect Reports using Feature Selection," in Proc. of 19th Asia-Pacific Software Engineering Conference, pp. 240-249, 2012.
2	G. Yang, T. Zhang, and B. Lee, "Towards Semi-Automatic Bug Triage and Severity Prediction Based on Topic Model and Multi-feature of Bug Reports," in Proc. of Computer Software and Applications Conference, 2014.
3	T. Zimmermann, R. Premraj, J. Sillito, and S. Breu, "Improving Bug Tracking Systems," in Proc. of ICSE Companion, pp. 247-250, 2009.
4	G. Yang, S. Baek, J. W. Lee, and B. Lee, "Analyzing Emotion Words to Predict Severity of Software Bugs: A Case Study of Open Source Projects," in Proc. of ACM Symposium on Applied Computing, 2017.
5	T. Zimmermann, R. Premraj, N. Bettenburg, S. Just, A. Schröter, and C. Weiss, "What Makes a Good Bug Report?," IEEE Transactions on Software Engineering, Vol. 36, No. 5, pp. 618-643, 2010.
6	D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, Vol. 3, pp. 993-1022, 2003.
7	S. Rao and A. Kak, "Retrieval from Software Libraries for Bug Localization: A Comparative Study of Generic and Composite Text Models," in Proc. of Working Conference on Mining Software Repositories, pp. 43-52, 2011.
8	WireShark, "https://bugs.wireshark.org/bugzilla," Retrieved March 10, 2019.
9	Eclipse, "https://bugs.eclipse.org/bugs," Retrieved March 10, 2019.
10	Mozilla, "https://bugzilla.mozilla.org," Retrieved March 10, 2019.
11	Xamarin, "https://bugzilla.xamarin.com," Retrieved March 10, 2019.
12	A. Lamkanfi, S. Demeyer, Q. D. Soetens, and T. Verdonck, "Comparing Mining Algorithms for Predicting the Severity of a Reported Bug," in Proc. of Software Maintenance and Reengineering, pp. 249-258, 2011.
13	A. Lamkanfi, S. Demeyer, E. Giger, and B. Goethals, "Predicting the Severity of a Reported Bug," in Proc. of 7th IEEE Working Conference on Mining Software Repositories, pp. 1-10, 2010.
14	G. Yang, T. Zhang, and B. Lee, "An Emotion Similarity Based Severity Prediction of Software Bugs: A Case Study of Open Source Projects," IEICE Transactions on Information and Systems, Vol. E101.D, No. 8, pp. 2015-2026, 2018.
15	Mozilla Bug Report #97777, https://bugzilla.mozilla.org/show_bug.cgi?id=97777, Retrieved March 10, 2019.
16	C. Goutte and E. Gaussier, "A Probabilistic Interpretation of Precision, Recall and F-score, with Implication for Evaluation," in Proc. of European Conference on Information Retrieval, pp. 345-359, 2005.
17	R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," in Proc. of the 14th International Joint Conference on Artificial Intelligence, Vol. 14, No. 2, pp. 1137-1145, 1995.
18	The T-Test, "Research Methods Knowledge Base," "http://www.socialresearchmethods.net/kb/stat_t.php"
19	S. S. Shapiro and M. B. Wilk, "An Analysis of Variance Test for Normality (Complete Samples)," Biometrika, Vol. 52, No. 3/4, pp. 591-611, 1965.
20	F. Wilcoxon, "Individual Comparisons by Ranking Methods," Biometrics Bulletin, Vol. 1, No. 6, pp. 80-83, 1945.
21	T. Zhang, J. Chen, G. Yang, B. Lee, and X. Luo, "Towards More Accurate Severity Prediction and Fixer Recommendation of Software Bugs," Journal of Systems and Software, Vol. 117, pp. 166-184, 2016.
22	Y. Tian, D. Lo, and C. Sun, "Information Retrieval based Nearest Neighbor Classification for Fine-Grained Bug Severity Prediction," in Proc. of 19th Working Conference on Reverse Engineering, pp. 215-224, 2012.