A Technique to Recommend Appropriate Developers for Reported Bugs Based on Term Similarity and Bug Resolution History

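An Automatic Developer Recommendation Approach Considering Each Developer's Bug Resolution Types

  • 박성훈 (School of Computer Science and Engineering, Kyungpook National University)
  • 김정일 (School of Computer Science and Engineering, Kyungpook National University)
  • 이은주 (School of Computer Science and Engineering, Kyungpook National University)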
  • Received : 2014.09.15
  • Accepted : 2014.10.17
  • Published : 2014.12.31

Abstract

During software development, a variety of bugs are reported. Many open source projects use bug tracking systems such as Bugzilla, MantisBT, Trac, and JIRA to manage reported bugs. Bug reports in a bug tracking system are triaged to manage the bugs and to determine which developer is responsible for resolving each report. As software grows in size and bug reports tend to be duplicated, bug triage becomes increasingly complex and difficult. In this paper, we present an approach for assigning bug reports to appropriate developers, which is a main part of the bug triage task. First, the words contained in resolved bug reports are grouped by developer. Second, words are selected from the new bug report. From these two steps, vectors whose items are the selected words are generated. Third, the TF-IDF (term frequency-inverse document frequency) of each selected word is computed and used as the weight of the corresponding vector item. Finally, developers are recommended based on the similarity between each developer's word vector and the vector of the new bug report. We conducted an experiment on the Eclipse JDT and CDT projects to show the applicability of the proposed approach, and we compared it with an existing approach based on machine learning. The experimental results show that the proposed approach outperforms the existing method.
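For reference, the weighting and similarity steps named in the abstract can be written out as follows. This is the textbook formulation, given here as an assumption; the abstract does not say which TF-IDF variant or normalization the paper uses. The symbols are illustrative: $f_{t,d}$ is the count of term $t$ in the word list of developer $d$, $N$ is the number of developers, $n_t$ is the number of developers whose word list contains $t$, and $b$ is the new bug report.

```latex
\begin{align}
  \mathrm{tf}(t,d)  &= \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, &
  \mathrm{idf}(t)   &= \log\frac{N}{n_t}, &
  w_{t,d}           &= \mathrm{tf}(t,d)\,\mathrm{idf}(t), \\
  \mathrm{sim}(d,b) &= \frac{\sum_{t} w_{t,d}\, w_{t,b}}
                            {\sqrt{\sum_{t} w_{t,d}^{2}}\;\sqrt{\sum_{t} w_{t,b}^{2}}}.
\end{align}
```

Under this formulation, the developers with the largest $\mathrm{sim}(d,b)$ would be the ones recommended for the new report.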

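Various kinds of bugs arise during software development and maintenance. Bugs are a major cause of increased development and maintenance time, and they degrade software quality. It is impossible to prevent bugs completely in advance. Instead, bugs can be managed effectively with bug tracking systems such as Bugzilla, MantisBT, Trac, and JIRA. When a developer or user reports a bug to the bug tracking system, the project manager forwards it to a developer suited to resolving it, and the bug tracking system tracks the bug until it is resolved. The project manager's selection of a suitable developer is called bug triaging, and manually triaging bug reports that arrive in large volumes is a very difficult problem for project managers. In this paper, we propose a bug triaging method, that is, a developer recommendation method, which extracts each developer's bug resolution types from previously resolved bug reports stored in the bug tracking system. First, the bug reports resolved by each developer are grouped, and a word list is generated for each developer using natural language processing algorithms and TF-IDF (term frequency-inverse document frequency). Then, when a new bug is reported, the per-developer word lists are compared with the word list of the new bug report using cosine similarity, and the developer with the most similar word list is recommended. In developer recommendation experiments on two open source projects, Eclipse JDT.UI and CDT.CORE, the proposed method achieved better results than a recommendation method based on machine learning models.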
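The pipeline described above (per-developer word lists, TF-IDF weights, cosine similarity against the new report) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the regular-expression tokenizer (a stand-in for the unspecified natural language processing step), and the plain `log(N / df)` IDF are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (assumed stand-in for real NLP preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def build_developer_docs(resolved_reports):
    """Merge the words of each developer's resolved reports into one token list."""
    docs = {}
    for developer, text in resolved_reports:
        docs.setdefault(developer, []).extend(tokenize(text))
    return docs


def idf_table(docs):
    """IDF over developer word lists: log(N / df). Assumed variant, not from the paper."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms never seen in the resolved reports are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(resolved_reports, new_report, top_k=3):
    """Rank developers by similarity between their word vector and the new report."""
    docs = build_developer_docs(resolved_reports)
    idf = idf_table(docs)
    dev_vectors = {dev: tfidf_vector(tokens, idf) for dev, tokens in docs.items()}
    query = tfidf_vector(tokenize(new_report), idf)
    ranked = sorted(dev_vectors, key=lambda d: cosine(query, dev_vectors[d]), reverse=True)
    return ranked[:top_k]
```

Here `resolved_reports` is a list of `(developer, report_text)` pairs, and `recommend(resolved_reports, "text of the new report", top_k=1)` returns the single best-matching developer. Treating each developer's merged reports as one document means a word used by every developer receives zero weight, so only words that distinguish developers contribute to the ranking.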
