Semantic Similarity-Based Contributable Task Identification for New Participating Developers

  • Kim, Jungil (Department of Software Technology Laboratory, Kyungpook National University) ;
  • Choi, Geunho (School of Computer Science and Engineering, Kyungpook National University) ;
  • Lee, Eunjoo (School of Computer Science and Engineering, Kyungpook National University)
  • Received : 2018.06.29
  • Accepted : 2018.10.16
  • Published : 2018.12.31

Abstract

In software development, the quality of a product often depends on whether its developers can rapidly find and contribute to the proper tasks. Existing approaches mainly use the vocabulary of projects to which a newcomer has previously contributed to find appropriate source files in an ongoing project. However, because of the vocabulary gap between software projects, the accuracy of source file identification based on information retrieval is not guaranteed. In this paper, we propose a novel source file identification method that reduces the vocabulary gap between software projects. The proposed method employs DBpedia Spotlight to identify proper source files based on the semantic similarity between the source files of software projects. In an experiment based on the Spring Framework project, we evaluate the accuracy of the proposed method in identifying contributable source files. The experimental results show that the proposed approach achieves better accuracy than the existing method, which compares word vocabularies.
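
The vocabulary gap described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `CONCEPTS` table is a hypothetical, hand-written stand-in for the annotations that an entity annotator such as DBpedia Spotlight would produce, the file names and word lists are invented, and cosine similarity over term counts is assumed as the similarity measure. The sketch compares a newcomer's past vocabulary with candidate source files, first word by word and then concept by concept.

```python
import math
from collections import Counter

# Hypothetical word -> concept table standing in for an entity annotator
# such as DBpedia Spotlight; every entry is invented for illustration.
CONCEPTS = {
    "servlet": "Java_servlet",
    "http": "Hypertext_Transfer_Protocol",
    "request": "Hypertext_Transfer_Protocol",
    "endpoint": "Web_service",
    "rest": "Web_service",
    "controller": "Model-view-controller",
    "mvc": "Model-view-controller",
}


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_vector(words):
    """Baseline representation: raw word counts."""
    return Counter(words)


def concept_vector(words):
    """Semantic representation: counts of the concepts the words map to."""
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)


def rank_files(history_words, candidate_files, vectorize):
    """Rank candidate files by similarity to the newcomer's past vocabulary."""
    profile = vectorize(history_words)
    scored = [
        (cosine(profile, vectorize(words)), name)
        for name, words in candidate_files.items()
    ]
    # Highest similarity first; ties are broken by file name.
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Words from files the newcomer touched in earlier projects (invented).
history = ["servlet", "request", "mvc"]

# Word lists of files in the ongoing project (invented).
candidates = {
    "DispatcherController.java": ["http", "controller", "endpoint"],
    "StringUtils.java": ["string", "trim", "pad"],
}

word_score = cosine(word_vector(history),
                    word_vector(candidates["DispatcherController.java"]))
concept_score = cosine(concept_vector(history),
                       concept_vector(candidates["DispatcherController.java"]))
ranking = rank_files(history, candidates, concept_vector)
```

In this toy data the newcomer's history and `DispatcherController.java` share no literal words, so the word-level score is zero, while two of their three concepts coincide and the concept-level score is positive. That is the kind of gap a semantic comparison is meant to close; how the paper weights and compares annotations may differ from this sketch.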

Keywords

Fig. 1. The work flow of the proposed approach.

Table 1. Preprocessing results

Table 2. Example of annotation

Table 3. Results of Top-25 and Top-250 identification
