A Technique to Detect Change-Coupled Files Using the Similarity of Change Types and Commit Time

File Change Coupling Using the Similarity of Change Types and Commit Time

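  • Jungil Kim (School of Computer Science and Engineering, Kyungpook National University)
  • Eunjoo Lee (School of Computer Science and Engineering, Kyungpook National University)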
  • Received : 2013.12.26
  • Accepted : 2014.01.21
  • Published : 2014.02.28

Abstract

Change coupling measures how strongly two entities are related with respect to change. When two source files have frequently been changed together, they are regarded as change-coupled files and are likely to be changed together again in the near future. In previous studies, the change coupling between two files is defined by the number of times they changed together, that is, the number of commit times the files have in common. However, this frequency-based technique is limited by 'tangled changes', which occur frequently in development environments that use version control systems. A tangled change is one in which several code hunks are changed at the same time even though they are unrelated to each other. In this paper, the change types of the code hunks are used to define change coupling, in addition to the common commit times of the target files. First, a frequency vector is defined from the extracted change types; then, the similarity of change patterns is calculated using the cosine similarity measure. We conducted case studies on the open source projects Eclipse JDT and CDT. The results demonstrate the applicability of the proposed method in comparison with previous studies.

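Change coupling indicates how likely two entities are to change together in the future. If source files are frequently changed together, their change coupling can be regarded as high, and they are likely to be changed together again later. In general, the change coupling between source files has been defined in terms of the number of common changes. However, weakly related changes are often committed together in a single batch, which is known as a tangled change. Determining the change coupling of source files solely from the number of times they were changed together is therefore limited. To complement the existing approach, this paper proposes considering the similarity of source code change types in addition to the change time of the source files. To this end, a change-type frequency vector is first defined from the extracted change-type information, and the similarity of the code changes applied in each source file version is then computed using the cosine similarity measure. Case studies on the Eclipse projects JDT and CDT show the effectiveness of the proposed method.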
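
The two steps named in the abstract (building a change-type frequency vector, then comparing vectors with cosine similarity) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the change-type labels, the `threshold` parameter, and the rule of counting a shared commit only when the similarity reaches that threshold are not taken from the paper, which does not state in the abstract how similarity and commit time are combined.

```python
import math
from collections import Counter

# Illustrative change-type labels (assumption; the paper's taxonomy is not listed here).
CHANGE_TYPES = ["STATEMENT_INSERT", "STATEMENT_DELETE",
                "STATEMENT_UPDATE", "CONDITION_CHANGE"]


def frequency_vector(change_types):
    """Count how often each known change type occurs in one file version."""
    counts = Counter(change_types)
    return [counts[t] for t in CHANGE_TYPES]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_co_changes(history_a, history_b, threshold=0.5):
    """Return the shared commits whose change patterns are similar.

    Each history maps a commit id to the list of change types applied to
    the file in that commit. The threshold rule is a hypothetical way of
    combining common commit time with change-type similarity.
    """
    shared = history_a.keys() & history_b.keys()
    return sorted(
        c for c in shared
        if cosine_similarity(frequency_vector(history_a[c]),
                             frequency_vector(history_b[c])) >= threshold
    )


# Hypothetical histories of two files.
file_a = {
    "c1": ["STATEMENT_INSERT", "STATEMENT_INSERT", "STATEMENT_UPDATE"],
    "c2": ["STATEMENT_DELETE"],
    "c3": ["CONDITION_CHANGE"],
}
file_b = {
    "c1": ["STATEMENT_INSERT", "STATEMENT_UPDATE"],
    "c2": ["CONDITION_CHANGE"],
    "c4": ["STATEMENT_DELETE"],
}

print(similar_co_changes(file_a, file_b))  # ['c1']
```

In this made-up data the two files share two commits, so a purely frequency-based measure would count two co-changes. Commit `c2` applies entirely different change types to the two files (similarity 0), so under the assumed threshold rule only `c1` contributes to the coupling, which is the kind of tangled change the abstract aims to filter out.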
