http://dx.doi.org/10.9708/jksci.2022.27.11.131

Implementation of Git's Commit Message Complex Classification Model for Software Maintenance  

Choi, Ji-Hoon (Dept. of Computer Engineering, Kongju National University)
Kim, Joon-Yong (Dept. of IT Convergence Software, Seoul Theological University)
Park, Seong-Hyun (Dept. of Computer Engineering, Kongju National University)
Abstract
Git commit messages are closely tied to the project life cycle. Because of this, they can contribute substantially to cost reduction and work efficiency by helping identify risk factors and the status of project operation activities. Many studies in this area classify commit messages by type of software maintenance, and the highest accuracy reported among them is 87%. This paper designs and implements a complex classification model that combines several models, with the aim of improving on the accuracy of previously published models and increasing reliability for solutions that use a commit classification model. A dataset was constructed through automated labeling and extraction of source changes, and the model was trained using DistilBERT. In verification, the model obtained an F1 score of 95%, which is 8 percentage points higher than the 87% maximum reported in previous studies. These results are expected to increase the reliability of the model and make it applicable to solutions such as software and project management.
Keywords
Commit Message; Multi-Label Classification; DistilBERT (Bidirectional Encoder Representations from Transformers); Source Change; Auto Labeling