A Study on Code Vulnerability Repair via Large Language Models


  • Woorim Han (Dept. of Electrical and Computer Engineering and Inter-University Semiconductor Research Center (ISRC), Seoul National University) ;
  • Miseon Yu (Dept. of Electrical and Computer Engineering and Inter-University Semiconductor Research Center (ISRC), Seoul National University) ;
  • Yunheung Paek (Dept. of Electrical and Computer Engineering and Inter-University Semiconductor Research Center (ISRC), Seoul National University)
  • Published: 2024.05.23

Abstract

Software vulnerabilities are security weaknesses in software systems that attackers exploit for malicious purposes, potentially leading to system compromise and data breaches. Despite the growing prevalence of such vulnerabilities, manual repair by security analysts remains time-consuming. The emergence of deep learning has created promising opportunities for automating software vulnerability repair, but existing AI-based approaches still struggle to handle complex vulnerabilities effectively. This paper explores the potential of large language models (LLMs) to address these limitations, examining their performance on code vulnerability repair tasks and introducing recent research on using LLMs to improve the efficiency and accuracy of fixing security bugs.
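To make the task concrete, the sketch below shows one common way an LLM can be asked to repair a known-vulnerable function: the flaw type and the code are paired in a zero-shot prompt and the model is asked to return a patched function. This is only an illustration of the general setup, not the paper's method; the vulnerable C snippet, the CWE label, and the `query_llm` placeholder are assumptions introduced here for clarity.

```python
# Minimal sketch of LLM-based code vulnerability repair (illustrative only).
# The snippet, CWE label, and query_llm() stub are assumptions, not taken
# from the paper summarized above.

VULNERABLE_C = """\
void copy_name(char *user_input) {
    char buf[16];
    strcpy(buf, user_input);   /* unbounded copy: potential buffer overflow */
    printf("%s\\n", buf);
}
"""

def build_repair_prompt(code: str, cwe_id: str) -> str:
    """Assemble a zero-shot repair prompt pairing the flaw type with the code."""
    return (
        f"The following C function contains a {cwe_id} vulnerability.\n"
        "Return a fixed version of the function that removes the flaw "
        "while preserving its behavior.\n\n"
        f"```c\n{code}```\n"
    )

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an actual LLM API (hypothetical)."""
    raise NotImplementedError("Plug in a real model call here.")

if __name__ == "__main__":
    prompt = build_repair_prompt(VULNERABLE_C, "CWE-787")
    print(prompt)                  # inspect the prompt that would be sent
    # patched = query_llm(prompt)  # would return the candidate repair
```

In practice, the candidate repair would still need to be validated, for example by recompiling and re-running security tests, since LLM outputs are not guaranteed to be correct patches.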

Keywords

Acknowledgement

This work was supported by the BK21 FOUR program of the Education and Research Program for Future ICT Pioneers, Seoul National University in 2024, and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-00277326). It was also supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the artificial intelligence semiconductor support program to nurture the best talents (IITP-2023-RS-2023-00256081) grant funded by the Korea government (MSIT), and by the Inter-University Semiconductor Research Center (ISRC).
