[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13089/JKIISC.2022.32.6.1139

A BERT-Based Deep Learning Approach for Vulnerability Detection

Jin, Wenhui (Department of Computer Science and Engineering, Hanyang University)
Oh, Heekuck (Department of Computer Science and Engineering, Hanyang University)

Publication Information

Journal of the Korea Institute of Information Security & Cryptology / v.32, no.6, 2022, pp. 1139-1150

Abstract

With the rapid development of the software industry, software has become ubiquitous in daily life. As the volume of newly developed code grows, so does the number of vulnerabilities. Attackers can exploit these vulnerabilities, leading to privacy breaches and threats to property and personal safety. Because the amount of code keeps increasing, manual analysis by experts is no longer sufficient. Machine learning has shown high performance in object identification and classification tasks, and vulnerability detection is also well suited to it; as a result, many studies have used RNN-based models to detect vulnerabilities. However, RNN models have a limitation: as the code gets longer, the earlier parts of the input are not learned well. In this paper, we propose a novel method that applies BERT to vulnerability detection. Compared with VulDeePecker, the method achieved an accuracy of 97.5%, an increase of 1.5%, and its efficiency improved by 69%.

Keywords

Deep Learning; Vulnerability Detection; Source Code; BERT; Program Slicing;
