http://dx.doi.org/10.13089/JKIISC.2022.32.6.1139

A BERT-Based Deep Learning Approach for Vulnerability Detection  

Jin, Wenhui (Department of Computer Science and Engineering, Hanyang University)
Oh, Heekuck (Department of Computer Science and Engineering, Hanyang University)
Abstract
With the rapid development of the software industry, software is everywhere in our daily lives. The number of vulnerabilities is also increasing along with the large amount of newly developed code. Vulnerabilities can be exploited by hackers, resulting in the disclosure of private information and threats to the safety of property and life. In particular, because the volume of code keeps growing, manual analysis by experts is no longer sufficient. Machine learning has shown high performance in object identification and classification tasks. Vulnerability detection is also well suited to machine learning; as a result, many studies have used RNN-based models to detect vulnerabilities. However, RNN models have a limitation: as the code gets longer, its earlier parts are not learned well. In this paper, we propose a novel method that applies BERT to vulnerability detection. The accuracy was 97.5%, an increase of 1.5%, and efficiency increased by 69%, compared with VulDeePecker.
Keywords
Deep Learning; Vulnerability Detection; Source Code; BERT; Program Slicing;