[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13089/JKIISC.2022.32.2.417

A Fuzzing Seed Generation Technique Using Natural Language Processing Model

Kim, DongYoung (School of Cybersecurity, Korea University)
Jeon, SangHoon (School of Cybersecurity, Korea University)
Ryu, MinSoo (School of Cybersecurity, Korea University)
Kim, Huy Kang (School of Cybersecurity, Korea University)

Publication Information

Journal of the Korea Institute of Information Security & Cryptology / v.32, no.2, 2022, pp. 417-437

Abstract

The quality of fuzzing seed files is one of the key factors in discovering vulnerabilities quickly. Although earlier seed generation approaches based on dynamic taint analysis and symbolic execution improved fuzzing efficiency, they have not been widely adopted because of their high complexity and the expertise they require. This study proposes DDRFuzz, a system that generates seed files using sequence-to-sequence models. We evaluated DDRFuzz on five open-source applications that take multimedia files as input. In our experiments, DDRFuzz outperformed state-of-the-art approaches in terms of fuzzing efficiency.
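As a rough illustration of how a binary seed file can be posed as a sequence-to-sequence problem, the sketch below treats each byte as a token and pairs consecutive fixed-size chunks as (source, target) training examples. This framing is an assumption for illustration, not DDRFuzz's actual data pipeline: the chunk size, the `BOS`/`EOS` token values, and the helper names `make_pairs` and `from_tokens` are hypothetical, and the encoder-decoder model itself is omitted.

```python
from typing import List, Tuple

# Hypothetical special tokens: byte values occupy 0..255, so markers sit just above.
BOS, EOS = 256, 257


def to_tokens(data: bytes) -> List[int]:
    """Map each byte of a seed file to an integer token in 0..255."""
    return list(data)


def make_pairs(data: bytes, chunk: int = 4) -> List[Tuple[List[int], List[int]]]:
    """Split a seed into fixed-size chunks and pair each chunk (source) with
    the chunk that follows it (target), wrapping the target in BOS/EOS."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    tokens = to_tokens(data)
    chunks = [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]
    # zip stops at the shorter sequence, so the last chunk is only ever a target.
    return [(src, [BOS] + tgt + [EOS]) for src, tgt in zip(chunks, chunks[1:])]


def from_tokens(tokens: List[int]) -> bytes:
    """Drop special tokens and turn generated tokens back into seed-file bytes."""
    return bytes(t for t in tokens if t < 256)


if __name__ == "__main__":
    for src, tgt in make_pairs(b"ABCDEFGHIJ", chunk=4):
        print(src, "->", tgt, from_tokens(tgt))
```

For example, `make_pairs(b"ABCDEFGHIJ", chunk=4)` yields two pairs, the second mapping the tokens for `EFGH` to `BOS`, `I`, `J`, `EOS`. In a full pipeline, token sequences produced by a trained model would be decoded with `from_tokens` and written as individual files into the fuzzer's seed directory (for AFL, the directory passed to `afl-fuzz` with `-i`).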

Keywords

Fuzzing; Seed generation; Sequence-to-Sequence


