http://dx.doi.org/10.9708/jksci.2018.23.11.085

Detection of Malicious PDF based on Document Structure Features and Stream Objects  

Kang, Ah Reum (Dept. of Big Data Engineering, Soonchunhyang University)
Jeong, Young-Seob (Dept. of Big Data Engineering, Soonchunhyang University)
Kim, Se Lyeong (Korea Internet & Security Agency (KISA))
Kim, Jonghyun (Electronics and Telecommunications Research Institute (ETRI))
Woo, Jiyoung (Dept. of Big Data Engineering, Soonchunhyang University)
Choi, Sunoh (Electronics and Telecommunications Research Institute (ETRI))
Abstract
In recent years, attackers have increasingly distributed malicious code by exploiting vulnerabilities in document files. Because document-based malware is not itself an executable file, it easily bypasses existing security programs, so a model for detecting it is needed. In this study, we extract the main features from the document structure and from the JavaScript contained in stream objects. In addition, when JavaScript is embedded, we extract keywords that occur frequently in malicious code, such as function names, reserved words, and readable strings in the script. We then build a machine learning model that distinguishes normal documents from malicious ones. To make the detector difficult to bypass, we aim for good performance with a black-box algorithm. The experiment analyzes a larger set of documents than previous studies did. The results show a 98.9% detection rate with three different types of algorithms. SVM, a black-box algorithm that makes evasion through obfuscation difficult, performs much better than in previous studies.
Keywords
malware; PDF; machine learning; JavaScript; detection
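The feature extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The keyword lists `STRUCTURE_KEYS` and `SCRIPT_KEYS`, the helper names, and the regex-based stream parsing are all assumptions made for illustration, since this page does not list the paper's actual features. The sketch counts structural name objects in the raw file, inflates Flate-compressed stream objects where possible, and counts script keywords in the decoded stream content.

```python
import re
import zlib

# Hypothetical keyword lists, chosen for illustration only;
# the paper's actual feature set is not given on this page.
STRUCTURE_KEYS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/ObjStm"]
SCRIPT_KEYS = [b"eval", b"unescape", b"String.fromCharCode"]

# Simplified stream matcher; a real parser would honor the /Length entry.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def decode_streams(raw):
    """Yield each stream body, inflated when it is valid zlib data."""
    for match in STREAM_RE.finditer(raw):
        body = match.group(1)
        try:
            # decompressobj tolerates trailing bytes such as a stray "\r".
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            yield body  # not Flate-compressed: keep the raw bytes


def extract_features(raw):
    """Return a dict of structure counts and script keyword counts."""
    features = {}
    for key in STRUCTURE_KEYS:
        # The lookahead keeps a short name from matching a longer one.
        pattern = re.escape(key) + rb"(?![A-Za-z])"
        features[key.decode()] = len(re.findall(pattern, raw))
    features["obj"] = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", raw))
    features["stream"] = len(STREAM_RE.findall(raw))
    decoded = b"\n".join(decode_streams(raw))
    for key in SCRIPT_KEYS:
        features["js:" + key.decode()] = decoded.count(key)
    return features


# Tiny synthetic document (not a complete, valid PDF) for demonstration.
script = b"var s = unescape('%u9090'); eval(s); eval(s);"
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS 3 0 R >>\nendobj\n"
    b"3 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(script)
    + b"\nendstream\nendobj\n"
)
features = extract_features(sample)
```

The resulting dictionary can be turned into a fixed-order numeric vector and passed to any classifier, such as the SVM mentioned in the abstract. Training is left out here to keep the sketch dependency-free.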
References
1 P. Laskov and N. Srndic, "Static Detection of Malicious JavaScript-Bearing PDF Documents," Proceedings of the Annual Computer Security Applications Conference (ACSAC), pp.373-382, 2011.
2 C. Smutz and A. Stavrou, "Malicious PDF Detection using Metadata and Structural Features," Proceedings of the 28th Annual Computer Security Applications Conference, pp.239-248, 2012.
3 N. Srndic and P. Laskov, "Detection of Malicious PDF Files Based on Hierarchical Document Structure," Proceedings of the 20th Annual Network & Distributed System Security Symposium, pp.1-16, 2013.
4 X. Lu, J. Zhuge, R. Wang, Y. Cao, and Y. Chen, "De-obfuscation and Detection of Malicious PDF Files with High Accuracy," Proceedings of the 46th Hawaii International Conference on System Sciences (HICSS), pp.4890-4899, 2013.
5 I. Corona, D. Maiorca, D. Ariu and G. Giacinto, "Lux0r: Detection of Malicious PDF-embedded Javascript Code through Discriminant Analysis of API References," Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, pp.47-57, 2014.
6 N. Srndic and P. Laskov, "Hidost: a static machine-learning-based detector of malicious files," EURASIP Journal on Information Security, vol.2016, no.1, pp.22, 2016, 9.
7 M. Li, Y. Liu, M. Yu, G. Li, and Y. Wang, "FEPDF: A Robust Feature Extractor for Malicious PDF Detection," Proceedings of BigDataSE/ICESS 2017, pp.218-224, 2017.
8 S. Khitan, A. Hadi and J. Atoum, "PDF Forensic Analysis System using YARA," International Journal of Computer Science and Network Security, vol.17, no.5, pp.77-85, 2017, 5.
9 B. Cuan, A. Damien, C. Delaplace, and M. Valois, "Malware Detection in PDF Files Using Machine Learning," SECRYPT 2018 - 15th International Conference on Security and Cryptography, pp.8, 2018, 7.
10 J. Zhang, "MLPdf: An Effective Machine Learning Based Approach for PDF Malware Detection," arXiv:1808.06991v1, 2018, 8.
11 J. Torres and S. D. L. Santos, "Malicious PDF Documents Detection using Machine Learning Techniques," Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), pp.337-344, 2018.
12 D. Maiorca, G. Giacinto, and I. Corona, "A Pattern Recognition System for Malicious PDF Files Detection," Perner, P. (ed.) MLDM 2012, LNCS(LNAI), vol.7326, pp.510-524, 2012.
13 D. Liu, H. Wang, and A. Stavrou, "Detecting Malicious Javascript in PDF through Document Instrumentation," Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014, 6.