http://dx.doi.org/10.13089/JKIISC.2022.32.2.439

MS Office Malicious Document Detection Based on CNN  

Park, Hyun-su (Pai Chai University)
Kang, Ah Reum (Pai Chai University)
Abstract
Document-type malware is actively distributed through attachments on websites and in e-mails. Because no executable file is run directly, such malware can bypass security programs relatively easily, so it should be detected and blocked in advance. To detect it, we analyzed the document structure and selected keywords suspected of being malicious. We then built a dataset by converting the stream data in each document to ASCII code values. We located the malicious keywords within the stream data and classified a stream as malicious by recognizing the information adjacent to those keywords. Applying a CNN model to this task, we obtained accuracies of 0.97 at the stream level and 0.92 at the file level.
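As a rough illustration of the preprocessing the abstract describes, the sketch below converts a stream to integer code values, finds the offsets of suspicious keywords, and cuts a fixed-length window around each offset. It is a minimal sketch under stated assumptions, not the authors' implementation: the keyword list, the window radius, the zero padding, and all function names are hypothetical, since the abstract does not specify them.

```python
from typing import List, Sequence

# Hypothetical keyword list; the abstract does not name the keywords used.
SUSPICIOUS_KEYWORDS = [b"AutoOpen", b"Shell", b"CreateObject"]


def to_code_values(stream: bytes) -> List[int]:
    """Map each byte of a stream to its integer code value (0-255)."""
    return list(stream)


def keyword_positions(
    stream: bytes, keywords: Sequence[bytes] = SUSPICIOUS_KEYWORDS
) -> List[int]:
    """Return sorted start offsets of every keyword match (case-insensitive)."""
    lowered = stream.lower()
    positions = []
    for keyword in keywords:
        needle = keyword.lower()
        start = lowered.find(needle)
        while start != -1:
            positions.append(start)
            start = lowered.find(needle, start + 1)
    return sorted(positions)


def context_window(stream: bytes, position: int, radius: int = 8) -> List[int]:
    """Code values in [position - radius, position + radius), zero-padded
    where the window runs past either end of the stream (an assumption)."""
    values = to_code_values(stream)
    return [
        values[i] if 0 <= i < len(values) else 0
        for i in range(position - radius, position + radius)
    ]


if __name__ == "__main__":
    sample = b'Sub AutoOpen()\nShell "cmd"\nEnd Sub'
    for offset in keyword_positions(sample):
        print(offset, context_window(sample, offset, radius=4))
```

Windows like these would have a fixed length, which is what a convolutional model typically expects as input. How the paper itself encodes keyword locations and neighbouring data is not stated in the abstract.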
Keywords
MS Office; malicious; detection; CNN; deep learning