http://dx.doi.org/10.9717/kmms.2020.23.5.650

Design and Implementation of a Pre-processing Method for Image-based Deep Learning of Malware  

Park, Jihyeon (Dept. of Information Security, Seoul Women's University)
Kim, Taeok (Dept. of Information Security, Seoul Women's University)
Shin, Yulim (Dept. of Information Security, Seoul Women's University)
Kim, Jiyeon (Center for Software Educational Innovation and Right AI with Security & Ethics Research Center, Seoul Women's University)
Choi, Eunjung (Dept. of Information Security and Right AI with Security & Ethics Research Center, Seoul Women's University)
Abstract
The rapid growth in the number of internet users and faster network speeds are driving new ICT services. ICT has changed the way we think and live, but it has also created security problems such as malware and ransomware. As malware grows in volume and new malicious code continues to emerge, research on countermeasures is needed; in particular, malware must be detected and classified into families accurately and quickly. In this paper, we analyze and categorize visualization techniques, a form of pre-processing used in deep learning-based malware classification. The first method converts each byte of the binary into one pixel to produce a grayscale image. The second method converts every 2 bytes of the binary into a pair of coordinates. The third method uses locality-sensitive hashing (LSH). We propose an improvement on the existing approach of visualizing the entire malicious code file: extract only the areas where important information is expected to exist, and visualize those areas. Our experiments show that selecting and visualizing the important information before classification can produce better learning results than including all the information in the malicious code.
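The first two visualization methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the fixed image width, the zero padding of the last row, the choice of the first byte as x and the second as y, and the use of hit counts as pixel intensity are all assumptions made for illustration. The LSH-based method and the proposed region selection are not sketched, because the abstract does not describe them in enough detail.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Method 1 (sketch): one byte becomes one grayscale pixel (0-255).

    Bytes fill the image row by row; the last row is zero-padded
    (padding policy is an assumption, not taken from the paper).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # pad an incomplete final row
        rows.append(row)
    return rows


def byte_pairs_to_image(data: bytes) -> List[List[int]]:
    """Method 2 (sketch): each 2-byte chunk becomes an (x, y) coordinate.

    The first byte is taken as x and the second as y (assumed order).
    The pixel value counts how often a coordinate occurs, capped at 255.
    A trailing odd byte is ignored.
    """
    image = [[0] * 256 for _ in range(256)]
    for i in range(0, len(data) - 1, 2):
        x, y = data[i], data[i + 1]
        image[y][x] = min(image[y][x] + 1, 255)
    return image


if __name__ == "__main__":
    sample = b"\x4d\x5a\x90\x00\x03\x00\x00\x00"  # hypothetical input bytes
    gray = bytes_to_grayscale(sample, width=4)
    coords = byte_pairs_to_image(sample)
    print(len(gray), "rows of", len(gray[0]), "pixels")
    print("hits at (x=0x4d, y=0x5a):", coords[0x5A][0x4D])
```

In practice the input would be the bytes of a malware binary, or only of the regions selected as important, and the resulting pixel grid would be saved as an image and fed to a classifier.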
Keywords
Deep Learning; Visualization; Data pre-processing; Malware; Classification
Citations & Related Records
Times Cited By KSCI: 12