http://dx.doi.org/10.13089/JKIISC.2020.30.4.593

Stacked Autoencoder Based Malware Feature Refinement Technology Research  

Kim, Hong-bi (Hoseo University)
Lee, Tae-jin (Hoseo University)
Abstract
The volume of new malware has grown exponentially as networks have developed and malware generation tools have spread, and existing detection methods can no longer keep pace. In response, machine learning-based malware detection has been advancing. This paper extracts features from the PE header for machine learning-based detection and studies how an autoencoder can be used to automatically extract the features that characterize malware, together with their feature importance. Specifically, 549 features are extracted, composed of information such as DLLs and APIs that can be identified in PE files and that is commonly used in malware analysis, and an autoencoder is applied to these features to improve the detection performance of the machine learning model. Because the autoencoder stores the data in compressed form, it extracts the data's features effectively; the approach achieved high accuracy and reduced processing time by a factor of two. The test results also proved useful for classifying malware groups. Future work will introduce classifiers such as SVM to pursue more accurate malware detection.
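The compression step described in the abstract can be illustrated with a small sketch. Only the feature count (549) comes from the abstract; everything else is an assumption made for illustration: the synthetic binary data standing in for DLL/API features, the hypothetical layer sizes (128 and 32), the sigmoid activations, the learning rate, and the greedy layer-wise training scheme, which is one common way to build a stacked autoencoder and not necessarily the authors' method.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=60, lr=0.1, seed=0):
    """Train one sigmoid autoencoder layer by full-batch gradient descent.

    Returns the encoder parameters (W_enc, b_enc) and the per-epoch
    mean squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W_enc = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b_dec = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)   # encode
        R = sigmoid(H @ W_dec + b_dec)   # decode (reconstruction)
        err = R - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate 0.5 * squared error, averaged over samples.
        d_out = err * R * (1.0 - R) / n_samples
        d_hid = (d_out @ W_dec.T) * H * (1.0 - H)
        W_dec -= lr * (H.T @ d_out)
        b_dec -= lr * d_out.sum(axis=0)
        W_enc -= lr * (X.T @ d_hid)
        b_enc -= lr * d_hid.sum(axis=0)
    return (W_enc, b_enc), losses


def encode(X, layers):
    """Pass X through every trained encoder layer in order."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X


def train_stacked(X, hidden_sizes, **kwargs):
    """Greedy layer-wise training: each new layer learns to
    reconstruct the codes produced by the layers before it."""
    layers, history = [], []
    for size in hidden_sizes:
        params, losses = train_autoencoder(encode(X, layers), size, **kwargs)
        layers.append(params)
        history.append(losses)
    return layers, history


# Synthetic stand-in for PE-derived features (NOT real malware data).
rng = np.random.default_rng(42)
N_FEATURES = 549                 # feature count reported in the abstract
n_families, n_samples = 4, 200   # hypothetical values
# Each hypothetical family "uses" roughly 10% of the features.
prototypes = rng.random((n_families, N_FEATURES)) < 0.1
family = rng.integers(0, n_families, n_samples)
prob = np.where(prototypes[family], 0.8, 0.02)
X = (rng.random((n_samples, N_FEATURES)) < prob).astype(float)

layers, history = train_stacked(X, hidden_sizes=(128, 32))
codes = encode(X, layers)
print(codes.shape)  # (200, 32)
print("layer 1 reconstruction MSE:", history[0][0], "->", history[0][-1])
```

The 32-dimensional `codes` array is what a downstream classifier would consume in place of the 549 raw features; a smaller input of this kind is one plausible reason for the shorter processing time the abstract reports.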
Keywords
autoencoder; feature importance; malware; TF-IDF; variant
Citations & Related Records
Times Cited by KSCI: 4