[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7472/jksii.2021.22.2.1

A Study on Classification of Variant Malware Family Based on ResNet-Variational AutoEncoder

Lee, Young-jeon (Department of Software, Gachon University)
Han, Myung-Mook (Department of Software, Gachon University)

Publication Information

Journal of Internet Computing and Services / v.22, no.2, 2021, pp. 1-9

Abstract

Traditionally, most malware has been analyzed using features extracted by domain experts. However, this feature-based approach depends on the analyst's skill and has limited ability to detect variant malware, that is, malware created by modifying existing malicious code. In this study, we propose a ResNet-Variational AutoEncoder-based method that classifies variant malware into families without domain expert intervention. A Variational AutoEncoder generates new data from a latent space modeled as a normal distribution and, in learning to reconstruct its training inputs, captures the characteristics of the data well. We therefore extract the latent variables learned by the Variational AutoEncoder and use them as the important features of the malicious code. In addition, transfer learning was applied to learn the characteristics of the training data better and to make training more efficient: the parameters of a ResNet-152 model pre-trained on the ImageNet dataset were transferred to the Encoder Network. The resulting ResNet-Variational AutoEncoder outperformed the existing Variational AutoEncoder and trained more efficiently. An ensemble model, the Stacking Classifier, was then used to classify the variant malware. Trained on the feature data extracted by the Encoder Network of the ResNet-VAE model, the Stacking Classifier achieved an accuracy of 98.66% and an F1-Score of 98.68.
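The latent-variable step described in the abstract can be illustrated with a minimal sketch of the standard Variational AutoEncoder reparameterization and its KL regularizer toward a standard normal distribution. This is not the authors' implementation: the function names are hypothetical, the encoder that would produce `mu` and `log_var` is left out, and using the encoder mean as the downstream feature vector is an assumption, since the abstract does not say which quantity is passed to the classifier.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def latent_features(mu, log_var):
    """Feature vector for a downstream classifier.

    Assumption: the encoder mean is used as the feature; the abstract
    does not specify this choice.
    """
    return list(mu)


# Hypothetical encoder outputs for one sample with a 2-dimensional latent space.
rng = random.Random(0)
mu = [0.5, -1.0]
log_var = [0.0, -2.0]

z = reparameterize(mu, log_var, rng)      # stochastic sample used in training
kl = kl_to_standard_normal(mu, log_var)   # regularization term of the VAE loss
features = latent_features(mu, log_var)   # deterministic features for classification
```

During training, the KL term is added to a reconstruction loss. After training, only the encoder is needed to produce features for a classifier such as the stacking ensemble mentioned in the abstract.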

Keywords

Variant Malware; Malware Classification; Variational AutoEncoder; Transfer Learning; Ensemble Learning

