Visualization of Malwares for Classification Through Deep Learning

Kim, Hyeonggyeom; Han, Seokmin; Lee, Suchul; Lee, Jun-Rak

doi:10.7472/jksii.2018.19.5.67

Journal of Internet Computing and Services (인터넷정보학회논문지)

Volume 19 Issue 5 / Pages 67-75 / 2018 / 1598-0170 (pISSN) / 2287-1136 (eISSN)

Korean Society for Internet Information (한국인터넷정보학회)

딥러닝 기술을 활용한 멀웨어 분류를 위한 이미지화 기법

Kim, Hyeonggyeom (Dept. of Computer Science and Information Engineering, Korea National University of Transportation) ;
Han, Seokmin (Dept. of Computer Science and Information Engineering, Korea National University of Transportation) ;
Lee, Suchul (Dept. of Computer Science and Information Engineering, Korea National University of Transportation) ;
Lee, Jun-Rak (Dept. of Humanities and Social Sciences, Kangwon National University)

Received : 2018.07.17
Accepted : 2018.08.16
Published : 2018.10.31


Abstract

According to Symantec's Internet Security Threat Report (2018), Internet security threats such as cryptojacking, ransomware, and mobile malware are rapidly increasing and diversifying. This means that malware detection requires not only accuracy but also versatility. In the past, malware detection technology focused on qualitative performance because of problems such as encryption and obfuscation. Given the diversity of today's malware, however, detection must also be versatile enough to cover many kinds of malware. In addition, detection must be optimized in terms of the computing power it consumes. In this paper, we present Stream Order (SO)-CNN and Incremental Coordinate (IC)-CNN, malware detection schemes based on a convolutional neural network (CNN) that effectively detect intelligent and diversified malware. The proposed methods visualize each malware binary file as a fixed-size image. The visualized binaries are used to train GoogLeNet, yielding a deep learning model that detects and classifies malware. The proposed methods perform better than the conventional method.

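According to Symantec's Internet Security Threat Report (2018), Internet security threats such as cryptojacking, ransomware, and mobile threats are rapidly increasing and diversifying. This means that malware detection technology must offer not only better qualitative performance against problems such as encryption and obfuscation, but also versatility, such as the ability to detect diverse malware. Achieving versatility in malware detection requires improvement and optimization in aspects such as the computing power the detection algorithm consumes and the algorithm's performance. To effectively detect recent malware, which is becoming more intelligent and diversified, this paper proposes stream order (SO)-CNN and incremental coordinate (IC)-CNN, malware detection techniques that use a CNN (Convolutional Neural Network). The proposed techniques convert malware binary files into images. The resulting images are learned through GoogLeNet to form a deep learning model that detects and classifies malicious code. The proposed techniques show better performance than the existing method.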
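The abstract says each malware binary is visualized as a fixed-size image but does not describe the two mappings. The sketch below is one plausible reading of the names "stream order" and "incremental coordinate", not the authors' implementation. The function names, the truncate-and-zero-pad rule, the use of non-overlapping byte pairs as coordinates, and the saturation at 255 are all assumptions made for illustration. Training GoogLeNet on the resulting images is not shown.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values in 0..255


def stream_order_image(data: bytes, side: int = 64) -> Image:
    """Fill a side x side grid with the file's bytes in stream order.

    Assumption: files longer than side*side bytes are truncated and
    shorter files are zero-padded, so every image has the same size.
    """
    capacity = side * side
    padded = data[:capacity].ljust(capacity, b"\x00")
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]


def incremental_coordinate_image(data: bytes) -> Image:
    """Use each non-overlapping byte pair (x, y) as a pixel coordinate.

    Assumption: the pixel at (x, y) on a 256 x 256 grid is incremented
    once per occurrence of the pair and saturates at 255. A trailing
    unpaired byte is ignored.
    """
    image = [[0] * 256 for _ in range(256)]
    for x, y in zip(data[0::2], data[1::2]):
        if image[y][x] < 255:
            image[y][x] += 1
    return image


# Hypothetical six-byte input, chosen only to make the output easy to check.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A])
so_image = stream_order_image(sample, side=4)
ic_image = incremental_coordinate_image(sample)
print(so_image[0])            # [77, 90, 144, 0]
print(ic_image[0x5A][0x4D])   # 2, because the pair (0x4D, 0x5A) occurs twice
```

Both functions return an image whose size does not depend on the file length, which is the property a fixed-input CNN needs. The stream-order variant preserves byte position, while the coordinate variant discards position and keeps only how often each byte pair occurs.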

References

"Innovation, organisation, and sophistication-these are the tools of cyber attackers as they work harder and more efficiently to uncover new vulnerabilities", Symantec Internet Security Threat Report, 2018. https://resource.elq.symantec.com/LP=5840?cid=70138000000rm1eAAA
Nataraj L, Karthikeyan S, Jacob G, Manjunath B. S., "Malware images: visualization and automatic classification", In proc. of the 8th ACM international symposium on visualization for cyber security 2011. http://doi.org/10.1145/2016904.2016908
Ji H., and Im E., "Malware Classification Using Machine Learning and Binary Visualization", the Korea Computer Congress. KCC, pp.1084-1086, 2017. http://dx.doi.org/10.5626/KTCP.2018.24.4.198
Schultz MG, Eskin E, Zadok E, Stolfo SJ, "Data mining methods for detection of new malicious executables", In IEEE Symposium on Security and Privacy (S&P '01), 2001. https://doi.org/10.1109/SECPRI.2001.924286
Cohen W. W., "Fast effective rule induction", In Proceedings of the Twelfth International Conference on Machine Learning, 1995. https://doi.org/10.1016/B978-1-55860-377-6.50023-2
Kong D. and Yan G., "Discriminant malware distance learning on structural information for automated malware classification", In ACM SIGKDD 2013. http://dx.doi.org/10.1145/2487575.2488219
Li Q. and Li X., "Android malware detection based on static analysis of characteristic tree", In international conference on cyber-enabled distributed computing and knowledge discovery (cyberc), 2015. https://doi.org/10.1109/CyberC.2015.88
Santos I., Brezo F., Ugarte-Pedrero X., Bringas P. G., "Opcode sequences as representation of executables for data-mining-based unknown malware detection", Elsevier Information Sciences, Vol. 231, pp. 64-82, 2013. https://doi.org/10.1016/j.ins.2011.08.020
Bayer U., Comparetti P. M., Hlauschek C., Kruegel C., and Kirda E., "Scalable, behavior-based malware clustering", In NDSS 2009. https://www.ndss-symposium.org/ndss2009/scalablebehavior-based-malware-clustering/
Anderson B., Quist D., Neil J., Storlie C., and Lane T., "Graph-based malware detection using dynamic analysis", Journal in Computer Virology, Vol. 7, pp. 247-258, 2011. https://doi.org/10.1007/s11416-011-0152-x
Fujino A., Murakami J., and Mori T., "Discovering similar malware samples using api call topics", In IEEE CCNC, 2015. https://doi.org/10.1109/CCNC.2015.7157960
Ni S., Qian Q., and Zhang R., "Malware identification using visualization images and deep learning", Elsevier Computers & Security, 2018. https://doi.org/10.1016/j.cose.2018.04.005
Han KS, Lim JH, Kang B, Im EG, "Malware analysis using visualized images and entropy graphs", International Journal of Information Security, Vol. 14, pp. 1-14, 2015. https://doi.org/10.1007/s10207-014-0242-0
Ronen R., Radu M., Feuerstein C., Yom-Tov E., and Ahmadi M., "Microsoft Malware Classification Challenge", arXiv preprint arXiv:1802.10135, 2018. https://arxiv.org/abs/1802.10135
Gong L., Mueller M., Prafullchandra H., and Schemers R., "Going beyond the sandbox: An overview of the new security architecture in the Java development kit 1.2", In USENIX Symposium on Internet Technologies and Systems, 1997. https://www.usenix.org/conference/usits-97/going-beyond-sandbox-overview-new-security-architecture-java-development-kit-12
Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., and Rabinovich A., "Going deeper with convolutions", In Proceedings of the IEEE conference on computer vision and pattern recognition(CVPR), 2015. https://doi.org/10.1109/CVPR.2015.7298594
Abadi M., Barham P., Chen J., Chen Z., Davis A., Dean J., and Kudlur M., "TensorFlow: A system for large-scale machine learning", in the Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation(OSDI), pp. 265-283, 2016. https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf
LeCun Y., Bottou L., Bengio Y., and Haffner P., "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, Vol. 86, No. 11, pp. 2278-2324, 1998. https://doi.org/10.1109/5.726791
"ImageNet Large Scale Visual Recognition Competition", http://www.image-net.org/challenges/LSVRC/
Arora S., Bhaskara A., Ge R., and Ma T., "Provable bounds for learning some deep representations", In International Conference on Machine Learning, pp. I-584-I-592, 2014. http://proceedings.mlr.press/v32/arora14.pdf
