Visualization of Malwares for Classification Through Deep Learning

An Image Conversion Technique for Malware Classification Using Deep Learning

  • Kim, Hyeonggyeom (Dept. of Computer Science and Information Engineering, Korea National University of Transportation) ;
  • Han, Seokmin (Dept. of Computer Science and Information Engineering, Korea National University of Transportation) ;
  • Lee, Suchul (Dept. of Computer Science and Information Engineering, Korea National University of Transportation) ;
  • Lee, Jun-Rak (Dept. of Humanities and Social Sciences, Kangwon National University)
  • Received : 2018.07.17
  • Accepted : 2018.08.16
  • Published : 2018.10.31

Abstract

According to Symantec's Internet Security Threat Report (2018), Internet security threats such as cryptojacking, ransomware, and mobile malware are rapidly increasing and diversifying. Malware detection therefore requires not only high detection accuracy but also versatility, that is, the ability to handle many kinds of malware. In the past, malware detection technology focused on detection quality in the face of problems such as encryption and obfuscation. Today, given the diversity of malware, a detector must also generalize across malware types, and its computational cost must be optimized. In this paper, we present Stream Order (SO)-CNN and Incremental Coordinate (IC)-CNN, two malware detection schemes based on a convolutional neural network (CNN) that effectively detect intelligent and diversified malware. The proposed methods visualize each malware binary file as a fixed-size image. GoogLeNet is trained on the visualized binaries to form a deep learning model that detects and classifies malware. The proposed methods outperform the conventional method.

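According to Symantec's Internet Security Threat Report (2018), Internet security threats such as cryptojacking, ransomware, and mobile threats are surging and diversifying. This means that malware detection technology must offer not only better qualitative performance against problems such as encryption and obfuscation but also versatility, such as the ability to detect diverse malware. Achieving versatility in malware detection requires improvement and optimization in terms of the computing power consumed by the detection algorithm and the performance of the detection algorithm. In this paper, we propose stream order (SO)-CNN and incremental coordinate (IC)-CNN, malware detection techniques that use a CNN (Convolutional Neural Network) to effectively detect increasingly intelligent and diversified malware. The proposed techniques convert malware binary files into images. The resulting images are learned through GoogLeNet to form a deep learning model that detects and classifies malicious code. The proposed techniques show better performance than the existing method.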
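The abstract states that each malware binary is mapped onto a fixed-size image but does not spell out the mapping. The following is a minimal sketch of one generic byte-to-image layout, assuming each byte becomes one grayscale pixel, pixels are filled row by row in file order, and files are truncated or zero-padded to fit. The function name `bytes_to_image`, the default 256x256 size, and the padding policy are illustrative assumptions, not the paper's SO-CNN or IC-CNN definitions.

```python
from typing import List


def bytes_to_image(data: bytes, width: int = 256, height: int = 256) -> List[List[int]]:
    """Lay raw bytes out as a height x width grid of grayscale pixels (0-255).

    Assumption (not from the paper): bytes are placed row by row in file
    order; long inputs are truncated and short inputs are zero-padded.
    """
    n_pixels = width * height
    # Truncate to the image capacity, then pad on the right with zero bytes.
    buf = data[:n_pixels].ljust(n_pixels, b"\x00")
    # Slice the flat buffer into rows; iterating bytes yields ints in 0-255.
    return [list(buf[row * width:(row + 1) * width]) for row in range(height)]


if __name__ == "__main__":
    # Hypothetical input: any byte string works, e.g. the contents of a file.
    sample = bytes(range(10))
    image = bytes_to_image(sample, width=4, height=4)
    for row in image:
        print(row)
```

Truncation discards the tail of large binaries, so a real pipeline would more likely resample or tile the byte stream; the fixed grid here only illustrates how a variable-length file can feed a CNN that expects a constant input shape.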

References

  1. "Innovation, organisation, and sophistication-these are the tools of cyber attackers as they work harder and more efficiently to uncover new vulnerabilities", Symantec Internet Security Threat Report, 2018. https://resource.elq.symantec.com/LP=5840?cid=70138000000rm1eAAA
  2. Nataraj L., Karthikeyan S., Jacob G., and Manjunath B. S., "Malware images: visualization and automatic classification", In Proc. of the 8th ACM International Symposium on Visualization for Cyber Security, 2011. http://doi.org/10.1145/2016904.2016908
  3. Ji H., and Im E., "Malware Classification Using Machine Learning and Binary Visualization", the Korea Computer Congress. KCC, pp.1084-1086, 2017. http://dx.doi.org/10.5626/KTCP.2018.24.4.198
  4. Schultz M. G., Eskin E., Zadok E., and Stolfo S. J., "Data mining methods for detection of new malicious executables", In IEEE Symposium on Security and Privacy (S&P '01), 2001. https://doi.org/10.1109/SECPRI.2001.924286
  5. Cohen W. W., "Fast effective rule induction", In Proceedings of the Twelfth International Conference on Machine Learning, 1995. https://doi.org/10.1016/B978-1-55860-377-6.50023-2
  6. Kong D. and Yan G., "Discriminant malware distance learning on structural information for automated malware classification", In ACM SIGKDD 2013. http://dx.doi.org/10.1145/2487575.2488219
  7. Li Q. and Li X., "Android malware detection based on static analysis of characteristic tree", In International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2015. https://doi.org/10.1109/CyberC.2015.88
  8. Santos I., Brezo F., Ugarte-Pedrero X., Bringas P. G., "Opcode sequences as representation of executables for data-mining-based unknown malware detection", Elsevier Information Sciences, Vol. 231, pp. 64-82, 2013. https://doi.org/10.1016/j.ins.2011.08.020
  9. Bayer U., Comparetti P. M., Hlauschek C., Kruegel C., and Kirda E., "Scalable, behavior-based malware clustering", In NDSS 2009. https://www.ndss-symposium.org/ndss2009/scalablebehavior-based-malware-clustering/
  10. Anderson B., Quist D., Neil J., Storlie C., and Lane T., "Graph-based malware detection using dynamic analysis", Journal in Computer Virology, Vol. 7, pp. 247-258, 2011. https://doi.org/10.1007/s11416-011-0152-x
  11. Fujino A., Murakami J., and Mori T., "Discovering similar malware samples using api call topics", In IEEE CCNC, 2015. https://doi.org/10.1109/CCNC.2015.7157960
  12. Ni S., Qian Q., and Zhang R., "Malware identification using visualization images and deep learning", Elsevier Computers & Security, 2018. https://doi.org/10.1016/j.cose.2018.04.005
  13. Han K. S., Lim J. H., Kang B., and Im E. G., "Malware analysis using visualized images and entropy graphs", International Journal of Information Security, Vol. 14, pp. 1-14, 2015. https://doi.org/10.1007/s10207-014-0242-0
  14. Ronen R., Radu M., Feuerstein C., Yom-Tov E., and Ahmadi M., "Microsoft Malware Classification Challenge", arXiv preprint arXiv:1802.10135, 2018. https://arxiv.org/abs/1802.10135
  15. Gong L., Mueller M., Prafullchandra H., and Schemers R., "Going beyond the sandbox: An overview of the new security architecture in the Java development kit 1.2", In USENIX Symposium on Internet Technologies and Systems, 1997. https://www.usenix.org/conference/usits-97/going-beyond-sandbox-overview-new-security-architecture-java-development-kit-12
  16. Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., and Rabinovich A., "Going deeper with convolutions", In Proceedings of the IEEE conference on computer vision and pattern recognition(CVPR), 2015. https://doi.org/10.1109/CVPR.2015.7298594
  17. Abadi M., Barham P., Chen J., Chen Z., Davis A., Dean J., and Kudlur M., "TensorFlow: A system for large-scale machine learning", in the Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation(OSDI), pp. 265-283, 2016. https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf
  18. LeCun Y., Bottou L., Bengio Y., and Haffner P., "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, Vol. 86, No. 11, pp. 2278-2324, 1998. https://doi.org/10.1109/5.726791
  19. "ImageNet Large Scale Visual Recognition Competition", http://www.image-net.org/challenges/LSVRC/
  20. Arora S., Bhaskara A., Ge R., and Ma T., "Provable bounds for learning some deep representations", In International Conference on Machine Learning, pp. I-584-I-592, 2014. http://proceedings.mlr.press/v32/arora14.pdf