Evaluations of AI-based malicious PowerShell detection with feature optimizations

  • Song, Jihyeon (ICT (Information Security Engineering), University of Science and Technology)
  • Kim, Jungtae (Cyber Security Research Division, Electronics and Telecommunications Research Institute)
  • Choi, Sunoh (Department of Software Engineering, Jeonbuk National University)
  • Kim, Jonghyun (Cyber Security Research Division, Electronics and Telecommunications Research Institute)
  • Kim, Ikkyun (Cyber Security Research Division, Electronics and Telecommunications Research Institute)
  • Received: 2020.05.18
  • Accepted: 2020.11.25
  • Published: 2021.06.01

Abstract

Cyberattacks are often difficult to identify with traditional signature-based detection because attackers continually find ways to bypass it. Researchers have therefore introduced artificial intelligence (AI) technology into cybersecurity analysis to detect malicious PowerShell scripts. In this paper, we propose a feature optimization technique that enhances the accuracy of AI-based malicious PowerShell script detection. We statically analyze each PowerShell script and preprocess it using its tokens and abstract syntax tree (AST) for feature selection; the tokens represent the vocabulary of the script and the AST represents its structure. Performance evaluations with the optimized features yield detection rates of 98% in both machine learning (ML) and deep learning (DL) experiments. The best-performing configurations are the ML model trained on 3-grams of five selected tokens and the DL model trained on AST 3-grams.
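The 3-gram features mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the token-type sequence, the `keep` set, and the helper names `ngrams` and `ngram_counts` are assumptions made for illustration, and the abstract does not say which five tokens were selected. The same counting would apply to a sequence of AST node type names.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def ngrams(items: Sequence[str], n: int = 3) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a sequence (empty if it is too short)."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def ngram_counts(items: Sequence[str], n: int = 3,
                 keep: Iterable[str] = ()) -> Counter:
    """Count n-grams, optionally after keeping only the selected item types."""
    selected = set(keep)
    filtered = [t for t in items if t in selected] if selected else list(items)
    return Counter(ngrams(filtered, n))


# Hypothetical token-type sequence for a short script. In practice a
# PowerShell tokenizer would produce this; the values here are made up.
token_types = ["Command", "CommandParameter", "String",
               "Operator", "Command", "CommandArgument"]

# Hypothetical selection of token types (the paper's actual choice of
# five tokens is not stated in the abstract).
selected_types = {"Command", "CommandParameter", "String", "CommandArgument"}

features = ngram_counts(token_types, n=3, keep=selected_types)
for gram, count in features.items():
    print(gram, count)
```

Each distinct 3-gram becomes one feature dimension whose value is its count in the script, so restricting the sequence to a few token types before counting keeps the feature space small.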

Acknowledgement

This research was supported by the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korean government (MSIT) (no. 2019-0-00026, ICT infrastructure protection against intelligent malware threats).

Cited by

  1. CitiusSynapse: A Deep Learning Framework for Embedded Systems, Applied Sciences, vol. 11, no. 23, 2021, https://doi.org/10.3390/app112311570