http://dx.doi.org/10.4218/etrij.2020-0215

Evaluations of AI-based malicious PowerShell detection with feature optimizations

Song, Jihyeon (ICT (Information Security Engineering), University of Science and Technology)
Kim, Jungtae (Cyber Security Research Division, Electronics and Telecommunications Research Institute)
Choi, Sunoh (Department of Software Engineering, Jeonbuk National University)
Kim, Jonghyun (Cyber Security Research Division, Electronics and Telecommunications Research Institute)
Kim, Ikkyun (Cyber Security Research Division, Electronics and Telecommunications Research Institute)

Publication Information

ETRI Journal / v.43, no.3, 2021, pp. 549-560

Abstract

Cyberattacks are often difficult to identify with traditional signature-based detection because attackers continually find ways to bypass it. Researchers have therefore introduced artificial intelligence (AI) techniques into cybersecurity analysis to detect malicious PowerShell scripts. In this paper, we propose a feature optimization technique that improves the accuracy of AI-based malicious PowerShell script detection. We statically analyze each PowerShell script and preprocess it for feature selection using its tokens and abstract syntax tree (AST), which represent the vocabulary and structure of the script, respectively. Performance evaluations with the optimized features yield detection rates of 98% in both machine learning (ML) and deep learning (DL) experiments. Among the evaluated configurations, the ML model using 3-grams of five selected tokens and the DL model using AST 3-grams deliver the best performance.
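To make the 3-gram features mentioned in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a script has already been reduced to a sequence of token-type (or AST node-type) names by some external tokenizer. The sample sequence, the `selected` set, and the helper names `filter_tokens`, `ngrams`, and `ngram_counts` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple


def filter_tokens(sequence: List[str], selected: Set[str]) -> List[str]:
    """Keep only the token types in `selected`, preserving their order."""
    return [tok for tok in sequence if tok in selected]


def ngrams(sequence: List[str], n: int = 3) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram of the sequence, in order.

    A sequence shorter than n yields an empty list.
    """
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def ngram_counts(sequence: List[str], n: int = 3) -> Counter:
    """Count how often each n-gram occurs; the counts can serve as features."""
    return Counter(ngrams(sequence, n))


# Hypothetical token-type sequence for one script (illustrative only).
token_types = [
    "Command", "CommandParameter", "String", "Operator",
    "Command", "CommandParameter", "String",
]

# 3-gram counts over the full sequence.
features = ngram_counts(token_types, n=3)

# 3-gram counts after keeping only an assumed subset of token types.
selected = {"Command", "String"}  # assumption, not the paper's selection
selected_features = ngram_counts(filter_tokens(token_types, selected), n=3)
```

The resulting counters map each 3-gram to its frequency in one script. Under these assumptions, they could be turned into fixed-length vectors over a shared 3-gram vocabulary before being passed to a classifier.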

Keywords

Deep learning; feature optimization; fileless malware; machine learning; PowerShell script
