[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13089/JKIISC.2022.32.2.181

A Study on Machine Learning Based Anti-Analysis Technique Detection Using N-gram Opcode

Kim, Hee Yeon (Korea University)
Lee, Dong Hoon (Korea University)

Publication Information

Journal of the Korea Institute of Information Security & Cryptology / v.32, no.2, 2022, pp. 181-192

Abstract

The emergence of new malware is rendering existing signature-based malware detection techniques ineffective, and the anti-analysis techniques applied to malware make it difficult to analyze. Recent studies on signature-based malware detection are limited in that malware authors can easily bypass them. In this study, we therefore build a machine learning model that detects and classifies the anti-analysis techniques of the packers applied to malware, rather than relying on the characteristics of the malware itself. N-gram opcodes are extracted from malicious binaries protected with various anti-analysis techniques of commercial packers, features are derived from them using TF-IDF, and each anti-analysis technique is then detected and classified. Real-world malware samples packed with Themida and VMProtect using multiple anti-analysis techniques were used to train and test six machine learning models, and the best model achieved 81.25% accuracy for Themida and 95.65% accuracy for VMProtect.
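The feature-extraction step described in the abstract (opcode n-grams weighted by TF-IDF) can be illustrated with a minimal sketch. This is not the authors' implementation: the opcode traces are made up, the function names `opcode_ngrams` and `tfidf_vectors` are hypothetical, and the smoothed IDF formula is one common variant chosen here as an assumption. How the opcodes are obtained from a packed binary is taken as given.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the n-grams (space-joined strings) of an opcode sequence."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf_vectors(samples, n=2):
    """Compute TF-IDF weights of opcode n-grams for each sample.

    samples: list of opcode sequences (lists of mnemonic strings).
    Returns (vocabulary, vectors); each vector is aligned with vocabulary.
    """
    counts = [Counter(opcode_ngrams(s, n)) for s in samples]
    vocabulary = sorted(set().union(*counts))
    n_docs = len(samples)
    # Document frequency: number of samples that contain each n-gram.
    df = {g: sum(1 for c in counts if g in c) for g in vocabulary}
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid division by zero for short traces
        # Assumed smoothed IDF variant: log((1 + N) / (1 + df)) + 1.
        vectors.append([
            (c[g] / total) * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
            for g in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical opcode traces, invented for illustration only.
samples = [
    ["push", "mov", "call", "test", "jz"],
    ["push", "mov", "rdtsc", "sub", "cmp"],
]
vocabulary, vectors = tfidf_vectors(samples, n=2)
```

The resulting vectors, one per sample, could then be passed to any standard classifier together with a label for the anti-analysis technique applied. N-grams shared by every sample (here `push mov`) receive a lower weight than n-grams specific to one sample, which is the reason for using TF-IDF over raw counts.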

Keywords

Anti-analysis; N-gram; Malware Detection; Classification; Machine Learning

Citations & Related Records

Times Cited By KSCI: 1

