http://dx.doi.org/10.13089/JKIISC.2022.32.2.181

A Study on Machine Learning Based Anti-Analysis Technique Detection Using N-gram Opcode  

Kim, Hee Yeon (Korea University)
Lee, Dong Hoon (Korea University)
Abstract
The emergence of new malware is incapacitating existing signature-based malware detection techniques, and the application of various anti-analysis techniques makes malware difficult to analyze. Recent studies on signature-based malware detection are limited in that malware creators can easily bypass them. Therefore, this study builds a machine learning model that detects and classifies the anti-analysis techniques of packers applied to malware, rather than relying on the characteristics of the malware itself. N-gram opcodes are extracted from malicious binaries to which various anti-analysis techniques of commercial packers have been applied, features are derived from them using TF-IDF, and each anti-analysis technique is then detected and classified. Real-world malware samples packed with Themida and VMProtect under multiple anti-analysis techniques were used to train and test six machine learning models, and the best model achieved 81.25% accuracy for Themida and 95.65% accuracy for VMProtect.
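The feature-extraction step described in the abstract (opcode n-grams weighted by TF-IDF) can be illustrated with a minimal sketch. It is not the authors' implementation: the opcode traces, the function names `opcode_ngrams` and `tfidf`, the choice of n = 2, and the unsmoothed weighting tf × ln(N / df) are all assumptions made for illustration, since the abstract does not state which TF-IDF variant or n-gram length was used.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the n-gram tokens (space-joined) of one opcode sequence."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf(samples, n=2):
    """Compute one {n-gram: weight} dict per sample.

    Assumed variant (not taken from the paper):
      tf  = n-gram count / total n-grams in the sample
      idf = ln(N / df), with N samples and df samples containing the n-gram
    """
    grams = [Counter(opcode_ngrams(ops, n)) for ops in samples]
    # Document frequency: in how many samples each n-gram occurs.
    df = Counter(g for counts in grams for g in counts)
    total_docs = len(samples)
    vectors = []
    for counts in grams:
        total = sum(counts.values())
        vectors.append({
            g: (c / total) * math.log(total_docs / df[g])
            for g, c in counts.items()
        })
    return vectors


# Hypothetical opcode traces, one list per packed sample.
samples = [
    ["push", "mov", "call", "pop", "ret"],
    ["push", "mov", "rdtsc", "sub", "cmp", "jne"],
    ["push", "mov", "cpuid", "cmp", "jne"],
]
vectors = tfidf(samples, n=2)
```

Under this weighting, an n-gram that occurs in every sample (here `push mov`) gets weight zero, while n-grams confined to a few samples receive larger weights. The resulting vectors would then be passed to a classifier; the abstract does not name the six models, so none is shown here.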
Keywords
Anti-analysis; N-gram; Malware Detection; Classification; Machine Learning