http://dx.doi.org/10.13089/JKIISC.2022.32.5.945

Method of Similarity Hash-Based Malware Family Classification  

Kim, Yun-jeong (Igloo Corporation)
Kim, Moon-sun (Softverse)
Lee, Man-hee (Hannam University)
Abstract
Billions of malware samples are detected every year, of which only 0.01% are new types of malware. An effective tool for classifying malware types is therefore needed, but previous approaches are limited in how quickly they can analyze large volumes of malicious code because they require complex and extensive data preprocessing. To solve this problem, this paper proposes a method that classifies malware types based on similarity hashes, without complex data preprocessing. The approach trains an XGBoost model on the similarity hash information of the malware. To evaluate it, we used the BIG-15 dataset, which is widely used in the field of malware classification. The method classified the malware with an accuracy of 98.9% and identified 3,432 benign files with 100% accuracy. This result is superior to most recent studies that use complex preprocessing and deep learning models. The proposed approach is therefore expected to enable more efficient malware classification.
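The abstract does not say how the similarity hash is turned into model input, so the following is only a minimal sketch of one plausible encoding, not the authors' implementation. It assumes a TLSH-style digest of 35 bytes written as 70 hex characters (optionally prefixed with "T1", as newer TLSH versions emit) and maps it to one integer feature per byte. The function name `digest_to_features` and the sample digest are hypothetical.

```python
from typing import List

# Assumption: a standard TLSH digest is 35 bytes, i.e. 70 hex characters.
TLSH_HEX_LEN = 70


def digest_to_features(hex_digest: str) -> List[int]:
    """Map a TLSH-style hex digest to one integer feature (0-255) per byte.

    Hypothetical helper for illustration; the paper's actual feature
    encoding is not described in the abstract.
    """
    digest = hex_digest.strip()
    # Newer TLSH versions prepend a "T1" version tag; drop it if present.
    if digest.upper().startswith("T1"):
        digest = digest[2:]
    if len(digest) != TLSH_HEX_LEN:
        raise ValueError(f"expected {TLSH_HEX_LEN} hex characters, got {len(digest)}")
    # bytes.fromhex raises ValueError on non-hex input.
    return list(bytes.fromhex(digest))


# Placeholder digest (not from a real file), used only to show the shape.
sample = "T1" + "0F" * 35
features = digest_to_features(sample)
print(len(features))  # one feature per digest byte
```

Rows built this way, paired with family labels, could then be passed to a gradient-boosted tree classifier such as `xgboost.XGBClassifier`. Whether the paper uses a per-byte encoding or some other representation is an open assumption here.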
Keywords
Malware; Malware classification; Machine learning; Similarity hash; TLSH