http://dx.doi.org/10.13089/JKIISC.2019.29.3.549

Unified Labeling and Fine-Grained Verification for Improving Ground-Truth of Malware Analysis  

Oh, Sang-Jin (Information Security Lab., Graduate School of Information, Yonsei University)
Park, Leo-Hyun (Information Security Lab., Graduate School of Information, Yonsei University)
Kwon, Tae-Kyoung (Information Security Lab., Graduate School of Information, Yonsei University)
Abstract
According to recent reports from anti-virus vendors, the number of new and modified malware samples has increased exponentially. Consequently, machine-learning-based malware analysis has been actively studied as a replacement for manual analysis, which is slow. However, many studies based on supervised learning use the malware family names provided by anti-virus vendors as labels, even though these names have low reliability. To address this low reliability of malware labels, this paper introduces a new labeling technique, "Unified Labeling", and further verifies the similarity of malicious behavior through fine-grained feature analysis. To validate the approach, various clustering algorithms were applied, and the results were compared with those of existing labeling techniques.
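The abstract does not describe how Unified Labeling works internally, so the sketch below does not attempt to reproduce it. Instead, it illustrates the kind of existing labeling technique it is compared against: a plurality vote over family tokens extracted from each vendor's detection name (in the spirit of AVClass), together with cluster purity, one common way to measure how well a set of labels agrees with a clustering. The generic-token list, the `min_votes` threshold, the vendor names, and all function names are hypothetical choices made for illustration only.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, illustrative list of generic AV words that carry no family
# information. Real tools such as AVClass use much larger curated lists.
GENERIC_TOKENS = {"trojan", "win32", "w32", "generic", "malware", "agent", "variant", "heur"}


def family_tokens(av_label: str) -> set[str]:
    """Split one vendor's detection name into lowercase candidate family tokens."""
    tokens = re.split(r"[^a-z0-9]+", av_label.lower())
    # Drop generic words, short fragments (suffixes like "abc"), and pure numbers.
    return {t for t in tokens if len(t) > 3 and t not in GENERIC_TOKENS and not t.isdigit()}


def plurality_label(av_labels: dict[str, str], min_votes: int = 2) -> str | None:
    """Return the family token named by the most vendors, or None if too few agree."""
    votes: Counter = Counter()
    for label in av_labels.values():
        # Tokens come back as a set, so each vendor votes at most once per token.
        votes.update(family_tokens(label))
    if not votes:
        return None
    family, count = votes.most_common(1)[0]
    return family if count >= min_votes else None


def purity(labels: list[str], clusters: list[int]) -> float:
    """Fraction of samples whose label equals the majority label of their cluster."""
    per_cluster: dict[int, Counter] = {}
    for label, cluster in zip(labels, clusters):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    return majority_total / len(labels)


if __name__ == "__main__":
    sample = {
        "VendorA": "Trojan.Win32.Emotet.abc",
        "VendorB": "W32/Emotet.B!tr",
        "VendorC": "Generic.Malware.12345",
    }
    print(plurality_label(sample))                         # emotet
    print(purity(["a", "a", "b", "b"], [0, 0, 0, 1]))      # 0.75
```

In this toy run, two of the three vendors name the same family while the third gives only a generic detection, so the vote yields "emotet". A sample with no agreeing vendors receives no label at all, which is one reason consensus-based labels leave gaps that a unified or behavior-based approach might try to fill. Purity is only one possible agreement metric; the abstract does not say which metrics the authors used.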
Keywords
Malware; Labeling; Machine Learning; Clustering;
References
[1] G DATA Security Blog, https://www.gdatasoftware.com/blog, Dec. 2018.
[2] Sebastián, Marcos, et al. "AVclass: A tool for massive malware labeling." International Symposium on Research in Attacks, Intrusions, and Defenses, Springer, pp. 230-253, Sep. 2016.
[3] Kantchelian, Alex, et al. "Better malware ground truth: Techniques for weighting anti-virus vendor labels." Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, pp. 45-56, Oct. 2015.
[4] Hurier, Mederic, et al. "On the lack of consensus in anti-virus decisions: Metrics and insights on building ground truths of Android malware." International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), pp. 142-162, Jul. 2016.
[5] Li, Jin, et al. "Significant permission identification for machine-learning-based Android malware detection." IEEE Transactions on Industrial Informatics, 14(7), pp. 3216-3225, Jan. 2018.
[6] M. Ahmadi, D. Ulyanov, S. Semenov, M. Trofimov, and G. Giacinto, "Novel feature extraction, selection and fusion for effective malware family classification." In Proc. Data and Application Security and Privacy (CODASPY), pp. 183-194, Mar. 2016.
[7] Oh Sangjin, Park Laehyun, Park Jun-hyung, and Kwon Taekyoung, "A study of labeling for ground-truth of malware family names." Conference on Information Security and Cryptography-Winter, pp. 66-69, Dec. 2018.
[8] Symantec, "Internet Security Threat Report (ISTR)", https://www.symantec.com/security-center/threat-report, Dec. 2018.
[9] Kaspersky, "Overall Statistics for 2017", https://securelist.com/ksb-overall-statistics-2017/83453, Dec. 2018.