http://dx.doi.org/10.13089/JKIISC.2019.29.3.531

API Feature Based Ensemble Model for Malware Family Classification  

Lee, Hyunjong (Dankook University)
Euh, Seongyul (Dankook University)
Hwang, Doosung (Dankook University)
Abstract
This paper proposes training features for malware family analysis and analyzes the multi-class classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use Random Forest and XGBoost, two ensemble algorithms based on decision trees. We propose API, API-DLL, and DLL-CM features for malware detection and family classification. These features are derived by analyzing the APIs and DLLs that malware uses frequently and by converting high-dimensional features into low-dimensional ones. The proposed feature selection method reduces data dimensionality and speeds up training. In the performance comparison, Random Forest achieves a malware detection rate of 93.0%, and XGBoost achieves an accuracy of 92.0% on the malware family dataset. On the malware family dataset that also includes benign samples, both Random Forest and XGBoost show a false positive rate of about 3.5%.
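The feature construction and evaluation summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each executable's imports have already been extracted into (DLL, API) pairs, and it uses a hypothetical document-frequency top-k rule as the high-to-low-dimensional reduction. Only the API and API-DLL variants are built, because the abstract does not define how DLL-CM features are formed. The helper names (`api_tokens`, `api_dll_tokens`, `select_top_k`, `vectorize`, `detection_and_fpr`) and the family labels in the demo are hypothetical. The metric function assumes the false positive rate is the share of benign samples predicted as any malware family.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Assumption: imports were already extracted (e.g. by a disassembler or PE
# parser) into a list of (dll, api) pairs per executable.
Sample = List[Tuple[str, str]]


def api_tokens(sample: Sample) -> Set[str]:
    """API feature: the set of imported API names, ignoring the DLL."""
    return {api.lower() for _, api in sample}


def api_dll_tokens(sample: Sample) -> Set[str]:
    """API-DLL feature: each API qualified by the DLL it is imported from."""
    return {f"{dll.lower()}!{api.lower()}" for dll, api in sample}


def select_top_k(token_sets: Sequence[Set[str]], k: int) -> List[str]:
    """Hypothetical reduction: keep the k tokens present in the most samples."""
    doc_freq = Counter(tok for toks in token_sets for tok in toks)
    # Highest frequency first; ties broken alphabetically for a stable order.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [tok for tok, _ in ranked[:k]]


def vectorize(token_set: Set[str], vocab: List[str]) -> List[int]:
    """Binary presence vector over the selected (low-dimensional) vocabulary."""
    return [1 if tok in token_set else 0 for tok in vocab]


def detection_and_fpr(y_true: Sequence[str], y_pred: Sequence[str],
                      benign: str = "benign") -> Dict[str, float]:
    """Accuracy, malware detection rate, and false positive rate.

    Detection rate: share of malware samples not predicted as benign.
    FPR (assumed definition): share of benign samples predicted as malware.
    """
    pairs = list(zip(y_true, y_pred))
    mal = [(t, p) for t, p in pairs if t != benign]
    ben = [(t, p) for t, p in pairs if t == benign]
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "detection_rate": sum(p != benign for _, p in mal) / len(mal) if mal else 0.0,
        "fpr": sum(p != benign for _, p in ben) / len(ben) if ben else 0.0,
    }


if __name__ == "__main__":
    samples = [
        [("kernel32.dll", "CreateFileA"), ("kernel32.dll", "WriteFile"),
         ("ws2_32.dll", "connect")],
        [("kernel32.dll", "CreateFileA"), ("advapi32.dll", "RegSetValueExA")],
        [("kernel32.dll", "CreateFileA"), ("ws2_32.dll", "connect")],
    ]
    token_sets = [api_dll_tokens(s) for s in samples]
    vocab = select_top_k(token_sets, k=2)
    X = [vectorize(t, vocab) for t in token_sets]
    print(vocab)
    print(X)  # -> [[1, 1], [1, 0], [1, 1]]
    print(detection_and_fpr(
        ["benign", "benign", "family_a", "family_a", "family_b"],
        ["benign", "family_a", "family_a", "benign", "family_b"]))
```

The resulting binary vectors can be passed to a tree ensemble such as scikit-learn's `RandomForestClassifier` or xgboost's `XGBClassifier`. Recent xgboost versions expect class labels encoded as integers 0..n-1, so family names would need to be label-encoded first.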
Keywords
Malware Detection; Malware Classification; Feature Selection; Tree-based Ensemble;