[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13089/JKIISC.2021.31.4.559

Analysis of Malware Group Classification with eXplainable Artificial Intelligence

Kim, Do-yeon (Hoseo University)
Jeong, Ah-yeon (Hoseo University)
Lee, Tae-jin (Hoseo University)

Publication Information

Journal of the Korea Institute of Information Security & Cryptology / v.31, no.4, 2021, pp. 559-571

Abstract

As computers have become more prevalent, attackers have distributed a growing amount of malware to ordinary users. Research on malware detection continues to this day, and in recent years it has focused on detection and analysis using AI. However, AI algorithms have the drawback that they cannot explain why they detect or classify a sample as malware. Explainable AI (XAI) techniques have emerged to overcome this limitation and make AI practical to use. With XAI, it is possible to provide a basis for the AI's final decision. In this paper, we perform malware group classification using XGBoost and Random Forest and interpret the results with SHAP. Both classification models achieved a high classification accuracy of about 99%, and comparing the top 20 API features derived through XAI with the main APIs used by malware showed that the results could be interpreted and understood to a meaningful degree. Building on this, future work will study how to directly improve the reliability of AI.
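
The attribution idea behind SHAP can be illustrated with a minimal sketch that computes exact Shapley values for a tiny stand-in model and ranks features by absolute contribution. This is not the pipeline used in the paper: the scoring function `toy_model`, its weights, the four API-call feature names, and the all-zero baseline are all hypothetical and chosen only for illustration. A real study would train XGBoost or Random Forest models and would likely use a SHAP library rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

# Hypothetical API-call features (illustrative only, not taken from the paper).
FEATURES = ["CreateRemoteThread", "WriteProcessMemory", "RegSetValueExA", "GetTickCount"]


def toy_model(x):
    """Hypothetical score for one malware group; stands in for a trained classifier."""
    score = 0.1
    score += 0.3 * x["CreateRemoteThread"]
    score += 0.2 * x["WriteProcessMemory"]
    # Interaction term: the two injection-related APIs together add extra evidence.
    score += 0.2 * x["CreateRemoteThread"] * x["WriteProcessMemory"]
    score += 0.1 * x["RegSetValueExA"]
    return score  # GetTickCount deliberately has no effect on the score


def shapley_values(model, sample, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(FEATURES)

    def value(coalition):
        x = {f: (sample[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when it joins coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


sample = {f: 1 for f in FEATURES}    # a sample in which every API is called
baseline = {f: 0 for f in FEATURES}  # assumed reference: no API is called
phi = shapley_values(toy_model, sample, baseline)

# Rank features by absolute contribution, analogous to a "top-k features" list.
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
for name in ranking:
    print(f"{name}: {phi[name]:+.3f}")
```

The contributions sum to the difference between the model output for the sample and for the baseline, which is the property that lets a per-feature ranking serve as a basis for the model's decision. Exact enumeration grows exponentially with the number of features, so it is practical only for toy cases like this one.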

Keywords

Malware; XGBoost; Random Forest; XAI; SHAP;
