http://dx.doi.org/10.9708/jksci.2021.26.11.041

Dimensionality Reduction of Feature Set for API Call based Android Malware Classification  

Hwang, Hee-Jin (Dept. of Computer Science and Engineering, Korea National Defense University)
Lee, Soojin (Dept. of Computer Science and Engineering, Korea National Defense University)
Abstract
All application programs, including malware, call the Application Programming Interface (API) upon execution. Exploiting this characteristic, recent studies have actively explored detecting and classifying malware based on API call information. However, datasets containing API call information incur substantial computational cost and processing time. In addition, information that contributes little to malware classification may adversely affect the classification accuracy of the learning model. Therefore, in this paper, we propose a method that reduces the dimensionality of API call information by applying various feature selection methods and then extracts an essential feature set. For the experiments, we used CICAndMal2020, a recently released Android malware dataset. After extracting the essential feature set through various feature selection methods, we classified Android malware using a Convolutional Neural Network (CNN) and analyzed the results. The results showed that the selected feature set and the priority of feature weights vary with the feature selection method. In binary classification, malware was classified with 97% accuracy even when the feature set was reduced to 15% of its full size. In multiclass classification, an average accuracy of 83% was achieved with the feature set reduced to 8% of its full size.
Keywords
API-Call; Feature Selection; Dimensionality Reduction; Malware Classification; CNN
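The abstract does not name the specific feature selection methods that were applied. As a hedged illustration of its finding that different methods prioritize features differently, the sketch below scores API-call count features with two common filter criteria, a chi-square statistic (computed in the style of scikit-learn's `chi2`) and discrete mutual information, and keeps the top 15% of features under each. The toy data, the three-column layout, and the function names `chi2_scores`, `mutual_info_scores`, and `select_top_fraction` are hypothetical, chosen for illustration only; this is not the authors' pipeline or the CICAndMal2020 data.

```python
import math
from collections import Counter
from typing import List, Sequence


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-square score per feature, treating non-negative values (e.g. API call
    counts) as frequencies. Features whose expected count is 0 contribute 0.0."""
    n_samples, n_features = len(X), len(X[0])
    totals = [sum(row[j] for row in X) for j in range(n_features)]
    scores = [0.0] * n_features
    for c in set(y):
        rows = [X[i] for i in range(n_samples) if y[i] == c]
        prior = len(rows) / n_samples  # class prior P(c)
        for j in range(n_features):
            observed = sum(row[j] for row in rows)
            expected = prior * totals[j]
            if expected > 0:
                scores[j] += (observed - expected) ** 2 / expected
    return scores


def mutual_info_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Mutual information I(feature; label) in nats, treating each distinct
    feature value as a discrete category."""
    n, n_features = len(X), len(X[0])
    label_counts = Counter(y)
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        value_counts = Counter(column)
        joint_counts = Counter(zip(column, y))
        mi = 0.0
        for (v, c), count in joint_counts.items():
            p_joint = count / n
            p_v = value_counts[v] / n
            p_c = label_counts[c] / n
            mi += p_joint * math.log(p_joint / (p_v * p_c))
        scores.append(mi)
    return scores


def select_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of the highest-scoring features, keeping about `fraction` of
    them (at least one), ordered from highest to lowest score."""
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: rows are apps, columns are call counts of three APIs.
X = [[0, 20, 3],
     [0, 40, 3],
     [1, 40, 3],
     [1, 40, 3]]
y = [0, 0, 1, 1]  # 0 = benign, 1 = malware

print(select_top_fraction(chi2_scores(X, y), 0.15))         # [1]
print(select_top_fraction(mutual_info_scores(X, y), 0.15))  # [0]
```

On this toy data, the chi-square score favors the second column because the statistic grows with raw call counts, while mutual information favors the first column, which separates the two classes perfectly. This is one simple way in which the retained feature set can depend on the selection method.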