http://dx.doi.org/10.9708/jksci.2022.27.11.123

Light-weight Classification Model for Android Malware through the Dimensional Reduction of API Call Sequence using PCA  

Jeon, Dong-Ha (Dept. of Defense Science, Korea National Defense University)
Lee, Soo-Jin (Dept. of Defense Science, Korea National Defense University)
Abstract
Recently, studies on detecting and classifying Android malware from API call sequences have been actively carried out. However, API call sequence based classification has serious limitations: because the data are vast and the features are high-dimensional, both malware analysis and learning model construction consume excessive time and resources. In this study, we applied PCA (Principal Component Analysis) to the CICAndMal2020 dataset, which contains a vast amount of API call information, to significantly reduce the feature dimension, and then evaluated classification models including LightGBM, Random Forest, and k-Nearest Neighbors. The experimental results show that PCA significantly reduces the feature dimension while preserving the characteristics of the original data, and that the reduced features support efficient malware classification. Both binary and multi-class classification achieve higher accuracy than previous studies, even when the features are reduced to less than 1% of their original size.
Keywords
API-Call; PCA; Dimensional Reduction; LGBM; RF; KNN;
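
The pipeline summarized in the abstract (reduce the features with PCA, then train a classifier on the reduced features) can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data, not the authors' implementation: the data sizes, the number of components, the standardization step, and all hyperparameters are assumptions, and LightGBM is left out only to keep the sketch to a single library.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a high-dimensional API call feature matrix.
# Hypothetical sizes; this is NOT the CICAndMal2020 data.
X, y = make_classification(
    n_samples=600, n_features=1000, n_informative=30,
    n_redundant=0, n_classes=2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed target dimension: 8 of 1000 features, i.e. under 1% of the original.
N_COMPONENTS = 8


def evaluate(classifier):
    """Fit scaler and PCA on the training split only, then score on the test split."""
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=N_COMPONENTS, random_state=0),
        classifier,
    )
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


classifiers = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, clf in classifiers.items():
    model, accuracy = evaluate(clf)
    results[name] = accuracy
    # model[:-1] is the scaler + PCA part of the pipeline (without the classifier).
    reduced = model[:-1].transform(X_test)
    print(f"{name}: {X_test.shape[1]} -> {reduced.shape[1]} features, "
          f"test accuracy = {accuracy:.3f}")
```

Wrapping PCA and the classifier in one pipeline means the projection is learned from the training split alone, so the test split does not leak into the reduction step. The accuracies printed here say nothing about the paper's results, since the data are synthetic.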