Feature Selection to Mine Joint Features from High-dimension Space for Android Malware Detection

  • Xu, Yanping (School of Cyberspace Security, Beijing University of Posts and Telecommunications)
  • Wu, Chunhua (School of Cyberspace Security, Beijing University of Posts and Telecommunications)
  • Zheng, Kangfeng (School of Cyberspace Security, Beijing University of Posts and Telecommunications)
  • Niu, Xinxin (School of Cyberspace Security, Beijing University of Posts and Telecommunications)
  • Lu, Tianling (School of Information Technology and Network Security, People's Public Security University of China)
  • Received : 2016.11.23
  • Accepted : 2017.06.03
  • Published : 2017.09.30

Abstract

Android is now the most popular smartphone platform and continues to grow rapidly. Android devices store a large amount of sensitive private information, and many methods have been proposed to detect malicious Android applications and protect that information. In this work, we focus on extracting fine-grained features to maximize the information available for Android malware detection, and on selecting the smallest set of joint features to minimize the feature count. First, permissions and APIs are extracted as features from the decompiled source code; these cover not only Android permissions and SDK APIs but also developer-defined permissions and third-party library APIs. Second, feature selection methods, including information gain (IG), regularization, and particle swarm optimization (PSO), are used to analyze and exploit the correlation between features in order to eliminate redundant data, reduce the feature dimension, and mine useful joint features. Furthermore, regularization and PSO are integrated to create a new joint feature mining method. Experimental results show that the joint feature mining method combines the advantages of regularization and PSO and achieves good performance and efficiency for Android malware detection.
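To make the information gain step concrete, the following is a minimal sketch of IG-based ranking over binary permission/API features. It is an illustration under stated assumptions, not the authors' implementation: the feature names, the four toy samples, and the helper names `entropy`, `information_gain`, and `rank_features` are all hypothetical, and the regularization and PSO stages described above are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Labels of the samples in which the feature takes this value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, IG) pairs sorted from most to least informative."""
    columns = list(zip(*rows))
    scores = {name: information_gain(col, labels)
              for name, col in zip(names, columns)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 1 = the app uses the permission/API, 0 = it does not.
names = ["SEND_SMS", "INTERNET", "getDeviceId"]
rows = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
labels = [1, 1, 0, 0]  # 1 = malware, 0 = benign (assumed labeling)

ranked = rank_features(rows, labels, names)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy set, the feature that separates the two classes perfectly ranks first, while a feature present in every app carries no information. IG scores each feature in isolation, which is one reason a method that considers features jointly, as the abstract proposes, may be needed on top of it.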
