http://dx.doi.org/10.3745/JIPS.03.0058

A Chi-Square-Based Decision for Real-Time Malware Detection Using PE-File Features  

Belaoued, Mohamed (Dept. of Computer Science, Université 20 Août 1955)
Mazouzi, Smaine (Dept. of Computer Science, Université 20 Août 1955)
Publication Information
Journal of Information Processing Systems / v.12, no.4, 2016, pp. 644-660
Abstract
Real-time malware detection remains an open problem, since most existing approaches to malware categorization focus on improving accuracy rather than detection time. Finding a proper balance between these two characteristics is therefore important, especially for sensitive systems. In this paper, we present a fast portable executable (PE) malware detection system based on the analysis of the set of Application Programming Interfaces (APIs) called by a program and of some technical PE features (TPFs). We use an efficient feature selection method that first selects the most relevant APIs and TPFs using the chi-square ($\chi^2$) measure and then uses the phi ($\varphi$) coefficient to group the features into subsets according to their relevance. We evaluated our method using different classifiers trained on different combinations of feature subsets and obtained an accuracy above 98%. The system is suitable for real-time detection, since it can categorize a file (malware or benign) in 0.09 seconds.
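The two-stage selection described in the abstract can be illustrated with a minimal sketch. It assumes each feature (an API name or a TPF) is summarized by a 2x2 contingency table of presence or absence against the malware or benign label, and it keeps features whose chi-square statistic exceeds the standard critical value for one degree of freedom at the 0.05 level (3.841). The function names, the feature names, the counts, and the threshold are illustrative assumptions, not values taken from the paper.

```python
import math

# Assumption: critical chi-square value for 1 degree of freedom, alpha = 0.05.
# The significance level actually used in the paper is not stated in the abstract.
CRITICAL_VALUE = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: malware files that contain the feature
    b: benign files that contain the feature
    c: malware files that lack the feature
    d: benign files that lack the feature
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A feature that is always present or always absent carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def phi_coefficient(a, b, c, d):
    """Strength of association in [0, 1], computed as sqrt(chi2 / n)."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    return math.sqrt(chi_square_2x2(a, b, c, d) / n)


def select_features(counts, critical=CRITICAL_VALUE):
    """Keep features whose chi-square exceeds `critical`, ranked by phi.

    `counts` maps a feature name to its (a, b, c, d) table.
    Returns a dict ordered from the strongest to the weakest association.
    """
    kept = {
        name: phi_coefficient(*table)
        for name, table in counts.items()
        if chi_square_2x2(*table) > critical
    }
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical counts over 50 malware and 50 benign files.
    example = {
        "CreateRemoteThread": (40, 2, 10, 48),
        "GetTickCount": (25, 25, 25, 25),
    }
    for name, phi in select_features(example).items():
        print(f"{name}: phi = {phi:.3f}")
```

In this sketch the phi value only ranks the retained features; splitting them into relevance subsets, as the abstract describes, would additionally require cut-off values that the abstract does not give.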
Keywords
Chi-Square Test; Malware Analysis; PE-Optional Header; Real-Time Detection; Windows API