http://dx.doi.org/10.13089/JKIISC.2020.30.4.537

Compiler Analysis Framework Using SVM-Based Genetic Algorithm : Feature and Model Selection Sensitivity  

Hwang, Cheol-Hun (Gachon University)
Shin, Gun-Yoon (Gachon University)
Kim, Dong-Wook (Gachon University)
Han, Myung-Mook (Gachon University)
Abstract
As malware technology develops, detection-evasion techniques such as mutation and obfuscation are advancing with it. Detecting unknown malware is an important part of malware detection, and Malware Authorship Attribution, which detects unknown malicious code by identifying the author of distributed malware, is being studied for this purpose. In this paper, we extract the compiler information that affects binary-based author identification and investigate how sensitive classification efficiency is to feature selection, to the choice between probabilistic and non-probabilistic models, and to optimization across the approaches compared. In the experiments, feature selection by information gain and the support vector machine, a non-probabilistic model, showed high efficiency. Among the optimization approaches, the proposed framework, which performs feature selection and model optimization, achieved high classification accuracy along with a 48% feature reduction and 53 faster execution speed. This study confirms that classification efficiency is sensitive to the choice of feature selection, model, and optimization method.
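The information-gain feature selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the feature set or the data, so the feature names (`uses_push_ebp`, `has_nop_padding`), the compiler labels, and the helper names below are hypothetical, and the features are assumed to be discrete.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(columns,
                    key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: one label per binary, one column per discrete feature.
labels = ["gcc", "gcc", "msvc", "msvc"]
columns = {
    "uses_push_ebp": [1, 1, 0, 0],    # separates the two compilers perfectly
    "has_nop_padding": [1, 0, 1, 0],  # carries no information about the label
}

selected = select_top_features(columns, labels, k=1)
```

In a pipeline of the kind the abstract describes, the selected columns would then be passed to a classifier such as a support vector machine; that step is omitted here because the abstract gives no model settings.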
Keywords
Authorship Attribution; Linear-chain Conditional Random Field; Genetic Algorithm; Support Vector Machine