http://dx.doi.org/10.5351/KJAS.2017.30.6.851

Malware classification using statistical techniques  

Won, Sungmin (Department of Statistics, Ewha Womans University)
Kim, Hyunjoo (Department of Statistics, Ewha Womans University)
Song, Jongwoo (Department of Statistics, Ewha Womans University)
Publication Information
The Korean Journal of Applied Statistics / v.30, no.6, 2017, pp. 851-865
Abstract
Ransomware such as WannaCry has become a global problem, and methods for defending against malware attacks are important. Minimizing the damage caused by malware requires classifying malware types efficiently. This study builds models that classify malware by type using several statistical classification techniques: logistic regression, random forest, gradient boosting, and support vector machines. It also identifies the key variables for classifying the type of malicious software.
Keywords
malicious software; multi-class classification; random forest; gradient boosting; support vector machine; important variables