http://dx.doi.org/10.33778/kcsa.2020.20.3.053

Correlation Analysis of Dataset Size and Accuracy of the CNN-based Malware Detection Algorithm  

Choi, Dong Jun (Chung-Ang University/Department of Convergence Security)
Lee, Jae Woo (Chung-Ang University/Department of Industrial Security)
Abstract
In the current stage of the fourth industrial revolution, machine learning and artificial intelligence technologies are developing rapidly, and there is a growing movement to apply machine learning in the security field. An average of 390,000 malicious code samples, including new and modified variants, are generated worldwide each day. Statistics show that security companies ignore or miss 31 percent of alarms. With so much malicious code being generated, it is becoming difficult for humans to detect all of it. As a result, research on detecting malware and network intrusion events through machine learning is being actively conducted in academia and industry. Research on security data analysis using deep learning, a subfield of machine learning, is being presented at international conferences and in journals. However, these papers focus on detection accuracy and tune several parameters to improve it, but do not consider the dataset ratio. This paper therefore aims to reduce the cost and resources consumed by machine learning research by finding the dataset ratio that yields the highest detection accuracy in a CNN MobileNet-based malware detection model.
Keywords
CNN Mobile Net; Malware Detection Algorithm; Machine Learning; Security Data Analysis; Network Event