Correlation Analysis of Dataset Size and Accuracy of the CNN-based Malware Detection Algorithm

Choi, Dong Jun;Lee, Jae Woo;

doi:10.33778/kcsa.2020.20.3.053

융합보안논문지 (Convergence Security Journal)

제20권3호
/
Pages.53-60
/
2020
/
1598-7329(pISSN)
/
2508-7460(eISSN)

한국융합보안학회 (Korea convergence Security Association)

DOI QR Code

CNN Mobile Net 기반 악성코드 탐지 모델에서의 학습 데이터 크기와 검출 정확도의 상관관계 분석

Correlation Analysis of Dataset Size and Accuracy of the CNN-based Malware Detection Algorithm

최동준 (중앙대학교/융합보안학과) ;
이재우 (중앙대학교/산업보안학과)

투고 : 2020.08.03
심사 : 2020.09.27
발행 : 2020.09.30

https://doi.org/10.33778/kcsa.2020.20.3.053 인용 PDF KSCI

PDF 다운로드

⟨ 이전 논문 다음 논문 ⟩

초록

현재 4차 산업혁명을 맞이하여 머신러닝과 인공지능 기술이 급속도로 발전하고 있으며 보안 분야에서도 머신러닝 기술을 응용하려는 움직임이 있다. 많은 악성코드가 생성됨에 따라 사람의 힘으로는 모든 악성코드를 탐지하기 어려워지고 있기 때문이다. 이에 따라 학계와 산업계에서는 머신러닝을 통해 악성코드나 네트워크 침입 이벤트를 탐지하는 것에 관한 연구가 활발히 진행되고 있으며 국제 학회와 저널에서는 머신러닝의 한 분야인 딥러닝을 이용한 보안데이터 분석 연구가 논문 발표되고 있다. 그러나 해당 논문들은 검출 정확도에 초점이 맞추어져 있고 검출 정확도를 높이기 위해 여러 파라미터들을 수정하지만 Dataset의 개수를 고려하지 않고 있다. 따라서 본 논문에서는 CNN Mobile net 기반 악성코드 탐지 모델에서 가장 높은 검출 정확도를 도출할 수 있는 Dataset의 개수을 찾아내어 많은 머신러닝 연구 진행에 비용과 리소스를 줄이고자 한다.

At the present stage of the fourth industrial revolution, machine learning and artificial intelligence technologies are rapidly developing, and there is a movement to apply machine learning technology in the security field. Malicious code, including new and transformed, generates an average of 390,000 a day worldwide. Statistics show that security companies ignore or miss 31 percent of alarms. As many malicious codes are generated, it is becoming difficult for humans to detect all malicious codes. As a result, research on the detection of malware and network intrusion events through machine learning is being actively conducted in academia and industry. In international conferences and journals, research on security data analysis using deep learning, a field of machine learning, is presented. have. However, these papers focus on detection accuracy and modify several parameters to improve detection accuracy but do not consider the ratio of dataset. Therefore, this paper aims to reduce the cost and resources of many machine learning research by finding the ratio of dataset that can derive the highest detection accuracy in CNN Mobile net-based malware detection model.

키워드

참고문헌

AV-TEST. Malware [Online]. Available: https://www.avtest.org/en/statistics/malware/. [Accessed: Jun. 30, 2018].
C. Chen, S. Wang, D. Wen, G. Lai and M. Sun, "Applying Convolutional Neural Network for Malware Detection," 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), Morioka, Japan, 2019, pp. 1-5.
F. Hussain, R. Hussain, S. A. Hassan and E. Hossain, "Machine Learning in IoT Security: Current Solutions and Future Challenges," in IEEE Communications Surveys & Tutorials, doi: 10.1109/COMST.2020.2986444.
Daniele Ucci, Leonardo Aniello, Roberto Baldoni, Survey of machine learning techniques for malware analysis, Computers & Security, Volume 81, 2019, Pages 123-147, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2018.11.001
Krizhevsky, Alex & Sutskever, Ilya & Hinton, Geoffrey. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Neural Information Processing Systems. 25. 10.1145/3065386.
W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen. Compressing neural networks with the hashing trick. CoRR, abs/1504.04788, 2015.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015 https://doi.org/10.1007/s11263-015-0816-y
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567, 2015
C. Szegedy, S. Ioffe, and V. Vanhoucke. Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015
A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861, 2017.
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md
M. Ganesh, P. Pednekar, P. Prabhuswamy, D. S. Nair, Y. Park and H. Jeon, "CNN-Based Android Malware Detection," 2017 International Conference on Software Security and Assurance (ICSSA), Altoona, PA, 2017, pp. 60-65.
https://teachablemachine.withgoogle.com/

융합보안논문지 (Convergence Security Journal)

CNN Mobile Net 기반 악성코드 탐지 모델에서의 학습 데이터 크기와 검출 정확도의 상관관계 분석

Correlation Analysis of Dataset Size and Accuracy of the CNN-based Malware Detection Algorithm

초록

키워드

참고문헌

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

자세히 찾기

이미지 검색 (β)