http://dx.doi.org/10.13089/JKIISC.2022.32.2.201

Comparison of Anomaly Detection Performance Based on GRU Model Applying Various Data Preprocessing Techniques and Data Oversampling  

Yoo, Seung-Tae (Dept. of Knowledge Information Engineering)
Kim, Kangseok (Dept. of Cyber Security, Ajou University)
Abstract
With the recent shift in the cybersecurity paradigm, research on anomaly detection using machine learning and deep learning, the core techniques for implementing AI, is increasing. This study compares data preprocessing techniques that can improve the anomaly detection performance of an intrusion detection model based on a GRU (Gated Recurrent Unit) neural network, using the open dataset NGIDS-DS (Next Generation IDS Dataset). In addition, to address the class imbalance between normal data and attack data, the study applies oversampling with a DCGAN (Deep Convolutional Generative Adversarial Network) and compares detection performance across oversampling ratios. In the experiments, preprocessing the system call feature and the process execution path feature with the Doc2Vec algorithm showed good performance, and oversampling with DCGAN improved detection performance.
Keywords
Anomaly Detection; Preprocessing; Oversampling; GRU; DCGAN;
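
The abstract compares detection performance across oversampling ratios but does not define the ratio. As a minimal sketch, assuming the ratio means attack samples divided by normal samples after oversampling, the arithmetic for how many synthetic attack samples a generator would need to supply might look like the following. The function name, the ratio definition, and the sample counts are hypothetical illustrations, not values from the study.

```python
import math


def synthetic_samples_needed(n_normal: int, n_attack: int, target_ratio: float) -> int:
    """Return how many synthetic attack samples to add so that
    (attack samples / normal samples) reaches at least target_ratio.

    Assumption: the oversampling ratio is attack-to-normal after oversampling.
    The paper's abstract does not state its exact definition.
    """
    if n_normal <= 0:
        raise ValueError("n_normal must be positive")
    if n_attack < 0:
        raise ValueError("n_attack must be non-negative")
    if target_ratio < 0:
        raise ValueError("target_ratio must be non-negative")

    # Attack count required to reach the target ratio, rounded up.
    target_attack = math.ceil(n_normal * target_ratio)
    # Never return a negative count: if the data already meets the ratio, add nothing.
    return max(0, target_attack - n_attack)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the calculation.
    for ratio in (0.25, 0.5, 1.0):
        extra = synthetic_samples_needed(n_normal=10000, n_attack=100, target_ratio=ratio)
        print(f"target ratio {ratio}: generate {extra} synthetic attack samples")
```

In a setup like the one the abstract describes, the returned count would be the number of samples requested from the trained generator before the oversampled training set is passed to the detection model.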