http://dx.doi.org/10.3745/KTSDE.2021.10.11.449

Comparative Study of Anomaly Detection Accuracy of Intrusion Detection Systems Based on Various Data Preprocessing Techniques  

Park, Kyungseon (Department of Knowledge Information Engineering, Ajou University)
Kim, Kangseok (Department of Cyber Security, Ajou University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.10, no.11, 2021, pp. 449-456
Abstract
An intrusion detection system (IDS) detects abnormal behavior that violates security policy and helps prevent attacks on a system. Existing intrusion detection systems have been designed around statistical analysis or anomaly detection on traffic patterns, but rapidly evolving technologies cause modern systems to generate traffic that differs from what those systems were built for, which limits the existing methods. To overcome this limitation, intrusion detection methods that apply various machine learning techniques are being actively studied. This study compares data preprocessing techniques that can improve anomaly detection accuracy, using NGIDS-DS (Next Generation IDS Dataset), which was generated by simulation equipment for traffic in various network environments. Padding and sliding window were used for data preprocessing, and oversampling with an Adversarial Auto-Encoder (AAE) was applied to address the imbalance between the proportions of normal and abnormal data. In addition, Skip-gram, one of the Word2Vec techniques, was used to extract feature vectors from the preprocessed sequence data, and it was confirmed to improve detection accuracy. PCA-SVM and GRU were used as the models for the comparative experiments, and the best results were obtained when sliding window, Skip-gram, AAE, and GRU were combined.
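The two preprocessing options named in the abstract, padding and sliding window, can be illustrated with a minimal sketch. This is not the authors' implementation: the padding token `0`, the helper names `pad_sequence` and `sliding_windows`, the window size, the step, and the toy integer trace are all assumptions made for illustration.

```python
from typing import List, Sequence

PAD = 0  # assumed padding token; the abstract does not specify one


def pad_sequence(seq: Sequence[int], length: int, pad: int = PAD) -> List[int]:
    """Right-pad (or truncate) a sequence so it has exactly `length` items."""
    trimmed = list(seq[:length])
    return trimmed + [pad] * (length - len(trimmed))


def sliding_windows(seq: Sequence[int], size: int, step: int = 1) -> List[List[int]]:
    """Cut a sequence into fixed-size windows taken every `step` items."""
    if len(seq) < size:
        # Too short for a single window: fall back to one padded window.
        return [pad_sequence(seq, size)]
    return [list(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


# Hypothetical trace of integer-encoded events (not taken from NGIDS-DS).
trace = [3, 7, 7, 2, 9, 4]
padded = pad_sequence(trace, 8)
windows = sliding_windows(trace, size=4, step=2)
print(padded)
print(windows)
```

Padding yields one fixed-length sample per trace, whereas the sliding window yields several overlapping samples from the same trace, so the two approaches hand differently shaped inputs to a downstream model.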
Keywords
Intrusion Detection; Sliding Window; Skip-gram; AAE; GRU;