Improved Network Intrusion Detection Model through Hybrid Feature Selection and Data Balancing

  • Received : 2020.07.08
  • Accepted : 2020.08.09
  • Published : 2021.02.28

Abstract

Attacks on network environments have recently become rapidly more sophisticated and intelligent, and the limitations of signature-based network intrusion detection systems are becoming clear. To address this, machine learning-based intrusion detection systems are being actively studied, but applying machine learning to intrusion detection raises two problems. The first is selecting the important features relevant to learning so that detection can run in real time; the second is the imbalance of the data used for training. The latter is critical because the performance of machine learning algorithms depends on the data. In this paper, we propose HFS-DNN, a deep neural network-based network intrusion detection model that uses Hybrid Feature Selection and Data Balancing to address both problems. HFS-DNN was trained on the NSL-KDD data set and compared with existing classification models. The experiments confirmed that the proposed Hybrid Feature Selection algorithm does not degrade the performance of the learning model, and among the learning models trained with the imbalance resolved, the proposed model showed the best performance.
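
The two preprocessing problems named in the abstract, feature selection and class imbalance, can be illustrated with a minimal sketch. The abstract does not specify the Hybrid Feature Selection algorithm or the balancing method, so this is not the HFS-DNN procedure: the selection step is a simple correlation-based relevance and redundancy filter, the balancing step is plain random oversampling, and every function name, threshold, and the toy data are assumptions made for illustration only.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def select_features(rows, labels, min_relevance=0.1, max_redundancy=0.9):
    """Keep features correlated with the label, skipping ones that are
    nearly collinear with a feature already kept (thresholds are assumed)."""
    columns = list(zip(*rows))
    relevance = [abs(pearson(col, labels)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    kept = []
    for j in ranked:
        if relevance[j] < min_relevance:
            break  # remaining features are even less relevant
        if all(abs(pearson(columns[j], columns[k])) < max_redundancy for k in kept):
            kept.append(j)
    return sorted(kept)


def oversample(rows, labels, seed=0):
    """Random oversampling: duplicate rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): feature 1 duplicates feature 0 up to scale,
# feature 2 is constant; label 1 (attack) is the minority class.
rows = [(0.1, 0.2, 5.0), (0.2, 0.4, 5.0), (0.0, 0.0, 5.0),
        (0.3, 0.6, 5.0), (0.9, 1.8, 5.0), (0.8, 1.6, 5.0)]
labels = [0, 0, 0, 0, 1, 1]

kept = select_features(rows, labels)
bal_rows, bal_labels = oversample(rows, labels)
reduced = [tuple(r[j] for j in kept) for r in bal_rows]
```

On the toy data, one of the two collinear features survives, the constant feature is dropped, and both classes end up with four rows. In practice the balancing step would be applied to the training split only, so that duplicated rows do not leak into evaluation.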
