http://dx.doi.org/10.33778/kcsa.2021.21.3.057

A study on intrusion detection performance improvement through imbalanced data processing  

Jung, Il Ok (Department of Information Security, Korea University)
Ji, Jae-Won (IGLOO Security)
Lee, Gyu-Hwan (IGLOO Security)
Kim, Myo-Jeong (IGLOO Security)
Abstract
As deep learning and machine learning have proven effective for intrusion detection, their use in the field is growing steadily. However, the data required for training is difficult to collect, and the imbalance of the collected data makes it hard to carry machine learning performance over to real-world operation. To address this problem, this paper proposes a mixed sampling technique that uses t-SNE visualization to process imbalanced data. First, intrusion detection events, including the payload, are separated into fields according to their characteristics. TF-IDF-based features are then extracted from each separated field. After the mixed sampling technique is applied to the extracted features, t-SNE visualization is used to obtain a data set optimized for intrusion detection with imbalanced data. Nine sampling techniques were applied to the open intrusion detection dataset CSIC2012, and the F-score and G-mean evaluation metrics verified that the proposed sampling technique improves detection performance.
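To make the sampling and evaluation ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: `mixed_resample` is a hypothetical helper that combines random undersampling and random oversampling as a generic stand-in for a "mixed" scheme, and `f_score_and_g_mean` assumes a binary task, the F1 variant of the F-score, and the common definition of G-mean as the square root of sensitivity times specificity. The TF-IDF and t-SNE steps are not shown.

```python
import math
import random


def mixed_resample(samples, labels, target_per_class, seed=0):
    """Bring every class to exactly `target_per_class` items.

    Classes larger than the target are randomly undersampled (without
    replacement); smaller classes keep all originals and are randomly
    oversampled (with replacement). Illustrative assumption only, not
    the sampling technique proposed in the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target_per_class:
            chosen = rng.sample(items, target_per_class)
        else:
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        out_samples.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_samples, out_labels


def f_score_and_g_mean(y_true, y_pred, positive=1):
    """Return (F1, G-mean) for a binary problem.

    `positive` marks the minority (attack) class; every other label is
    treated as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean
```

Both metrics penalize a classifier that ignores the minority class: if no attack is ever predicted, recall is 0, so F1 and G-mean both drop to 0 even when plain accuracy looks high. That is why they are preferred over accuracy for imbalanced intrusion data.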
Keywords
imbalanced data; intrusion detection; machine learning; sampling
Citations & Related Records
Times cited by KSCI: 2