http://dx.doi.org/10.6109/jkiice.2022.26.12.1872

A Data Sampling Technique for Secure Dataset Using Weight VAE Oversampling (W-VAE)

Kang, Hanbada (Department of Convergence Security, Chung-Ang University)
Lee, Jaewoo (Department of Industrial Security, Chung-Ang University)
Abstract
With recent advances in artificial intelligence, research on using AI to detect hacking attacks is being actively conducted. However, security data is typically highly imbalanced, and this is recognized as a major obstacle to building the training data on which model development depends. In this paper, we propose W-VAE, an oversampling technique that uses a variational autoencoder (VAE), a deep generative model, to generate the samples for oversampling, and that sets the number of samples to generate for each class through a weight calculated with K-NN. A total of five oversampling techniques, including ROS, SMOTE, and ADASYN, were applied to NSL-KDD, an open network security dataset. Measured by F1-score, the proposed method was the most effective of the oversampling methods compared.
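The abstract does not give the weight formula, so the following is only a minimal sketch of how a K-NN based weight could set the number of oversampled records per class. It assumes an ADASYN-style rule: a class whose samples have more differently labeled neighbors gets a larger share of the synthetic-sample budget. The function names `knn_class_weights` and `oversampling_counts`, the budget rule, and the default `k` are hypothetical and are not taken from the paper. The VAE generation step is not shown.

```python
import numpy as np


def knn_class_weights(X, y, k=5):
    """Per class: mean fraction of each sample's k nearest neighbours
    that carry a different label (an assumed, ADASYN-like difficulty score)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; fine for small illustrative data.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    foreign = (y[nn] != y[:, None]).mean(axis=1)
    return {c.item(): float(foreign[y == c].mean()) for c in np.unique(y)}


def oversampling_counts(X, y, k=5):
    """Split a synthetic-sample budget across classes.

    Assumption: the budget is the total deficit to the largest class, and
    each class's share is proportional to (K-NN weight * its own deficit).
    """
    y = np.asarray(y)
    classes, sizes = np.unique(y, return_counts=True)
    deficits = sizes.max() - sizes
    budget = int(deficits.sum())
    weights = knn_class_weights(X, y, k)
    scores = np.array([weights[c.item()] for c in classes]) * deficits
    if scores.sum() == 0:
        # No class overlaps its neighbours: fall back to plain deficits.
        scores = deficits.astype(float)
    if scores.sum() == 0:
        # Data is already balanced; nothing to generate.
        return {c.item(): 0 for c in classes}
    shares = np.floor(budget * scores / scores.sum()).astype(int)
    return {c.item(): int(n) for c, n in zip(classes, shares)}
```

Under these assumptions, the returned count for each class would be the number of records to draw from a generative model such as a VAE trained on that class. A well-separated minority class receives few or no synthetic records, which may or may not match the behavior of the paper's actual weighting.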
Keywords
AI; Information Security; Over Sampling; VAE;