Classification Performance Improvement of UNSW-NB15 Dataset Based on Feature Selection

  • 이대범 (Mokwon University)
  • 서재현 (Department of Computer and Software Engineering, Wonkwang University)
  • Received : 2019.04.05
  • Accepted : 2019.05.20
  • Published : 2019.05.28

Abstract

Recently, with the emergence of the Internet of Things and various wearable devices, Internet technology has made it more convenient to obtain information and conduct business. However, as the Internet is used in more areas, the attack surface exposed to attackers keeps growing, and network intrusion attempts aimed at unfair gain, such as stealing personal information, forgery, and cyber terrorism, are also increasing. In this paper, we propose a feature selection method that improves the classification performance of rare classes when classifying abnormal behavior in network traffic. The UNSW-NB15 dataset suffers from class imbalance: its rare classes have relatively few instances compared with the other classes, and we use undersampling to address this. We use SVM, k-NN, and decision tree as learning algorithms and, through training and validation, extract feature subsets whose combinations yield high detection accuracy and low RMSE. Wrapper-based experiments show that these subsets achieve recall of 98% or higher, and the DT_PSO method performed best.
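
Random undersampling, the simplest form of the undersampling mentioned above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this paper: every class is capped at a hypothetical `max_per_class` count by sampling without replacement, and classes already below the cap, such as rare attack classes, are kept whole. Tables 4 and 5 report recall as the number of retained Normal and Generic instances varies; the function name, seed and toy counts below are illustrative only.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, max_per_class, seed=0):
    """Randomly drop instances so that no class keeps more than max_per_class.

    Classes with max_per_class instances or fewer (e.g. rare attack classes)
    are kept in full. Returns new (rows, labels) lists, grouped by class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    out_rows, out_labels = [], []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) > max_per_class:
            # Sample without replacement from the over-represented class.
            members = rng.sample(members, max_per_class)
        out_rows.extend(members)
        out_labels.extend([label] * len(members))
    return out_rows, out_labels


if __name__ == "__main__":
    # Toy label counts, not the real UNSW-NB15 class distribution.
    labels = ["Normal"] * 50 + ["Generic"] * 30 + ["Worms"] * 3
    rows = [[i] for i in range(len(labels))]
    _, new_labels = random_undersample(rows, labels, max_per_class=10)
    print({c: new_labels.count(c) for c in sorted(set(new_labels))})
    # prints {'Generic': 10, 'Normal': 10, 'Worms': 3}
```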

Fig. 1. UNSW-NB15 Testbed

Fig. 2. Formula of class imbalance ratio
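
The formula in Fig. 2 exists only as an image and cannot be recovered from this page. One common definition, assumed for this sketch, divides the size of the largest class by the size of each class, IR_c = N_max / N_c, so the majority class gets 1.0 and rarer classes get larger values. The function name and toy counts are illustrative.

```python
from collections import Counter


def imbalance_ratios(labels):
    """Return {class: N_max / N_c} for every class in labels.

    Assumes the common definition IR_c = N_max / N_c, where N_max is the
    size of the largest class; this may differ from the formula in Fig. 2.
    """
    counts = Counter(labels)
    n_max = max(counts.values())
    return {cls: n_max / n for cls, n in sorted(counts.items())}


if __name__ == "__main__":
    toy = ["Normal"] * 100 + ["Generic"] * 40 + ["Worms"] * 4  # toy counts
    print(imbalance_ratios(toy))
    # prints {'Generic': 2.5, 'Normal': 1.0, 'Worms': 25.0}
```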

Fig. 3. Flowchart of data pre-processing and feature selection
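
In a wrapper approach the classifier itself scores each candidate feature subset. The sketch below is a minimal, stdlib-only illustration, not the evaluation code used in this paper: it scores one subset (a 0/1 mask) with a 3-nearest-neighbour classifier on a train/validation split and returns accuracy and RMSE. RMSE is computed over one-hot class vectors and averaged over the K classes, which is assumed here to mirror how WEKA reports RMSE for nominal classes; with hard predictions a miss contributes 2/K and a hit contributes 0. Function names and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label query by majority vote among its k nearest training rows (Euclidean)."""
    nearest = sorted(
        (math.dist(row, query), label) for row, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    # Ties go to the label met first among the distance-sorted neighbours.
    return votes.most_common(1)[0][0]


def project(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def evaluate_subset(mask, train_x, train_y, val_x, val_y, k=3):
    """Wrapper score of one feature subset: (accuracy, rmse) on validation data."""
    classes = sorted(set(train_y) | set(val_y))
    tr, va = project(train_x, mask), project(val_x, mask)
    correct, sq_err = 0, 0.0
    for row, truth in zip(va, val_y):
        pred = knn_predict(tr, train_y, row, k)
        correct += pred == truth
        # One-hot squared error averaged over the K classes: 2/K for a miss, 0 for a hit.
        sq_err += sum(((c == pred) - (c == truth)) ** 2 for c in classes) / len(classes)
    n = len(val_y)
    return correct / n, math.sqrt(sq_err / n)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    train_x = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [0.9, 4.0], [1.0, 2.0], [0.8, 0.0]]
    train_y = ["Normal"] * 3 + ["Exploits"] * 3
    val_x = [[0.15, 0.5], [0.95, 4.5]]
    val_y = ["Normal", "Exploits"]
    print(evaluate_subset([1, 0], train_x, train_y, val_x, val_y))  # (1.0, 0.0)
    print(evaluate_subset([0, 1], train_x, train_y, val_x, val_y))  # (0.0, 1.0)
```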

Fig. 4. Features selected by the wrapper-based experiments (A:SVM_GA, B:3NN_GA, C:DT_GA, D:DT_ANT, E:DT_PSO)
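
Fig. 4 lists subsets found by wrapper searches that pair a classifier with a genetic algorithm (GA), ant colony optimization (ANT) or particle swarm optimization (PSO). The sketch below covers only the GA variant and makes several assumptions: masks are 0/1 lists, the GA uses tournament selection, one-point crossover, bit-flip mutation and elitism, and every parameter value is hypothetical. The toy fitness uses a nearest-centroid classifier on synthetic data as a stand-in for the SVM, 3-NN and decision tree classifiers; it is not the configuration used in this paper.

```python
import math
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Genetic search over 0/1 feature masks that maximises fitness(mask).

    Tournament selection, one-point crossover, bit-flip mutation and elitism
    (the best mask seen so far always survives). Returns (mask, fitness).
    """
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:              # each distinct subset is evaluated once
            cache[key] = fitness(mask)
        return cache[key]

    def repair(mask):
        if not any(mask):                 # never allow an empty feature subset
            mask[rng.randrange(n_features)] = 1
        return mask

    def tournament(pop, scores, size=3):
        picks = rng.sample(range(len(pop)), size)
        return pop[max(picks, key=lambda i: scores[i])]

    pop = [repair([rng.randint(0, 1) for _ in range(n_features)])
           for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        scores = [score(m) for m in pop]
        children = [best[:]]              # elitism
        while len(children) < pop_size:
            a, b = tournament(pop, scores), tournament(pop, scores)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)
                a = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            children.append(repair([bit ^ (rng.random() < mutation_rate) for bit in a]))
        pop = children
        best = max(pop + [best], key=score)
    return best, score(best)


if __name__ == "__main__":
    # Synthetic toy data: only features 0 and 3 carry class information.
    rng = random.Random(1)

    def sample(label):
        shift = 1.0 if label == "attack" else 0.0
        return [rng.gauss(shift, 0.3) if j in (0, 3) else rng.gauss(0.0, 1.0)
                for j in range(6)]

    labels = ["normal", "attack"] * 40
    data = [sample(y) for y in labels]
    train_x, train_y, val_x, val_y = data[:60], labels[:60], data[60:], labels[60:]

    def fitness(mask):
        # Nearest-centroid validation accuracy (a stand-in classifier) minus a
        # small hypothetical penalty per feature so smaller subsets win ties.
        cols = [j for j, keep in enumerate(mask) if keep]
        cent = {}
        for y in set(train_y):
            rows = [x for x, t in zip(train_x, train_y) if t == y]
            cent[y] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        hits = sum(
            min(cent, key=lambda y: math.dist([x[j] for j in cols], cent[y])) == t
            for x, t in zip(val_x, val_y)
        )
        return hits / len(val_y) - 0.01 * len(cols)

    mask, fit = ga_feature_selection(6, fitness)
    print("selected:", [j for j, b in enumerate(mask) if b], "fitness:", round(fit, 3))
```

Swapping the search strategy (ant colony or particle swarm, as in DT_ANT and DT_PSO) leaves the wrapper fitness unchanged; only the way candidate masks are proposed differs.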

Table 9. Comparison of experimental results with other studies

Fig. 8. Comparison of experimental results with other studies (subset2, ROC curve)

Fig. 5. Comparison of classification performance of rare classes by feature subset (Recall)
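
Fig. 5 and Table 7 compare rare-class performance by recall, the fraction of a class's true instances that the classifier labels correctly, TP / (TP + FN). A minimal sketch of per-class recall follows; the class names in the usage example are UNSW-NB15 categories, but the predictions are made up for illustration.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall, TP / (TP + FN), for every class that appears in y_true."""
    support = Counter(y_true)                       # TP + FN per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP per class
    return {c: hits[c] / support[c] for c in sorted(support)}


if __name__ == "__main__":
    y_true = ["Worms", "Worms", "Backdoor", "Backdoor", "Backdoor", "Normal"]
    y_pred = ["Worms", "Normal", "Backdoor", "Backdoor", "Exploits", "Normal"]
    print({c: round(r, 3) for c, r in per_class_recall(y_true, y_pred).items()})
    # prints {'Backdoor': 0.667, 'Normal': 1.0, 'Worms': 0.5}
```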

Fig. 6. Comparison of classification performance of rare classes by feature subset (ROC curve)
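
For a multi-class problem, ROC curves such as those in Fig. 6 and Table 8 are usually built one-vs-rest: one rare class is treated as positive and all other classes as negative. The sketch below computes the corresponding area under the ROC curve from per-instance scores (for example, a classifier's estimated probability of the positive class) as the probability that a random positive outscores a random negative, with ties counted as one half. Whether this paper reports exactly this one-vs-rest AUC is an assumption, and the scores in the example are hypothetical.

```python
def one_vs_rest_auc(y_true, scores, positive):
    """ROC AUC of `scores` for class `positive` against all other classes.

    Computed pairwise (Mann-Whitney form): the fraction of (positive,
    negative) pairs in which the positive instance scores higher,
    counting ties as 0.5. O(P * N), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, y_true) if y == positive]
    neg = [s for s, y in zip(scores, y_true) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = ["Worms", "Worms", "Normal", "Exploits"]
    p_worms = [0.9, 0.4, 0.3, 0.4]  # hypothetical scores for the Worms class
    print(one_vs_rest_auc(y_true, p_worms, "Worms"))  # 0.875
```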

Fig. 7. Comparison of experimental results with other studies (subset1, ROC curve)

Table 1. Comparisons of related works

Table 2. Class imbalance ratio for each class

Table 3. The number of instances according to the class imbalance ratio

Table 4. Recalls according to the number of instances of Normal class

Table 5. Recalls according to the number of instances of Generic class

Table 6. Features used in the proposed method

Table 7. Comparison of classification performance of rare classes by feature subset (Recall)

Table 8. Comparison of classification performance of rare classes by feature subset (ROC curve)
