Malicious Traffic Classification Using Mitre ATT&CK and Machine Learning Based on UNSW-NB15 Dataset

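  • Dong Hyun Yoon (Department of Information Security, Sungkyunkwan University)
  • Ja Hwan Koo (College of Software Convergence, Sungkyunkwan University)
  • Dong Ho Won (College of Software Convergence, Sungkyunkwan University)
  • Received: 2022.10.05
  • Reviewed: 2022.12.14
  • Published: 2023.02.28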

Abstract

This study proposes a method for classifying malicious network traffic using the MITRE ATT&CK cyber threat framework and machine learning, in order to address the real-time traffic detection problems faced by current security monitoring systems. We mapped the labels of the UNSW-NB15 network traffic dataset onto the MITRE ATT&CK framework and then applied rare-class processing to produce the final dataset. We trained several boosting-based ensemble models on this dataset and evaluated how they classify network traffic under a range of performance metrics. Measured by F1 score, XGBoost without rare-class processing performed best in the multi-class traffic setting. Applying machine learning with MITRE ATT&CK label conversion and oversampling, including to minority attack classes that are hard to learn, distinguishes this work from previous studies. It nevertheless has two limitations: (1) the labels of the original dataset cannot be matched perfectly to MITRE ATT&CK labels, and (2) some classes are extremely sparse. Even so, CatBoost with B-SMOTE (Borderline-SMOTE) achieved a classification accuracy of 0.9526, which suggests that normal and abnormal network traffic can be detected automatically.
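The abstract reports both F1 score and accuracy, and the two can diverge sharply when some classes are rare. The following is a minimal sketch of that effect, not the authors' evaluation code. It assumes macro averaging (the abstract does not say which averaging was used), and the class names `normal` and `rare_attack` and the toy counts are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 95 normal flows, 5 flows of a rare attack class,
# and a classifier that labels everything as normal.
y_true = ["normal"] * 95 + ["rare_attack"] * 5
y_pred = ["normal"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this toy case the classifier never detects the rare class, yet its accuracy stays high while macro F1 drops to roughly half. This is one reason a multi-class study with sparse attack classes would compare models on F1 rather than on accuracy alone.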
