The Enhancement of Intrusion Detection Reliability Using Explainable Artificial Intelligence (XAI)

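  • 정일옥 (Korea University / Department of Information Security) ;
  • 최우빈 (Kyung Hee University / Department of Applied Mathematics) ;
  • 김수철 (Soongsil University / Department of IT Policy and Management)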
  • Received : 2022.08.31
  • Accepted : 2022.09.28
  • Published : 2022.09.30

Abstract

As artificial intelligence is adopted across more fields, attempts to apply it to open problems in intrusion detection are also increasing. However, most machine learning models are black boxes whose predictions cannot be explained or traced, which makes them difficult to use for the security professionals who must rely on them. To address this problem, research on explainable AI (XAI), which helps interpret and understand the decisions of machine learning models, is growing in many fields. In this paper, we propose an explainable AI approach to enhance the reliability of machine learning-based intrusion detection predictions. First, we implement the intrusion detection model with XGBoost and implement the explanation of the model with SHAP. We then compare and analyze conventional feature importance against the SHAP results to give security experts a reliable basis for their decisions. The experiments use the PKDD2007 dataset. By analyzing the association between conventional feature importance and SHAP values, we verify that SHAP-based explainable AI is a valid way to give security experts confidence in the prediction results of intrusion detection models.
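To illustrate what a SHAP value measures, the following is a minimal sketch that computes exact Shapley values for a single prediction of a toy scoring function. The model `toy_model`, the three feature names, and the baseline values are all hypothetical and chosen only for illustration; the paper itself trains an XGBoost model and explains it with the SHAP library on the PKDD2007 dataset, which this sketch does not reproduce.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names for a web request (not taken from the paper).
FEATURES = ["url_length", "special_char_count", "has_sql_keyword"]


def toy_model(x):
    """Hypothetical intrusion score with one interaction term.

    Stands in for a trained classifier; it is not the paper's XGBoost model.
    """
    score = 0.02 * x["url_length"] + 0.5 * x["has_sql_keyword"]
    score += 0.1 * x["special_char_count"] * x["has_sql_keyword"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value, and each
    feature's marginal contribution is averaged over all coalitions with the
    standard Shapley weight |S|! (n - |S| - 1)! / n!.
    """
    n = len(FEATURES)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical request to explain, and a hypothetical "typical" baseline.
instance = {"url_length": 120, "special_char_count": 8, "has_sql_keyword": 1}
baseline = {"url_length": 40, "special_char_count": 1, "has_sql_keyword": 0}

phi = shapley_values(toy_model, instance, baseline)

# Rank features by the magnitude of their contribution to this prediction.
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score of the explained instance and the score of the baseline, which is the property that lets an analyst read each value as that feature's share of a specific prediction. A global feature importance score, by contrast, summarizes the model as a whole and says nothing about an individual alert. Exact enumeration is exponential in the number of features, so it is practical only for toy cases like this one.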

Acknowledgement

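This research was supported in 2022 by the Korea Agency for Infrastructure Technology Advancement (KAIA) with funding from the Korean government (Ministry of Land, Infrastructure and Transport) (22TLRP-B152767-04, Development of technologies and institutional frameworks for operating an integrated security system for the cooperative autonomous driving road traffic system).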
