A Comparative Study of Machine Learning Algorithms Using LID-DS DataSet

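  • 박대경 (Intelligent Drone Convergence Major, Department of Computer Engineering, Sejong University)
  • 류경준 (Department of Computer Engineering, Sejong University)
  • 신동일 (Intelligent Drone Convergence Major, Department of Computer Engineering, Sejong University)
  • 신동규 (Intelligent Drone Convergence Major, Department of Computer Engineering, Sejong University)
  • 박정찬 (Agency for Defense Development)
  • 김진국 (Agency for Defense Development)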
  • Received : 2020.07.22
  • Accepted : 2020.08.27
  • Published : 2021.03.31

Abstract

Information and communication technology is developing rapidly, and the security of IT infrastructure has become correspondingly more important. At the same time, cyber attacks are growing more varied, advanced, and sophisticated, as exemplified by advanced persistent threats (APTs). Defending against or predicting these attacks early is critical, yet many cases have been reported in which analysis of network-based intrusion detection system (NIDS) data alone failed to stop rapidly mutating attacks. For this reason, data generated by host-based intrusion detection systems (HIDS) is now also analyzed to defend against such attacks. In this paper, we present a comparative study of machine learning algorithms on LID-DS (Leipzig Intrusion Detection-Data Set), a host-based intrusion detection data set that includes the thread information, metadata, and buffer data missing from previously used data sets. The algorithms compared were Decision Tree, Naive Bayes, MLP (Multi-Layer Perceptron), Logistic Regression, LSTM (Long Short-Term Memory), and RNN (Recurrent Neural Network). For evaluation, we measured Accuracy, Precision, Recall, and F1-Score, as well as the error rate. LSTM achieved the highest accuracy of the algorithms compared.
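
The evaluation above reports Accuracy, Precision, Recall, F1-Score, and error rate. As a minimal sketch of how these figures relate to one another, the Python function below computes them from binary labels, assuming 1 marks an attack and 0 marks normal behavior. The abstract does not define "error rate"; this sketch assumes it is the misclassification rate, (FP + FN) / total, which equals 1 − Accuracy. The names `evaluate` and `BinaryMetrics` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    error_rate: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute binary-classification metrics (assumed labels: 1 = attack, 0 = normal)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    # Confusion-matrix counts.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn

    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Assumption: "error rate" is read as the misclassification rate.
    error_rate = (fp + fn) / total
    return BinaryMetrics(accuracy, precision, recall, f1, error_rate)


if __name__ == "__main__":
    m = evaluate([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
    print(m)
```

For the six-sample example (2 TP, 2 TN, 1 FP, 1 FN), Accuracy, Precision, Recall, and F1-Score are all 2/3 and the error rate is 1/3. Because Precision and Recall ignore true negatives, they are usually more informative than Accuracy alone when attack traces are much rarer than normal ones.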

오늘날 정보통신 기술이 급격하게 발달하면서 IT 인프라에서 보안의 중요성이 높아졌고 동시에 지능형 지속 공격(Advanced Persistent Threat)처럼 고도화되고 다양한 형태의 사이버 공격이 증가하고 있다. 점점 더 고도화되는 사이버 공격을 조기에 방어하거나 예측하는 것은 매우 중요한 사안으로, NIDS(Network-based Intrusion Detection System) 관련 데이터 분석만으로는 빠르게 변형하는 사이버 공격을 방어하지 못하는 경우가 많이 보고되고 있다. 따라서 현재는 HIDS(Host-based Intrusion Detection System) 데이터 분석을 통해서 위와 같은 사이버 공격을 방어하는데 침입 탐지 시스템에서 생성된 데이터를 이용하고 있다. 본 논문에서는 기존에 사용되었던 데이터 세트에서 결여된 스레드 정보, 메타 데이터 및 버퍼 데이터를 포함한 LID-DS(Leipzig Intrusion Detection-Data Set) 호스트 기반 침입 탐지 데이터를 이용하여 기계학습 알고리즘에 관한 비교 연구를 진행했다. 사용한 알고리즘은 Decision Tree, Naive Bayes, MLP(Multi-Layer Perceptron), Logistic Regression, LSTM(Long Short-Term Memory model), RNN(Recurrent Neural Network)을 사용했다. 평가를 위해 Accuracy, Precision, Recall, F1-Score 지표와 오류율을 측정했다. 그 결과 LSTM 알고리즘의 정확성이 가장 높았다.
