Comparative Analysis of Machine Learning Techniques for IoT Anomaly Detection Using the NSL-KDD Dataset

  • Zaryn Good (Department of Mathematical and Computer Sciences, Indiana University of PA)
  • Waleed Farag (Department of Mathematical and Computer Sciences, Indiana University of PA)
  • Xin-Wen Wu (Department of Computer Science, University of Mary Washington)
  • Soundararajan Ezekiel (Department of Mathematical and Computer Sciences, Indiana University of PA)
  • Maria Balega (Department of Mathematical and Computer Sciences, Indiana University of PA)
  • Franklin May (Department of Mathematical and Computer Sciences, Indiana University of PA)
  • Alicia Deak (Department of Mathematical and Computer Sciences, Indiana University of PA)
  • Received : 2023.01.05
  • Published : 2023.01.30

Abstract

With billions of IoT (Internet of Things) devices deployed in emerging applications around the world, detecting anomalies on these devices has become critically important. Advanced Intrusion Detection Systems (IDS) are trained to detect abnormal network traffic, and Machine Learning (ML) algorithms are used to build the detection models. In this paper, we use the NSL-KDD dataset to compare the performance and efficiency of IoT anomaly detection models. The dataset was developed for research purposes and is especially useful for anomaly detection. We applied three typical machine learning algorithms, eXtreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and Deep Convolutional Neural Networks (DCNN), to identify and classify anomalies in IoT applications. Each algorithm was assessed on accuracy, precision, recall, and F1 score. Our results show that XGBoost outperformed both SVM and DCNN, achieving the highest accuracy. We also measured the execution time of each algorithm when running anomaly detection: XGBoost was 425.53% faster than SVM and 2,075.49% faster than DCNN. In our experiments, XGBoost was the most accurate and efficient method.
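The four evaluation metrics named in the abstract, and a relative speed-up figure, can be illustrated with a minimal sketch. This is not the authors' code: the function names `binary_metrics` and `percent_faster` are hypothetical, the labels are toy values rather than NSL-KDD records, and the speed-up formula `(slow - fast) / fast * 100` is only one plausible reading of "X% faster", since the abstract does not state how the percentages were computed.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary anomaly detector.

    `positive` marks the anomaly class (an assumption of this sketch).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


def percent_faster(slow_seconds, fast_seconds):
    """Relative speed-up in percent, assuming (slow - fast) / fast * 100."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return (slow_seconds - fast_seconds) / fast_seconds * 100.0
```

For example, `binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])` yields 0.75 for all four metrics (3 true positives, 3 true negatives, 1 false positive, 1 false negative), and under the assumed definition `percent_faster(10.0, 2.0)` returns 400.0.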

Acknowledgement

This work is supported by the NSA under grant H98230-20-1-0296. We would like to thank Dr. Pearlstein, TCNJ, for the use of Zelda. We would also like to thank all members of the IUP IoT Research team.
