http://dx.doi.org/10.13089/JKIISC.2022.32.5.1009

A Multiclass Classification of the Security Severity Level of Multi-Source Event Log Based on Natural Language Processing  

Seo, Yangjin (EPOZEN Co., Ltd.)
Abstract
Log data have long served as a basis for understanding and determining the main functions and state of information systems, and they are an important input to many cybersecurity applications. Extracting the necessary information from logs, making decisions based on that information, and taking suitable countermeasures are essential for operating and protecting systems stably and reliably. However, the explosive growth in the variety and volume of logs makes it very challenging to do this effectively and efficiently with existing tools. This study therefore proposes a multiclass classification of the security severity level of multi-source event logs using machine learning based on natural language processing. Experimental results on 472,972 training and test samples show that the approach achieves an accuracy of 99.59%.
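The abstract does not describe the pipeline beyond "machine learning based on natural language processing", so the following is only a rough illustration of the general idea, not the authors' method. It treats each raw log line as text: the line is lowercased, digit runs are masked as `<num>` so that IP addresses, ports, and process IDs from different sources share one token, and the resulting bag of words feeds a multinomial Naive Bayes classifier over severity labels. Naive Bayes is a stand-in chosen for brevity; the log lines, the three severity levels (0 = normal, 1 = suspicious, 2 = attack), and the names `tokenize`, `NaiveBayesLogClassifier`, and `accuracy` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(line):
    """Lowercase a log line, mask digit runs as <num>, and split into word tokens."""
    line = re.sub(r"\d+", "<num>", line.lower())
    return re.findall(r"<num>|[a-z_]+", line)


class NaiveBayesLogClassifier:
    """Multinomial Naive Bayes over bag-of-words tokens (hypothetical stand-in model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, lines, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for line, label in zip(lines, labels):
            tokens = tokenize(line)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total number of tokens observed per class, used as the likelihood denominator.
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def predict(self, line):
        tokens = tokenize(line)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.total_tokens[c] + self.alpha * v
            for t in tokens:
                # Smoothed log likelihood; unseen tokens get alpha / denom.
                score += math.log((self.token_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, lines, labels):
    correct = sum(model.predict(x) == y for x, y in zip(lines, labels))
    return correct / len(labels)


# Hypothetical toy data; severity levels 0 = normal, 1 = suspicious, 2 = attack.
train_lines = [
    "Accepted password for admin from 10.0.0.5",
    "Session opened for user admin",
    "Failed password for root from 203.0.113.7",
    "Failed password for invalid user guest",
    "UNION SELECT password FROM users",
    "SQL injection UNION SELECT blocked",
]
train_labels = [0, 0, 1, 1, 2, 2]

test_lines = [
    "Accepted password for admin from 192.168.0.20",
    "Failed password for root from 198.51.100.3",
    "UNION SELECT username FROM accounts",
]
test_labels = [0, 1, 2]

model = NaiveBayesLogClassifier().fit(train_lines, train_labels)
for line in test_lines:
    print(model.predict(line), line)
print("accuracy:", accuracy(model, test_lines, test_labels))
```

Masking digits is the main design choice in this sketch: without it, every distinct IP address or process ID would become its own vocabulary entry, and the classifier would learn little that transfers across log sources.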
Keywords
Security Event Log; Natural Language Processing; Multi-source; Multiclass Classification; Cybersecurity
Citations & Related Records
Times Cited by KSCI: 1