[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7472/jksii.2021.22.2.77

An Interpretable Log Anomaly System Using Bayesian Probability and Closed Sequence Pattern Mining

Yun, Jiyoung (Department of Software, Gachon University)
Shin, Gun-Yoon (Department of Computer Engineering, Gachon University)
Kim, Dong-Wook (Department of Computer Engineering, Gachon University)
Kim, Sang-Soo (Agency for Defense Development)
Han, Myung-Mook (Department of Software, Gachon University)

Publication Information

Journal of Internet Computing and Services / v.22, no.2, 2021, pp. 77-87

Abstract

With the development of the Internet and personal computers, increasingly varied and complex attacks have emerged. As attacks grow more complex, signature-based detection becomes difficult, which has motivated research on behavior-based log anomaly detection. Recent work uses deep learning to learn the order of log events and shows good performance, but it provides no explanation for its predictions. Without an explanation, it is difficult to detect contaminated data or vulnerabilities in the model itself, and users lose trust in the model. To address this problem, this work proposes an explainable log anomaly detection system. Log parsing is performed first. Sequential rules are then extracted using Bayesian posterior probability, yielding a rule set of the form "if condition then result, posterior probability". A sample that matches the rule set is classified as normal; otherwise, it is classified as an anomaly. In experiments on the HDFS dataset, the system achieves an F1 score of 92.7% on the test dataset.
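The rule-based detection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed-length window, the Beta(alpha, beta) prior, the posterior-mean threshold `min_posterior`, the function names, and the event IDs such as "E5" are all assumptions chosen for illustration, and the closed sequential pattern mining step named in the title is omitted. Each rule has the form "if antecedent then consequent, posterior probability", and a parsed log sequence is flagged as an anomaly as soon as one of its transitions matches no rule.

```python
from collections import Counter


def extract_rules(sequences, window=2, alpha=1.0, beta=1.0, min_posterior=0.1):
    """Build {(antecedent, consequent): posterior_mean} from normal sequences.

    Assumption: each rule's confidence is the mean of a Beta posterior,
    starting from a Beta(alpha, beta) prior on "consequent follows antecedent".
    """
    antecedent_counts = Counter()
    rule_counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - window):
            antecedent = tuple(seq[i:i + window])
            consequent = seq[i + window]
            antecedent_counts[antecedent] += 1
            rule_counts[(antecedent, consequent)] += 1

    rules = {}
    for (antecedent, consequent), hits in rule_counts.items():
        total = antecedent_counts[antecedent]
        # Posterior mean of Beta(alpha + hits, beta + total - hits).
        posterior = (hits + alpha) / (total + alpha + beta)
        if posterior >= min_posterior:
            rules[(antecedent, consequent)] = posterior
    return rules


def is_anomaly(seq, rules, window=2):
    """Return True if any transition in seq matches no rule in the rule set."""
    for i in range(len(seq) - window):
        key = (tuple(seq[i:i + window]), seq[i + window])
        if key not in rules:
            return True
    return False


# Hypothetical parsed event-ID sequences standing in for normal log sessions.
normal_logs = [
    ["E5", "E22", "E11", "E9"],
    ["E5", "E22", "E11", "E9"],
    ["E5", "E22", "E11", "E26"],
]
rules = extract_rules(normal_logs)

for (antecedent, consequent), posterior in sorted(rules.items()):
    print(f"if {antecedent} then {consequent}, posterior={posterior:.2f}")

print(is_anomaly(["E5", "E22", "E11", "E9"], rules))  # sequence seen in training
print(is_anomaly(["E5", "E22", "E9", "E11"], rules))  # unseen ordering
```

Because every decision traces back to a specific matched or unmatched rule, the missing rule itself serves as the explanation for an anomaly verdict, which is the interpretability property the abstract emphasizes.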

Keywords

Explainable AI; Log anomaly detection; Bayesian probability; Rule extraction

