http://dx.doi.org/10.7472/jksii.2021.22.2.77

An Interpretable Log Anomaly System Using Bayesian Probability and Closed Sequence Pattern Mining  

Yun, Jiyoung (Department of Software, Gachon University)
Shin, Gun-Yoon (Department of Computer Engineering, Gachon University)
Kim, Dong-Wook (Department of Computer Engineering, Gachon University)
Kim, Sang-Soo (Agency for Defense Development)
Han, Myung-Mook (Department of Software, Gachon University)
Publication Information
Journal of Internet Computing and Services / v.22, no.2, 2021, pp. 77-87
Abstract
With the development of the Internet and personal computers, increasingly varied and complex attacks have emerged. As attacks become more complex, signature-based detection becomes difficult, which has motivated research on behavior-based log anomaly detection. Recent work uses deep learning to learn the order of log events and achieves good performance, but it does not provide any explanation for its predictions. This lack of explanation makes it difficult to find contamination in the data or vulnerabilities in the model itself, and as a result users lose trust in the model. To address this problem, this work proposes an explainable log anomaly detection system. Log parsing is performed first. Afterward, sequential rules are extracted using Bayesian posterior probability, yielding a rule set of the form "If condition then result, posterior probability". A sample that matches the rule set is classified as normal; otherwise, it is classified as an anomaly. We use the HDFS dataset for the experiment and obtain an F1 score of 92.7% on the test dataset.
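The rule form "If condition then result, posterior probability" can be illustrated with a minimal sketch. This is not the system described in the paper: the abstract does not give the prior, the pattern-mining details, or the matching procedure, so everything here is an assumption made for illustration. The sketch considers only adjacent event pairs (no closed sequence pattern mining), assumes a Beta(1, 1) prior with a Beta-Bernoulli posterior mean, and uses the hypothetical names `mine_rules`, `is_normal`, and `min_posterior`, along with made-up event IDs.

```python
from collections import Counter


def mine_rules(sequences, min_posterior=0.5, alpha=1.0, beta=1.0):
    """Return {(a, b): posterior} for rules "if a then b next".

    Assumption: "b follows a" is a Bernoulli event with a Beta(alpha, beta)
    prior, so the posterior mean is (k + alpha) / (n + alpha + beta), where
    n counts occurrences of a with a successor and k counts a followed by b.
    """
    antecedent_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            antecedent_counts[a] += 1
            pair_counts[(a, b)] += 1

    rules = {}
    for (a, b), k in pair_counts.items():
        n = antecedent_counts[a]
        posterior = (k + alpha) / (n + alpha + beta)
        if posterior >= min_posterior:
            rules[(a, b)] = posterior
    return rules


def is_normal(seq, rules):
    """A sequence is treated as normal if every adjacent pair matches a rule."""
    return all((a, b) in rules for a, b in zip(seq, seq[1:]))


# Hypothetical parsed event-ID sequences from normal operation.
normal_logs = [
    ["E1", "E2", "E3"],
    ["E1", "E2", "E3"],
    ["E1", "E2", "E4"],
]
rules = mine_rules(normal_logs)
for (a, b), p in sorted(rules.items()):
    print(f"If {a} then {b}, posterior probability {p:.2f}")
```

Because each flagged sequence fails on a specific pair that has no matching rule, the rule set itself serves as the explanation for the prediction, which is the property the abstract emphasizes.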
Keywords
Explainable AI; Log anomaly detection; Bayesian probability; Rule extraction;