Accurate and Efficient Log Template Discovery Technique

  • Tak, Byungchul (School of Computer Science and Engineering, Kyungpook National University)
  • Received : 2018.09.28
  • Accepted : 2018.10.22
  • Published : 2018.10.31

Abstract

In this paper, we propose a novel log template discovery algorithm that produces high-quality log templates through an iterative log filtering technique. Log templates are the static string patterns from which actual logs are produced by inserting variable values at runtime. Correctly classifying individual logs into their template categories enables automated analysis using state-of-the-art machine learning techniques. Our technique examines a group of logs column-wise and retains the logs that have the value with the highest proportion in that column. We repeat this process for each column until we are left with a highly homogeneous set of logs that most likely belong to the same log template category. We then determine which columns are static parts and which are variable parts by vertically comparing all the logs in the group. This process repeats until all templates have been discovered from the given logs. During this process, we also discover custom patterns, such as ID formats, that are unique to the application. This information helps us quickly identify such strings in the logs as variable parts, thereby further increasing the accuracy of the discovered log templates. Existing solutions suffer from log templates that are too general or too specific because they cannot detect custom patterns. Our extensive evaluations show that the proposed method achieves 2 to 20 times better accuracy.
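The column-wise filtering loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function name `discover_templates`, the `min_share` threshold, whitespace tokenization, the `<*>` wildcard, and the rule that only logs with the same token count are compared are all assumptions made for illustration. Custom pattern discovery (for example, ID formats) is not covered.

```python
from collections import Counter

WILDCARD = "<*>"  # hypothetical marker for a variable column


def discover_templates(lines, min_share=0.5):
    """Return one template string per discovered group of logs.

    Assumptions (not from the paper): logs are split on whitespace, only
    logs with the same token count are compared, and a column is used for
    filtering only when its most frequent value covers more than
    `min_share` of the current group.
    """
    remaining = [line.split() for line in lines if line.strip()]
    templates = []
    while remaining:
        # Start from the logs that share the most common token count.
        width = Counter(len(tokens) for tokens in remaining).most_common(1)[0][0]
        group = [tokens for tokens in remaining if len(tokens) == width]

        # Column-wise filtering: keep the logs holding the dominant value.
        for col in range(width):
            value, count = Counter(t[col] for t in group).most_common(1)[0]
            if count / len(group) > min_share:
                group = [t for t in group if t[col] == value]

        # Vertical comparison: identical columns are static, others variable.
        first = group[0]
        template = [
            first[col] if all(t[col] == first[col] for t in group) else WILDCARD
            for col in range(width)
        ]
        templates.append(" ".join(template))

        # Remove the grouped logs and repeat on what is left.
        grouped = {id(t) for t in group}
        remaining = [t for t in remaining if id(t) not in grouped]
    return templates


sample_logs = [
    "Connected to 10.0.0.1",
    "Connected to 10.0.0.2",
    "Connected to 10.0.0.3",
    "Disconnected from 10.0.0.9",
    "Disconnected from 10.0.0.7",
    "Disk full on sda",
    "Disk full on sdb",
    "Disk full on sdc",
]
print(discover_templates(sample_logs))
```

In this sketch the threshold decides when filtering stops for a column: a column whose dominant value does not exceed `min_share` is left unfiltered and usually ends up as a variable part. A group containing a single log yields a fully static template, which is one way a template can come out too specific.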
