http://dx.doi.org/10.11627/jkise.2016.39.2.129

Correlation Analysis of Event Logs for System Fault Detection  

Park, Ju-Won (Div. of Supercomputing, KISTI)
Kim, Eunhye (Hyper-connected Communication Research Lab. ETRI)
Yeom, Jaekeun (Div. of Supercomputing, KISTI)
Kim, Sungho (Div. of Supercomputing, KISTI)
Publication Information
Journal of Korean Society of Industrial and Systems Engineering / v.39, no.2, 2016, pp. 129-137
Abstract
To identify the cause of an error and keep a system healthy, an administrator usually analyzes event log data, since it contains information from which the cause can be inferred. However, because today's systems are large and complex, it is almost impossible for administrators to analyze event log files manually. In particular, OpenStack, which is widely used as a cloud management system, runs various service modules that are linked across multiple servers, so when an error occurs it is hard to access each node and analyze the event log messages of each service module. To address this, we propose a novel message-based log analysis method that enables the administrator to find the cause of an error quickly. Specifically, the proposed method 1) consolidates event log data generated at the system level and the application service level, 2) clusters the consolidated data based on messages, and 3) analyzes interrelations among message groups in order to promptly identify the cause of a system error. This study is significant in three respects. First, the root cause of an error can be identified by collecting event logs at both the system level and the application service level and analyzing the interrelations among the logs. Second, administrators do not need to classify messages for training, since unsupervised learning is applied to the event log messages. Third, using Dynamic Time Warping, an algorithm for measuring the similarity of patterns that vary over time, increases the accuracy of pattern analysis in distributed systems whose clocks are not exactly synchronized.
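The role of Dynamic Time Warping in the third point can be illustrated with a minimal sketch. This is not the authors' implementation: it is the textbook dynamic-programming formulation with an absolute-difference cost, and the function names and per-minute message counts below are hypothetical values chosen only for illustration. It compares the DTW distance with a plain point-by-point distance for two message groups whose bursts are offset by one time step, as could happen when node clocks are slightly out of sync.

```python
def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def pointwise_distance(a, b):
    """Sum of absolute differences at identical time indices (no warping)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical per-minute message counts for three message groups
group_a = [0, 0, 5, 9, 4, 0, 0, 0]  # burst of messages
group_b = [0, 0, 0, 5, 9, 4, 0, 0]  # same burst, logged one step later
group_c = [3, 3, 3, 3, 3, 3, 3, 3]  # steady background messages

print("DTW(a, b)       =", dtw_distance(group_a, group_b))
print("pointwise(a, b) =", pointwise_distance(group_a, group_b))
print("DTW(a, c)       =", dtw_distance(group_a, group_c))
```

With these made-up series, the shifted burst gets a DTW distance of 0 while the point-by-point distance is 18, and the unrelated steady series stays far away under DTW (24). That is the sense in which warping tolerates small timestamp offsets between nodes when relating message groups.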
Keywords
Correlation Analysis; Unsupervised Learning; System Fault Detection