
An Architecture-based Multi-level Self-Adaptive Monitoring Method for Software Fault Detection  

Youn, Hyun-Ji (Department of Computer Engineering, Sogang University)
Park, Soo-Yong (Department of Computer Engineering, Sogang University)
Abstract
Self-healing is one of the techniques that assure the dependability of mission-critical systems. It consists of fault detection and fault recovery. Fault detection is the essential first step that enables recovery, but it incurs overhead. In model-based fault detection, the tasks of observing the system's behavior and comparing it against a model of normal behavior are expensive. In this paper, we propose an architecture-based, multi-level, self-adaptive monitoring method that complements model-based fault detection. Components in a software architecture differ in their fault-detection priority because the seriousness and the frequency of their faults differ. If the monitor watches high-priority components intensively and low-priority components loosely, overhead can be reduced while detection efficiency is maintained. Because changes in the software's environment and architecture change the fault-detection priorities, the monitor learns the changes in fault frequency and adapts itself to monitor intensively the components whose fault-detection priority is high.
Keywords
Self-healing; Fault detection; Overhead; Monitoring; Architecture;
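
The adaptive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that priority depends on the seriousness and frequency of faults per component and that the monitor learns changes in fault frequency. The multiplicative priority formula, the exponential moving average used for learning, the thresholds, the three level names, and the component names `navigator` and `logger` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComponentStats:
    severity: float          # assumed scale 0..1: seriousness of a fault in this component
    fault_rate: float = 0.0  # learned estimate of fault frequency, 0..1


def update_fault_rate(stats: ComponentStats, fault_observed: bool, alpha: float = 0.2) -> None:
    """Learn fault frequency with an exponential moving average (assumed learning rule)."""
    sample = 1.0 if fault_observed else 0.0
    stats.fault_rate = (1 - alpha) * stats.fault_rate + alpha * sample


def priority(stats: ComponentStats) -> float:
    """Assumed priority: seriousness multiplied by learned frequency."""
    return stats.severity * stats.fault_rate


def monitoring_level(stats: ComponentStats, low: float = 0.1, high: float = 0.4) -> str:
    """Map priority to a hypothetical monitoring level; thresholds are illustrative."""
    p = priority(stats)
    if p >= high:
        return "intensive"
    if p >= low:
        return "normal"
    return "loose"


# Hypothetical components: one serious and fault-prone, one minor and stable.
components = {
    "navigator": ComponentStats(severity=0.9),
    "logger": ComponentStats(severity=0.2),
}
for _ in range(5):
    update_fault_rate(components["navigator"], fault_observed=True)
    update_fault_rate(components["logger"], fault_observed=False)

levels = {name: monitoring_level(s) for name, s in components.items()}
```

Under these assumptions, the component that keeps failing and has high severity ends up monitored intensively, while the stable, low-severity one is monitored loosely, which is where the overhead saving would come from.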