An Architecture-based Multi-level Self-Adaptive Monitoring Method for Software Fault Detection

소프트웨어 오류 탐지를 위한 아키텍처 기반의 다계층적 자가적응형 모니터링 방법

  • 윤현지 (Department of Computer Engineering, Sogang University)
  • 박수용 (Department of Computer Engineering, Sogang University)
  • Received : 2010.05.03
  • Accepted : 2010.05.31
  • Published : 2010.07.15

Abstract

Self-healing is one of the techniques for assuring the dependability of mission-critical systems. It consists of fault detection and fault recovery. Fault detection is the important first step that makes recovery possible, but it imposes overhead on the system. Faults can be detected with model-based methods, among others, but these require reporting every behavior of the system and comparing the reported behavior against a model of normal behavior, which is a large and costly workload. In this paper, we propose an architecture-based, multi-level, self-adaptive monitoring method that complements model-based fault detection. Within a software architecture, the priority of fault detection differs from component to component, because the severity and frequency of faults differ for each component. If the monitor adapts so that it monitors high-priority components intensively and low-priority components loosely, the overhead of fault detection can be reduced while its effectiveness is maintained. In addition, because changes in the software's environment and in its architecture alter fault frequencies, and therefore the fault-detection priority of each component, the monitor tracks these changes through learning and self-adaptively concentrates monitoring on the components whose priority is high.
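
The adaptation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the names `Component`, `observe`, `assign_levels` and `LEVELS`, the use of an exponential moving average to learn fault frequency, the product `severity * fault_rate` as the priority, and the rank-based split into three monitoring levels.

```python
from dataclasses import dataclass

# Hypothetical monitoring levels, ordered from loose to intensive.
LEVELS = ("loose", "normal", "intensive")


@dataclass
class Component:
    name: str
    severity: float          # assumed 0..1 rating of how serious a fault here is
    fault_rate: float = 0.0  # learned estimate of fault frequency, 0..1

    def observe(self, faulted: bool, alpha: float = 0.2) -> None:
        """Update the fault-rate estimate with an exponential moving average."""
        sample = 1.0 if faulted else 0.0
        self.fault_rate = (1 - alpha) * self.fault_rate + alpha * sample

    @property
    def priority(self) -> float:
        """Assumed priority: severity weighted by the learned fault frequency."""
        return self.severity * self.fault_rate


def assign_levels(components):
    """Rank components by priority and spread them evenly across LEVELS."""
    ranked = sorted(components, key=lambda c: c.priority)
    n = len(ranked)
    return {
        c.name: LEVELS[min(i * len(LEVELS) // n, len(LEVELS) - 1)]
        for i, c in enumerate(ranked)
    }


# Illustrative components; the names and severities are made up.
radar = Component("radar", severity=0.9)
logger = Component("logger", severity=0.2)
planner = Component("planner", severity=0.6)

# Phase 1: radar and logger fault repeatedly, planner does not.
for _ in range(5):
    radar.observe(True)
    logger.observe(True)
    planner.observe(False)
print(assign_levels([radar, logger, planner]))

# Phase 2: the environment changes, so planner starts faulting and radar stops.
for _ in range(20):
    radar.observe(False)
    planner.observe(True)
print(assign_levels([radar, logger, planner]))
```

In this sketch the component that is both severe and frequently faulty ends up at the most intensive level, and the assignment shifts once the observed fault frequencies change. A real monitor would presumably map each level to a different amount of behavior reporting and model checking, which the abstract does not specify.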
