http://dx.doi.org/10.3745/KIPSTA.2008.15-A.3.151

Design and Implementation of Adaptive Fault-Tolerant Management System over Grid  

Kim, Eun-Kyung (SK Communications)
Kim, Jeu-Young (Computer Science, Sookmyung Women's University)
Kim, Yoon-Hee (Department of Computer Science, Sookmyung Women's University)
Abstract
Middleware in a grid computing environment must support seamless on-demand services across diverse resource conditions in order to meet varied user requirements [1]. Grid computing applications therefore need situation-aware middleware services. In this paper, we propose a semantic middleware architecture that supports dynamic reconfiguration of software components, based on fault and service ontologies, to provide fault tolerance in a grid computing environment. Our middleware includes autonomic management that detects faults, analyzes their causes, and plans semantically meaningful recovery strategies using predefined fault and service ontology trees. We implemented a reference prototype, the Web-service based Application Execution Environment (Wapee), as a proof of concept and showed its efficiency in runtime recovery.
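The detect, analyze, and plan cycle described in the abstract can be illustrated with a minimal sketch. This is not the Wapee implementation: the ontology node names, symptom names, and recovery strategy names below are all hypothetical, and the ontology tree is reduced to a simple parent-linked structure in which a fault inherits the recovery strategy of its nearest ancestor when it has none of its own.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class OntologyNode:
    """A node in a hypothetical fault ontology tree (illustrative only)."""
    name: str
    parent: Optional["OntologyNode"] = None
    recovery: Optional[str] = None  # recovery strategy attached to this fault class

    def ancestors(self) -> Iterator["OntologyNode"]:
        """Yield this node, then each ancestor up to the root."""
        node: Optional[OntologyNode] = self
        while node is not None:
            yield node
            node = node.parent


# Hypothetical fault ontology: general classes near the root, specific faults at the leaves.
FAULT = OntologyNode("Fault", recovery="resubmit_job")
RESOURCE_FAULT = OntologyNode("ResourceFault", FAULT, recovery="migrate_to_other_node")
NODE_CRASH = OntologyNode("NodeCrash", RESOURCE_FAULT)  # no strategy of its own
SERVICE_FAULT = OntologyNode("ServiceFault", FAULT, recovery="swap_service_component")
SERVICE_TIMEOUT = OntologyNode("ServiceTimeout", SERVICE_FAULT, recovery="retry_with_backoff")

# Hypothetical mapping from observed symptoms to fault classes.
SYMPTOMS = {"heartbeat_lost": NODE_CRASH, "no_response": SERVICE_TIMEOUT}


def detect(event: dict) -> Optional[str]:
    """Return the reported symptom, or None if the event signals no fault."""
    if event.get("status") == "ok":
        return None
    return event.get("symptom")


def analyze(symptom: Optional[str]) -> OntologyNode:
    """Map a symptom to a fault class; unknown symptoms fall back to the root."""
    return SYMPTOMS.get(symptom, FAULT)


def plan(fault: OntologyNode) -> str:
    """Pick the most specific recovery strategy found on the path to the root."""
    for node in fault.ancestors():
        if node.recovery is not None:
            return node.recovery
    raise LookupError(f"no recovery strategy for {fault.name}")


def recovery_plan(event: dict) -> Optional[str]:
    """Run one detect -> analyze -> plan cycle for a monitoring event."""
    symptom = detect(event)
    if symptom is None and event.get("status") == "ok":
        return None
    return plan(analyze(symptom))
```

For example, `recovery_plan({"status": "failed", "symptom": "heartbeat_lost"})` walks from the hypothetical `NodeCrash` class up to `ResourceFault` and returns that class's strategy. Walking toward the root is one simple way to make a tree-structured ontology yield a fallback for faults that have no specific strategy.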
Keywords
Autonomic Middleware; Ontology; Fault-tolerance; Grid
References
1 Eduardo Ostertag, James Hendler, Ruben Prieto Diaz, Christine Braun, “Computing similarity in a reuse library system: an AI-based approach”, ACM Trans. Softw. Eng. Methodol., vol.1, no.3, pp.205-228, 1992
2 Yoonhee Kim, Eun-kyung Kim, Beom-Jun Jeon, In-Young Ko, and Sung-Yong Park, “Wapee: A Fault-Tolerant Semantic Middleware in Ubiquitous Computing Environments”, EUC Workshops, IFIP International Federation for Information Processing, LNCS 4097, Seoul, August, 2006
3 Jang-uk In, Paul Avery, Richard Cavanaugh, Laukik Chitnis, Mandar Kulkarni, Sanjay Ranka, “SPHINX: A Fault-Tolerant System for Scheduling in Dynamic Grid Environments”, 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), p.12b, 2005
4 Condor/DAGMan http://www.cs.wisc.edu/condor/dagman
5 Zbigniew Kalbarczyk, Ravishankar K Iyer, Long Wang, “Application Fault Tolerance with Armor Middleware”, Internet Computing, Vol.9, No.2, pp.28-37, March/April, 2005
6 P. Narasimhan, C. F. Reverte, S. Ratanotayanon and G. S. Hartman, “Middleware for Embedded Adaptive Dependability”, IEEE Workshop on Large Scale Real-Time and Embedded Systems, Austin, TX, December, 2002
7 R. Prieto-Diaz, P. Freeman, “Classifying Software for Reuse”, IEEE Software, 4(1), pp.6-16, 1987
8 Hwayoun Lee, Ho-Jin Choi, In-Young Ko, “A Semantically-Based Software Component Selection Mechanism for Intelligent Service Robots”, Proceedings of 4th Mexican International Conference on Artificial Intelligence (MICAI2005), Monterrey, Mexico, November, 2005
9 Satish Tadepalli, Calvin Ribbens, Srinidhi Varadarajan, “GEMS: A Job Management System for Fault Tolerant Grid Computing”, High Performance Computing Symposium, 2004
10 M. Weiser, “The Computer for the 21st Century”, Scientific American, vol.265, no.3, pp.94-104, September, 1991