
An Implementation of Fault Tolerant Software Distributed Shared Memory with Remote Logging  

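박소연 (Department of Electrical Engineering and Computer Science, KAIST)
김영재 (Department of Electrical Engineering and Computer Science, KAIST)
맹승렬 (Department of Electrical Engineering and Computer Science, KAIST)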
Abstract
Software DSMs continue to improve in performance and scalability. As software DSMs become attractive on larger clusters, attention is likely to shift toward improving system reliability. A popular approach to tolerating failures is message logging combined with checkpointing, and many log-based rollback recovery schemes have been proposed. In this work, we propose a remote logging scheme in which each node logs to the volatile memory of a remote node assigned to it. Because remote logging does not incur frequent disk accesses during failure-free execution, its logging overhead is small, especially over a high-speed communication network. Remote logging tolerates multiple failures as long as the backup nodes of the failed nodes remain alive, which makes DSMs considerably more reliable. We have designed and implemented FT-KDSM (Fault Tolerant KAIST DSM) with remote logging and measured its logging overhead and recovery time.
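The abstract does not say how backup nodes are assigned or what a log record contains, so the following Python sketch is only a toy model under stated assumptions, not FT-KDSM's implementation. It assumes a ring assignment (node i is backed up by node (i + 1) mod N), treats log records as opaque values, and assumes checkpoints are kept on stable storage (not modeled). All names (`RemoteLoggingCluster`, `backup_of`, `replay_log`) are hypothetical. It illustrates the two properties the abstract claims: logging is a write into the backup's memory rather than a disk access, and a set of failures is recoverable when every failed node's backup survives.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    alive: bool = True
    # Volatile (in-memory) log records held on behalf of the node this one backs up.
    backup_log: list = field(default_factory=list)


class RemoteLoggingCluster:
    """Toy model: each node's log lives in the memory of its backup node."""

    def __init__(self, num_nodes: int):
        if num_nodes < 2:
            raise ValueError("remote logging needs at least two nodes")
        self.nodes = [Node(i) for i in range(num_nodes)]

    def backup_of(self, node_id: int) -> int:
        # Hypothetical ring assignment: node i is backed up by node (i + 1) mod N.
        return (node_id + 1) % len(self.nodes)

    def log(self, node_id: int, record) -> None:
        # No local disk write: the record is appended to the backup's memory.
        backup = self.nodes[self.backup_of(node_id)]
        backup.backup_log.append((node_id, record))

    def checkpoint(self, node_id: int) -> None:
        # Once node_id has checkpointed (assumed to stable storage),
        # its earlier log records are no longer needed and are discarded.
        backup = self.nodes[self.backup_of(node_id)]
        backup.backup_log = [(src, r) for (src, r) in backup.backup_log
                             if src != node_id]

    def fail(self, node_ids) -> None:
        for i in node_ids:
            self.nodes[i].alive = False
            self.nodes[i].backup_log.clear()  # volatile memory is lost on failure

    def can_recover(self) -> bool:
        # Recoverable iff every failed node's backup (which holds its log) is alive.
        return all(self.nodes[self.backup_of(n.node_id)].alive
                   for n in self.nodes if not n.alive)

    def replay_log(self, node_id: int) -> list:
        # Records a recovering node would replay after restoring its checkpoint.
        backup = self.nodes[self.backup_of(node_id)]
        if not backup.alive:
            raise RuntimeError(f"log of node {node_id} was lost with its backup")
        return [r for (src, r) in backup.backup_log if src == node_id]


if __name__ == "__main__":
    cluster = RemoteLoggingCluster(4)
    cluster.log(0, "diff:page7")
    cluster.log(2, "diff:page3")
    cluster.fail([0, 2])              # backups 1 and 3 are still alive
    print(cluster.can_recover())      # True
    print(cluster.replay_log(0))      # ['diff:page7']
    cluster.fail([1])                 # node 1 held node 0's log
    print(cluster.can_recover())      # False
```

Under this ring assignment, any set of simultaneous failures that contains no two ring-adjacent nodes is recoverable; losing a node together with its backup loses that node's log. A different assignment policy would change which failure combinations survive, but the condition stated in the abstract (the failed node's backup must be alive) stays the same.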
Keywords
Software Distributed Shared Memory; Fault Tolerance; Message Logging; Cluster System