An Efficient Logging Scheme based on Lazy Release Consistent Model for Distributed Shared Memory System

  • Published: 2000.02.15

Abstract

This paper presents an efficient stable logging scheme for distributed shared memory systems based on the lazy release consistent memory model. The proposed scheme tracks dependencies between processes and performs stable logging only when a dependency between processes actually arises. Compared with previous schemes, which perform stable logging whenever any information is transferred between processes, this dependency tracking greatly reduces the frequency of stable logging. Moreover, instead of logging every data item a process accesses to stable storage, the scheme keeps the needed data in the volatile memory of the process that produced it and logs only the access information to stable storage. During recovery from a process failure, each process can efficiently locate the correct version of each accessed data item using only the logged access information. As a result, the amount of information logged by each process is also reduced.
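The two ideas in the abstract can be illustrated with a toy sketch. It is not the paper's protocol: the class and method names (`AccessRecord`, `Process`, `read_from`, `flush`, `replay`) are hypothetical, the stable storage is simulated by an in-memory list, and it assumes that a dependency arises at the moment a process hands one of its data versions to another process. Under those assumptions, access records stay in a volatile buffer until such a dependency occurs, and only the access information (item, producer, version) is logged, never the data itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessRecord:
    """Access information only: which version of which item was read."""
    item: str       # name of the shared data item
    producer: int   # id of the process that produced the version
    version: int    # version number assigned by the producer


@dataclass
class Process:
    pid: int
    # Volatile memory of the producer: (item, version) -> value.
    versions: dict = field(default_factory=dict)
    next_version: dict = field(default_factory=dict)
    # Volatile buffer of access records not yet written to stable storage.
    pending: list = field(default_factory=list)
    # Stand-in for stable storage (assumption: a real system would use disk).
    stable_log: list = field(default_factory=list)
    stable_writes: int = 0

    def write(self, item, value):
        """Produce a new version and keep it in this process's volatile memory."""
        v = self.next_version.get(item, 0)
        self.next_version[item] = v + 1
        self.versions[(item, v)] = value
        return v

    def flush(self):
        """Write buffered access records to stable storage in one operation."""
        if self.pending:
            self.stable_log.extend(self.pending)
            self.pending.clear()
            self.stable_writes += 1

    def read_from(self, producer, item, version):
        """Read a version held by `producer`.

        Assumed rule: the producer flushes first, because the reader is
        about to become dependent on the producer's state.
        """
        producer.flush()
        value = producer.versions[(item, version)]
        self.pending.append(AccessRecord(item, producer.pid, version))
        return value

    def replay(self, processes):
        """Recovery: locate the logged versions in the producers' memory."""
        return [processes[r.producer].versions[(r.item, r.version)]
                for r in self.stable_log]


p0, p1, p2 = Process(0), Process(1), Process(2)
p0.write("x", 10)
p0.write("x", 20)
p1.read_from(p0, "x", 1)   # buffered in p1's volatile memory, no stable write
p1.read_from(p0, "x", 0)   # still no stable write
p1.write("y", 30)
p2.read_from(p1, "y", 0)   # p2 now depends on p1, so p1 flushes once
print(p1.stable_writes, p1.replay({0: p0, 1: p1, 2: p2}))
```

In this run, `p1` performs two remote reads but only one stable write, and the replay recovers the values it read from the access records alone. The sketch ignores the case where the producer itself fails, which a real scheme would have to handle.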

Keywords

References

  1. R.E. Ahmed, R.C. Frazier and P.N. Marinos. Cache-aided Rollback Error Recovery (CARER) Algorithms for Shared-memory Multiprocessor Systems. In Proc. of the 20th Symp. on Fault-Tolerant Computing, pp. 82--88, Jun. 1990 https://doi.org/10.1109/FTCS.1990.89338
  2. S.V. Adve and M.D. Hill. Weak Ordering -- A New Definition. In Proc. of the 17th Annual Int'l Symp. on Computer Architecture, pp. 2--14, May 1990 https://doi.org/10.1109/ISCA.1990.134502
  3. G. Cabillic, T. Priol and I. Puaut. The Performance of Consistent Checkpointing in Distributed Shared Memory Systems. In Proc. of the 14th Symp. on Reliable Distributed Systems, pp. 95--105, Sep. 1995
  4. K.M. Chandy and L. Lamport. Distributed Snapshots: Determining Global States of Distributed Systems. ACM Trans. on Computer Systems, Vol. 3, No. 1, pp. 63--75, Feb. 1985 https://doi.org/10.1145/214451.214456
  5. M. Costa, P. Guedes, M. Sequeira, N. Neves, and M. Castro. Lightweight Logging for Lazy Release Consistent Distributed Shared Memory. In Proc. of the USENIX 2nd Symp. on Operating Systems Design and Implementation, Oct. 1996 https://doi.org/10.1145/238721.238762
  6. G. Janakiraman and Y. Tamir. Coordinated Checkpointing Rollback Error Recovery for Distributed Shared Memory Multicomputers. In Proc. of the 13th Symp. on Reliable Distributed Systems, pp. 42--51, Oct. 1994 https://doi.org/10.1109/RELDIS.1994.336910
  7. B. Janssens and W.K. Fuchs. Relaxing Consistency in Recoverable Distributed Shared Memory. In Proc. of the 23rd Annual Int'l Symp. on Fault-Tolerant Computing, pp. 155--163, Jun. 1993
  8. S. Kanthadai and J.L. Welch. Implementation of Recoverable Distributed Shared Memory by Logging Writes. In Proc. of the 16th Int'l Conf. on Distributed Computing Systems, pp. 116--123, 1996
  9. P. Keleher, A. L. Cox, and W. Zwaenepoel. Lazy Release Consistency for Software Distributed Shared Memory. In Proc. of the 18th Annual Int'l Symp. on Computer Architecture, pp. 13--21, May 1992 https://doi.org/10.1145/146628.139676
  10. A. Kermarrec, G. Cabillic, A. Gefflaut, C. Morin, and I. Puaut. A Recoverable Distributed Shared Memory Integrating Coherence and Recoverability. In Proc. of the 25th Int'l Symp. on Fault-Tolerant Computing Systems, pp. 289--298, Jun. 1995
  11. K. Li. Shared Virtual Memory on Loosely Coupled Multiprocessors. PhD thesis, Department of Computer Science, Yale University, Sep. 1986
  12. N. Neves, M. Castro, and P. Guedes. A Checkpoint Protocol for an Entry Consistent Shared Memory System. In Proc. of the 13th Annual ACM Symp. on Principles of Distributed Computing, Aug. 1994 https://doi.org/10.1145/197917.197973
  13. T. Park, S.B. Cho, and H.Y. Yeom. An Improved Logging and Checkpointing Scheme for Recoverable Distributed Shared Memory. In Proc. of the 2nd Asian Computing Science Conference, pp. 74--83, Dec. 1996
  14. T. Park, S.B. Cho, and H.Y. Yeom. An Efficient Logging Scheme for Recoverable Distributed Shared Memory Systems. In Proc. of the 17th Int'l Conf. on Distributed Computing Systems, May 1997
  15. B. Randell, P.A. Lee and P.C. Treleaven. Reliability Issues in Computing System Design. ACM Computing Surveys, Vol. 10, No. 2, pp. 123--165, Jun. 1978 https://doi.org/10.1145/356725.356729
  16. G. G. Richard III and M. Singhal. Using Logging and Asynchronous Checkpointing to Implement Recoverable Distributed Shared Memory. In Proc. of the 12th Symp. on Reliable Distributed Systems, pp. 58--67, Oct. 1993 https://doi.org/10.1109/RELDIS.1993.393473
  17. R.D. Schlichting and F.B. Schneider. Fail-stop Processors: An Approach to Designing Fault-tolerant Computing Systems. ACM Trans. on Computer Systems, Vol. 1, No. 3, pp. 222--238, Aug. 1983 https://doi.org/10.1145/357369.357371
  18. R. Schwarz and F. Mattern. Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail. Technical Report TR #SFB124-1592, Department of Computer Science, University of Kaiserslautern, 1992
  19. G. Suri, B. Janssens, and W. K. Fuchs. Reduced Overhead Logging for Rollback Recovery in Distributed Shared Memory. In Proc. of the 25th Annual Int'l Symp. on Fault-Tolerant Computing, Jun. 1995 https://doi.org/10.1109/FTCS.1995.466971
  20. K.-L. Wu and W. K. Fuchs. Recoverable Distributed Shared Memory. IEEE Transactions on Computers, Vol. 39, No. 4, pp. 460--469, Apr. 1990 https://doi.org/10.1109/12.54839