Improving Fault Tolerance for High-capacity Shared Distributed File Systems using the Rotational Lease Under Network Partitioning

대용량 공유 분산 화일 시스템에서 망 분할 시 순환 리스를 사용한 고장 감내성 향상

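  • 탁병철 (Electronics and Telecommunications Research Institute (ETRI), Embedded S/W Research Division) ;
  • 정연돈 (Dongguk University, Department of Computer Engineering) ;
  • 김명호 (KAIST, Department of Computer Science)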
  • Published : 2005.12.01

Abstract

In a shared-storage file system, participating systems access the shared storage device directly through a dedicated data-only subnetwork, unlike in a network-attached file server system. In this architecture, data consistency is maintained by a designated set of lock servers, which exchange lock information over a control network. A lease mechanism is additionally used to cope with control network failures. However, when the control network is partitioned, participating systems can make no further progress once their lease term expires, until the network recovers. This paper addresses this limitation and proposes a method that allows partitioned systems to continue making progress while the control network is partitioned. The proposed method periodically grants each participating system a predefined lease term in rotation. It is also shown that the proposed mechanism always preserves data consistency.

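Unlike network-attached file systems, in which storage devices are used through a server, servers in a high-capacity shared-storage file system share the storage devices directly over a data-only network. In this architecture, a lock manager maintains data consistency by exchanging lock information over a control network. Leases are also used to guard against unexpected control network failures. However, when a partition failure occurs in the control network, the isolated servers cannot continue their work until the failure is resolved. This paper proposes a technique that lets servers keep using the file system and make progress even under such a control network partition. The proposed technique works by periodically assigning leases to the servers in rotation. The paper also shows that the proposed technique always maintains data consistency.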
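The abstract only states that each participating system is periodically given a predefined lease term in rotation; it does not spell out the protocol. As a rough illustration of that idea, and not the paper's actual algorithm, the sketch below models a fixed time-sliced schedule that every system could compute locally without using the control network. The class name `RotationalLease`, the `guard_gap` parameter, and the assumption that system clocks agree to within that gap are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RotationalLease:
    """Hypothetical time-sliced lease schedule (illustrative only)."""

    num_systems: int        # number of participating systems
    lease_term: float       # time each system holds the lease per round
    guard_gap: float = 0.0  # assumed idle gap between slots to absorb clock skew

    @property
    def slot(self) -> float:
        # One slot = one lease term followed by one guard gap.
        return self.lease_term + self.guard_gap

    @property
    def round_length(self) -> float:
        # One full rotation gives every system exactly one slot.
        return self.num_systems * self.slot

    def holder_at(self, t: float) -> Optional[int]:
        """Index of the system whose lease covers time t, or None in a gap."""
        offset = t % self.round_length
        # Clamp guards against floating-point rounding at the round boundary.
        index = min(int(offset // self.slot), self.num_systems - 1)
        within = offset - index * self.slot
        return index if within < self.lease_term else None

    def may_access(self, system_id: int, t: float) -> bool:
        """True if system_id may touch shared storage at time t."""
        return self.holder_at(t) == system_id


if __name__ == "__main__":
    schedule = RotationalLease(num_systems=3, lease_term=10, guard_gap=2)
    for t in (0, 9, 10, 12, 24, 35, 36):
        print(t, schedule.holder_at(t))
```

Because the schedule depends only on the system index and the local clock, at most one system holds the lease at any instant in this model, which is the consistency property the abstract claims. How the real mechanism handles clock drift, membership changes, and the switch between normal and partitioned operation is not covered by the abstract and is not modeled here.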
