Synchronization of Network Interfaces in System Area Networks

A Synchronization Technique for System Area Networks

  • 송효정 (Senior Engineer, Corporate Technology Operations, Samsung Electronics)
  • Published : 2005.06.01

Abstract

Many applications in cluster computing require QoS (Quality of Service). Because performance predictability is essential to providing QoS, the underlying systems must offer predictable performance guarantees. One way to obtain such guarantees from the network subsystem is to generate a global schedule from the applications' network requests and to execute the local portion of that schedule at each network interface. To execute the schedule accurately, the local clocks at the network interfaces must maintain a global time base. Providing this single time base is called the synchronization problem, and this paper addresses it for system area networks. To solve the synchronization problem, FM-QoS (1) proposed a simple mechanism called FBS (Feedback-Based Synchronization), which uses built-in flow control signals. This paper extends the basic notion of FM-QoS into a theoretical framework and generalizes it in two ways: 1) it identifies a set of built-in network flow control signals for synchrony and formalizes it as a synchronizing schedule, and 2) it analyzes the synchronization precision of FBS in terms of flow control parameters. Based on this generalization, two application classes are studied: a single-switch network and a multiple-switch network. For each class, a synchronizing schedule is proposed and its bounded skew is analyzed. Unlike in FM-QoS, the synchronizing schedule is proven to minimize the bounded skew for a single-switch network. To relate the analysis to practical networks, skew values are obtained with the flow control parameters of Myrinet-1280/SAN. The maximum bounded skew of FBS was 9.2 μsec or less across all our experiments. Based on this result, we conclude that FBS is a feasible synchronization mechanism for system area networks.

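Many applications require QoS (Quality of Service). This paper focuses on performance predictability, one form of QoS, in system area networks. Assuming that every communication stream in the network is statically scheduled in order to satisfy predictability, each network interface must execute its assigned schedule on time, which requires the time bases of all interfaces to be synchronized. This paper addresses that synchronization problem. To solve it, we extend the previously proposed FM-QoS (1) technique, which uses link-level flow control signals. FM-QoS is a synchronization technique that works when the network consists of a single switch. For arbitrary network topologies, this paper 1) specifies the conditions on flow control signals under which the network interfaces become synchronized (called a synchronizing schedule), and 2) analyzes the precision of the synchronizing schedule. To illustrate the proposed approach, we present a synchronizing schedule for a network with a single switch and another for a tree-structured network with multiple switches, and analyze the precision of each numerically. To understand what values the analyzed precision takes in a real system, we ran experiments on a network built from Myrinet switches (2). The largest precision error observed in the experiments was 9.2 microseconds, and on this basis the paper concludes that the proposed synchronization algorithm is effective in system area networks.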
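
The relationship between a global schedule, its local portions, and a bounded skew can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the slot layout, the interface names, the function names, and the simplification that all slots contend for a single shared switch. It does not implement FBS itself. It only shows how each interface could pick out its own part of a global schedule, and why a bound on clock skew (such as the 9.2 μsec figure reported above) determines how much idle time must separate consecutive slots.

```python
from collections import defaultdict

# Hypothetical global schedule (illustrative values only):
# (start_usec, duration_usec, source_interface, destination_interface)
GLOBAL_SCHEDULE = [
    (0.0, 50.0, "ni0", "ni1"),
    (60.0, 50.0, "ni1", "ni2"),
    (120.0, 50.0, "ni2", "ni0"),
    (180.0, 50.0, "ni0", "ni2"),
]


def local_portions(schedule):
    """Group the slots of a global schedule by the interface that sends them."""
    portions = defaultdict(list)
    for start, duration, src, dst in schedule:
        portions[src].append((start, duration, dst))
    return dict(portions)


def tolerates_skew(schedule, skew_usec):
    """Check that consecutive slots cannot overlap under a bounded skew.

    Assumption: any two local clocks differ by at most skew_usec, and all
    slots share one switch, so the idle gap between consecutive slots must
    be at least skew_usec.
    """
    ordered = sorted(schedule)
    for (s0, d0, *_), (s1, *_) in zip(ordered, ordered[1:]):
        gap = s1 - (s0 + d0)
        if gap < skew_usec:
            return False
    return True


if __name__ == "__main__":
    for interface, slots in sorted(local_portions(GLOBAL_SCHEDULE).items()):
        print(interface, slots)
    print(tolerates_skew(GLOBAL_SCHEDULE, 9.2))
```

In this toy schedule every gap is 10 μsec, so a skew bound of 9.2 μsec is tolerated while a looser bound such as 12 μsec would not be.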

References

  1. K. Connelly. FM-QoS: Real-time communications using self-synchronizing schedules. In Proceedings of SC'97, November 1997
  2. Myrinet on VME protocol specification standard. VITA 26-199x Draft 1.1, August 1998. VITA Standards Organization
  3. A. Chien et al. Design and evaluation of an HPVM-based Windows NT supercomputer. The International Journal of High Performance Computing Applications, 13(3):201-219, Fall 1999 https://doi.org/10.1177/109434209901300304
  4. R. Horst. TNet: A reliable system area network for I/O and IPC. In Proceedings of the IEEE Symposium on Hot Interconnects, 1994
  5. I. Foster et al. A distributed resource management architecture that supports advance reservations and co-allocation. In Proceedings of the International Workshop on Quality of Service, 1999 https://doi.org/10.1109/IWQOS.1999.766475
  6. J. C. R. Bennett and H. Zhang. Hierarchical packet fair queueing algorithms. In Proceedings of ACM SIGCOMM, August 1996 https://doi.org/10.1145/248156.248170
  7. N. Vasanthavada and P. N. Marinos. Synchronization of fault-tolerant clocks in the presence of malicious faults. IEEE Transactions on Computers, 37(4):440-448, 1988 https://doi.org/10.1109/12.2188
  8. F. Schmuck and F. Cristian. Continuous clock amortization need not affect the precision of a clock synchronization algorithm. In Proceedings of the ACM Symposium on Principles of Distributed Computing, pages 133-143, 1990 https://doi.org/10.1145/93385.93411
  9. A. Chien et al. HPVM software distributions, 1999. Available from http://www-csag.ucsd.edu/projects/hpvm/sw-distributions/index.html
  10. J.H. Kim. Bandwidth and Latency Guarantees in Low-cost, High Performance Networks. PhD thesis, University of Illinois at Urbana-Champaign, January 1997
  11. E. Horowitz and S. Sahni. Fundamentals of Computer Algorithms. Computer Science Press, 1978
  12. P. F. Reynolds Jr., C. Williams, and R. R. Wagner Jr. Isotach networks. IEEE Transactions on Parallel and Distributed Systems, 8(4), 1997 https://doi.org/10.1109/71.588601