Communication Schedule for GEN_BLOCK Redistribution

  • Published: 2000.05.15

Abstract

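Array redistribution is widely used to speed up applications in distributed-memory computing environments. In particular, redistribution between GEN_BLOCK distributions is needed to achieve optimized performance when the load changes dynamically. Most prior work on array redistribution has addressed only redistribution between regular distribution patterns such as CYCLIC(N). However, the message passing that arises in redistribution between irregular distribution patterns such as GEN_BLOCK has different characteristics from that of regular redistribution, so new research is required. This paper shows that the messages arising in GEN_BLOCK redistribution lack the regularity found in redistribution between regular patterns but instead exhibit spatial locality. Based on this observation, it proves that the minimum step theorem and the minimum size theorem are important for improving redistribution performance, and it presents an algorithm that generates an optimal schedule by adding a relocation phase to conventional list scheduling. Finally, the proposed algorithm was evaluated on a CRAY T3E and an IBM SP2. The results show that, on distributed-memory parallel machines, schedules satisfying the minimum step theorem and the minimum size theorem are important for improving the performance of GEN_BLOCK redistribution.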

Array redistribution is often required to improve the performance of parallel programs on distributed-memory multicomputers. GEN_BLOCK redistribution, that is, redistribution between different GEN_BLOCK distributions, is essential for load balancing. However, prior research on redistribution has focused on regular redistribution, such as redistribution between different CYCLIC(N) distributions. GEN_BLOCK redistribution is very different from regular redistribution: message passing in regular redistribution consists of repetitions of basic message passing patterns, while message passing in GEN_BLOCK redistribution shows locality. This paper proves that two optimality conditions, minimizing the number of communication steps and minimizing the redistribution size, are essential in GEN_BLOCK redistribution. Additionally, by adding a relocation phase to list scheduling, we obtain an optimal scheduling algorithm for GEN_BLOCK redistribution. To evaluate the performance of the algorithm, we performed experiments on a CRAY T3E. The experiments show that the scheduling algorithm performs better and that the two conditions are critical to the communication speed of GEN_BLOCK redistribution.
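
The locality and the two conditions mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the relocation phase is not reproduced, and the function names (gen_block_overlaps, min_steps_lower_bound, list_schedule), the block sizes, and the cost model (a processor may send at most one message and receive at most one message per step) are assumptions made here for illustration. The sketch computes the messages as overlaps between source and destination blocks, a lower bound on the number of steps, and a plain greedy list schedule.

```python
from collections import Counter
from itertools import accumulate


def gen_block_overlaps(src_sizes, dst_sizes):
    """Return (sender, receiver, length) for each overlap of a source block
    with a destination block. Blocks are contiguous, so every processor only
    exchanges data with a contiguous range of processors (locality)."""
    assert sum(src_sizes) == sum(dst_sizes), "both distributions must cover the array"
    src_ends = list(accumulate(src_sizes))
    dst_ends = list(accumulate(dst_sizes))
    overlaps, i, j, start = [], 0, 0, 0
    while i < len(src_ends) and j < len(dst_ends):
        end = min(src_ends[i], dst_ends[j])
        if end > start:
            overlaps.append((i, j, end - start))
        start = end
        if src_ends[i] == end:
            i += 1
        if dst_ends[j] == end:
            j += 1
    return overlaps


def min_steps_lower_bound(messages):
    """No schedule can use fewer steps than the largest number of messages
    any single processor has to send or to receive."""
    sends = Counter(s for s, _, _ in messages)
    recvs = Counter(r for _, r, _ in messages)
    return max(list(sends.values()) + list(recvs.values()), default=0)


def list_schedule(messages):
    """Plain greedy list scheduling (assumed variant, largest message first):
    put each message into the first step where its sender is not already
    sending and its receiver is not already receiving."""
    steps = []
    for msg in sorted(messages, key=lambda m: -m[2]):
        s, r, _ = msg
        for step in steps:
            if all(s != s2 and r != r2 for s2, r2, _ in step):
                step.append(msg)
                break
        else:
            steps.append([msg])
    return steps


# Hypothetical block sizes for 4 processors over a 16-element array.
overlaps = gen_block_overlaps([4, 4, 4, 4], [2, 7, 5, 2])
remote = [m for m in overlaps if m[0] != m[1]]  # local copies need no message
schedule = list_schedule(remote)
print(len(schedule), min_steps_lower_bound(remote))
print([max(length for _, _, length in step) for step in schedule])
```

The last line prints the largest message in each step. Under the assumed cost model, the sum of these values is the quantity a schedule would try to keep small once the number of steps has reached the lower bound.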
