A Communication and Computation Overlapping Model through Loop Sub-partitioning and Dynamic Scheduling in Data Parallel Programs

데이타 병렬 프로그램에서 루프 세부 분할 및 동적 스케쥴링을 통한 통신과 계산의 중첩 모델

  • Published : 2000.01.15

Abstract

We propose a model that overlaps communication with computation to make communication more efficient in the data-parallel programming paradigm. The model divides a given loop partition into several sub-partitions to obtain computation that can be overlapped with communication. A loop partition may reference data in other partitions, but not all of its iterations require non-local data. The loop partition can therefore be divided into a set of iterations that require non-local data and a set of iterations that do not. Each loop sub-partition is dynamically scheduled according to the arrival of its associated message. Experimental results for several benchmarks on the IBM SP2 show that the overlapping model improves performance.

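This paper proposes a model that executes communication and computation in an overlapped manner, as one way to achieve efficient communication in data-parallel programs. In this overlapping model, the given loop partition is further sub-partitioned to obtain computation that can be carried out during the communication latency. A given loop partition may reference other, external data partitions, but not all of its iterations require external data references. The loop partition can therefore be divided into a set of loop iterations that require external data and a set of loop iterations that do not. The resulting loop sub-partitions are scheduled dynamically, in the order in which messages arrive, for efficient execution. Experiments with several programs on the IBM SP2 confirmed that the overlapping model improves performance.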
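
The sub-partitioning and arrival-driven scheduling described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the three-point stencil `b[i] = a[i-1] + a[i] + a[i+1]`, the function names `sub_partition` and `run_overlapped`, and the use of threads with sleeps to stand in for neighbouring processors and network latency are all hypothetical choices made for the example.

```python
import queue
import threading
import time


def sub_partition(lo, hi):
    """Group iterations lo..hi-1 of b[i] = a[i-1] + a[i] + a[i+1] by the
    neighbours ("left"/"right") whose data they need (hypothetical helper)."""
    groups = {}
    for i in range(lo, hi):
        needs = set()
        if i - 1 < lo:
            needs.add("left")    # a[i-1] is owned by the left neighbour
        if i + 1 >= hi:
            needs.add("right")   # a[i+1] is owned by the right neighbour
        groups.setdefault(frozenset(needs), []).append(i)
    return groups


def run_overlapped(a, lo, hi, delays):
    """Compute b[lo:hi], running purely local iterations while the simulated
    messages are in flight. Assumes 1 <= lo < hi <= len(a) - 1."""
    inbox = queue.Queue()

    def neighbour(src, index, delay):
        time.sleep(delay)                  # simulated network latency
        inbox.put((src, index, a[index]))  # "send" one ghost element

    senders = [
        threading.Thread(target=neighbour, args=("left", lo - 1, delays["left"])),
        threading.Thread(target=neighbour, args=("right", hi, delays["right"])),
    ]
    for t in senders:
        t.start()

    local = dict(enumerate(a[lo:hi], start=lo))  # owned elements, ghosts added later
    b, order = {}, []

    def execute(needs, iterations):
        for i in iterations:
            b[i] = local[i - 1] + local[i] + local[i + 1]
        order.append(tuple(sorted(needs)))  # record which sub-partition ran

    pending = sub_partition(lo, hi)
    # Iterations with no non-local reference overlap with communication.
    if frozenset() in pending:
        execute(frozenset(), pending.pop(frozenset()))

    arrived = set()
    while pending:
        src, index, value = inbox.get()     # block until some message arrives
        local[index] = value
        arrived.add(src)
        # Run every sub-partition whose required messages have all arrived.
        for needs in [n for n in pending if n <= arrived]:
            execute(needs, pending.pop(needs))

    for t in senders:
        t.join()
    return [b[i] for i in range(lo, hi)], order
```

Calling `run_overlapped(list(range(10)), 3, 7, {"left": 0.2, "right": 0.05})` should run the local sub-partition first, then the sub-partition that depends on the right neighbour (whose message is simulated to arrive earlier), and finally the one that depends on the left neighbour. In a real message-passing setting the same structure would presumably map onto non-blocking receives, with the wait on the queue replaced by a wait for whichever receive completes first.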
