Adaptive Execution Techniques for Parallel Programs

  • Published: 2004.08.01

Abstract

This paper presents adaptive execution techniques for parallel programs. Executing a parallel loop that contains little computation in parallel can degrade performance, so the techniques use a performance estimation model, at compile time or at run time, to predict the performance of each parallelized loop and then decide whether to execute it in parallel or sequentially. The performance estimation and adaptive execution algorithms are implemented in a compiler preprocessor, which inserts into the original parallel program the code that makes this decision at compile time or at run time. We evaluated the techniques with five representative scientific numerical benchmark programs written in Fortran77 on a 32-processor distributed shared memory multiprocessor (SGI Origin2000). With the techniques applied, the programs ran on average 26%, 20%, 16%, and 10% faster than the original parallel programs on 32, 16, 8, and 4 processors, respectively. One of the programs ran more than twice as fast as its original parallel version on 32 processors.
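The run-time decision described in the abstract can be illustrated with a minimal sketch. It is not the paper's preprocessor or its performance model, which the abstract does not specify. The cost model (a fixed fork/join overhead plus perfectly divided work), the two constants, and the names `should_run_parallel` and `adaptive_map` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cost-model parameters (not from the paper); in practice they
# would be measured on the target machine.
FORK_JOIN_OVERHEAD = 50e-6   # assumed seconds to start and join the workers
COST_PER_ITERATION = 0.2e-6  # assumed seconds of work per loop iteration


def should_run_parallel(iterations, workers):
    """Return True when the estimated parallel time beats the sequential time."""
    if workers < 2:
        return False
    sequential = iterations * COST_PER_ITERATION
    # Assumed model: fixed overhead plus the work split evenly across workers.
    parallel = FORK_JOIN_OVERHEAD + sequential / workers
    return parallel < sequential


def adaptive_map(body, iterations, workers):
    """Run body(i) for i in range(iterations), choosing the mode at run time."""
    if not should_run_parallel(iterations, workers):
        # Small loop: the estimated overhead outweighs the gain, stay sequential.
        return [body(i) for i in range(iterations)]
    chunk = -(-iterations // workers)  # ceiling division
    ranges = [range(start, min(start + chunk, iterations))
              for start in range(0, iterations, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: [body(i) for i in r], ranges)
        # Flatten inside the with-block so every chunk result is collected.
        return [value for part in parts for value in part]
```

With these assumed constants and 4 workers, the break-even point is roughly 334 iterations: shorter loops run sequentially and longer ones are split across workers. CPython threads do not speed up CPU-bound loop bodies, so the sketch shows only the decision logic, not the speedups reported in the paper.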
