A Study on Efficient Executions of MPI Parallel Programs in Memory-Centric Computer Architecture

  • Lee, Je-Man (Dept. of Computer Science, Sangmyung University) ;
  • Lee, Seung-Chul (Dept. of Computer Science, Sangmyung University) ;
  • Shin, Dongha (Dept. of Electronics, Sangmyung University)
  • Received : 2019.09.30
  • Accepted : 2019.11.25
  • Published : 2020.01.31

Abstract


In this paper, we present a technique that executes MPI parallel programs, developed for processor-centric computer architecture, more efficiently on memory-centric computer architecture without program modification. The technique improves performance by replacing the slow data transfers that MPI standard library functions perform over the network with fast data transfers through the large, fast shared memory that memory-centric computer architecture provides. The technique is implemented in two programs. The first is a modified MPI library, called MC-MPI-LIB, that runs MPI parallel programs more efficiently on memory-centric computer architecture while preserving the semantics of the MPI standard library functions. The second is a simulation program, called MC-MPI-SIM, that simulates the execution of memory-centric computer architecture on processor-centric computer architecture. We developed and tested both programs in a distributed-system environment deployed on Docker-based virtualization. Measuring the performance of several MPI parallel programs, we show that they run with higher performance on memory-centric computer architecture; in particular, programs with a high communication-overhead ratio show very large gains.
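As a rough illustration of the underlying idea (this is not the authors' MC-MPI-LIB code, which is not reproduced here), the sketch below assumes two MPI ranks running on the same node and a hypothetical POSIX shared-memory segment named /mc_mpi_demo with a hypothetical payload size: the "sender" rank writes its payload directly into the shared segment, MPI is used only for synchronization, and no payload data crosses the network.

```c
/* Minimal sketch: payload moves through shared memory, MPI only synchronizes.
 * SHM_NAME and PAYLOAD_WORDS are illustrative assumptions, not values from the paper. */
#include <mpi.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/mc_mpi_demo"   /* hypothetical shared-memory segment name */
#define PAYLOAD_WORDS 1024        /* hypothetical payload size */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    size_t bytes = PAYLOAD_WORDS * sizeof(double);

    /* Every rank maps the same named shared-memory segment. */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, (off_t)bytes);
    double *shared = (double *)mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);

    if (rank == 0) {
        /* "Sender": write the payload directly into shared memory. */
        for (int i = 0; i < PAYLOAD_WORDS; i++)
            shared[i] = (double)i;
    }

    /* Synchronize instead of transferring the payload over the network. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 1) {
        /* "Receiver": read the payload without any network transfer. */
        printf("rank 1 read shared[0]=%.1f shared[last]=%.1f\n",
               shared[0], shared[PAYLOAD_WORDS - 1]);
    }

    munmap(shared, bytes);
    close(fd);
    if (rank == 0)
        shm_unlink(SHM_NAME);
    MPI_Finalize();
    return 0;
}
```

Run, for example, with `mpicc shm_demo.c -o shm_demo -lrt` and `mpirun -np 2 ./shm_demo` on a single node; MC-MPI-LIB performs such redirection inside the MPI library functions themselves, so application code needs no changes.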

Keywords

References

  1. P. Faraboschi, K. Keeton, T. Marsland, and D. Milojicic, "Beyond processor-centric operating systems," Proceedings of the 15th USENIX Conference on Hot Topics in Operating Systems, pp. 17-17, Switzerland, May 2015.
  2. K. Keeton, "Memory-Driven Computing," Keynote at USENIX FAST, Santa Clara, CA, USA, Feb. 2017.
  3. K. M. Bresniker, S. Singhal, and R. S. Williams, "Adapting to thrive in a new economy of memory abundance," Computer, Vol. 48, No. 12, pp. 44-53, Dec. 2015. https://doi.org/10.1109/MC.2015.368
  4. D. Efnusheva, G. Dokoski, A. Tentov and M. Kalendar, "A Novel Memory-centric Architecture and Organization of Processors and Computers," International Conference on Applied Innovations in IT, Vol. 3, No. 1, pp. 47-53, Koethen, Germany, March 2015.
  5. A. Grama, A. Gupta, G. Karypis, and V. Kumar, "Introduction to Parallel Computing," Second Edition, Addison Wesley, 2003.
  6. W. Gropp, E. Lusk, and A. Skjellum, "Using MPI: Portable Parallel Programming with the Message-Passing Interface," Third Edition, The MIT Press, 2014.
  7. MPI Forum, https://www.mpi-forum.org
  8. M. T. Chung, N. Quang-Hung, M. Nguyen, and N. Thoai, "Using Docker in High Performance Computing Applications," IEEE International Conference on Communications and Electronics, pp. 52-57, Ha Long, Vietnam, July 2016.
  9. Docker Documentation, https://docs.docker.com
  10. K. Matthias and S. P. Kane, "Docker: Up and Running," O'Reilly, 2015.
  11. G. Hager and G. Wellein, "Introduction to High Performance Computing for Scientists and Engineers," CRC Press, 2010.
  12. K. Takeuchi, "Memory system architecture for the data centric computing," Japanese Journal of Applied Physics, Vol. 55, No. 4S, 04EA02, Feb. 2016. https://doi.org/10.7567/JJAP.55.04EA02
  13. J. McCalpin, "Memory Bandwidth and System Balance in HPC Systems," Invited talk at ACM/IEEE Supercomputing Conference, Salt Lake City, Utah, USA, Dec. 2016.
  14. The Machine, https://www.labs.hpe.com/memory-driven-computing
  15. A. Degomme, A. Legrand, G. S. Markomanolis, M. Quinson, M. Stillwell, and F. Suter, "Simulating MPI Applications: The SMPI Approach," IEEE Transactions on Parallel and Distributed Systems, Vol. 28, No. 8, pp. 2387-2400, Aug. 2017. https://doi.org/10.1109/TPDS.2017.2669305
  16. T. Hoefler, T. Schneider, and A. Lumsdaine, "LogGOPSim: Simulating Large-Scale Applications in the LogGOPS Model," Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, Illinois, USA, pp. 597-604, June 2010.
  17. C. Janssen, H. Adalsteinsson, S. Cranford, J. Kenny, A. Pinar, D. Evensky and J. Mayo, "A Simulator for Large-scale Parallel Computer Architectures," International Journal of Distributed Systems and Technologies, Vol. 1, No. 2, pp. 57-73, April 2010. https://doi.org/10.4018/jdst.2010040104
  18. Je-Man Lee, Seung-Chul Lee, and Dong-Ha Shin, "Efficient Executions of MPI Parallel Programs in Memory-Centric Computer Architecture," Proceedings of The Korea Society of Computer and Information Summer Conference, Vol. 27, No. 2, pp. 257-258, Jeju, South Korea, July 2019.