http://dx.doi.org/10.7840/KICS.2011.36B.11.1329

The Design of MPI Hardware Unit for Enhanced Broadcast Communication  

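Yun, Hee-Jun (Processor Laboratory, Department of Electrical and Electronic Engineering, Yonsei University)
Chung, Won-Young (Processor Laboratory, Department of Electrical and Electronic Engineering, Yonsei University)
Lee, Yong-Surk (Processor Laboratory, Department of Electrical and Electronic Engineering, Yonsei University)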
Abstract
This paper proposes an algorithm and a hardware architecture for broadcast communication, which is the most severe bottleneck in multiprocessors that use a distributed memory architecture. In conventional systems, the MPI library converts a collective communication into point-to-point communications without considering the state of each processing node's communication port, which indicates whether that node is busy or free. If a conflicting point-to-point communication occurs during a broadcast, the transmission speed of the broadcast decreases. This paper therefore proposes an algorithm that determines the order of the point-to-point communications making up a broadcast according to the state of each processing node. The proposed algorithm reduces the total broadcast communication time by transmitting the message preferentially to processing nodes whose communication port is free. The proposed MPI unit for broadcast communication is evaluated by modeling it in SystemC. It improves broadcast communication performance by up to 78% with 16 nodes. This result shows that the proposed algorithm is useful for improving the overall performance of an MPSoC.
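The ordering idea described in the abstract can be illustrated with a minimal timing sketch. This is a toy software model, not the hardware unit or the SystemC model from the paper. It assumes that the root sends to one node at a time, that every transfer takes the same fixed time, and that each receiver's port is busy until a known time. The function names and the `busy_until` list are hypothetical and are introduced only for this illustration.

```python
def broadcast_fixed_order(busy_until, send_time):
    """Root sends in rank order and stalls whenever the target port is busy.

    busy_until[i] is the (assumed) time at which node i's port becomes free.
    Returns the time at which the last transfer completes.
    """
    now = 0
    for free_at in busy_until:
        # Wait for the target port if it is still busy, then transfer.
        now = max(now, free_at) + send_time
    return now


def broadcast_free_first(busy_until, send_time):
    """Root prefers any node whose port is currently free.

    If no pending node is free, the root waits for the one that frees up first.
    Returns (completion_time, order_of_targets).
    """
    pending = dict(enumerate(busy_until))  # node index -> time its port frees up
    now = 0
    order = []
    while pending:
        free_nodes = [n for n, t in pending.items() if t <= now]
        if free_nodes:
            target = free_nodes[0]
        else:
            # Nobody is free: idle until the earliest port becomes available.
            target = min(pending, key=pending.get)
            now = pending[target]
        order.append(target)
        now += send_time
        del pending[target]
    return now, order


if __name__ == "__main__":
    # Node 0 is busy until t=6, the other three receivers are free at t=0.
    busy = [6, 0, 0, 0]
    print(broadcast_fixed_order(busy, send_time=2))  # 14
    print(broadcast_free_first(busy, send_time=2))   # (8, [1, 2, 3, 0])
```

In this toy case the fixed rank order stalls on the busy first receiver, while the state-aware order serves the free receivers during that time. The numbers are specific to this example and say nothing about the 78% figure reported for the actual unit.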
Keywords
MPSoC; Message passing; Distributed memory; Broadcast; Collective operation;