http://dx.doi.org/10.7840/kics.2012.37B.9.795

A Design of Pipeline Chain Algorithm Based on Circuit Switching for MPI Broadcast Communication System  

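Yun, Heejun (Processor Laboratory, Department of Electrical and Electronic Engineering, Yonsei University)
Chung, Wonyoung (Processor Laboratory, Department of Electrical and Electronic Engineering, Yonsei University)
Lee, Yong-Surk (Processor Laboratory, Department of Electrical and Electronic Engineering, Yonsei University)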
Abstract
This paper proposes an algorithm and a hardware architecture for broadcast communication, which is the worst communication bottleneck in multiprocessors that use distributed-memory architectures. In conventional systems, the pipelined broadcast algorithm makes use of the maximum bandwidth of the communication bus. However, because the pipelined broadcast sends the data divided into many parts, an unnecessary synchronization process is repeated for every part. In this paper, an MPI unit implementing a pipeline chain algorithm based on circuit switching, which removes this redundant synchronization, was designed, and the proposed architecture was evaluated by modeling it in SystemC. The results show that the proposed architecture improves broadcast performance by up to 3.3 times compared with systems using the conventional pipelined broadcast algorithm, and that it can use nearly the maximum bandwidth of the transmission bus. The design was then implemented in Verilog HDL, synthesized with the TSMC 0.18 µm library, and implemented in a chip. The synthesized unit occupies 4,700 gates (2-input NAND gate equivalents), which is 2.4% of the total area. The proposed architecture thus improves the overall performance of the MPSoC while occupying a relatively small area.
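Why removing per-segment synchronization matters can be illustrated with a first-order cycle-count model. The sketch below is a hypothetical analytic model, not the authors' SystemC model or hardware: the names `LinkModel`, `sync_cycles`, `cycles_per_word`, and `hop_latency` and all parameter values are assumptions chosen for illustration. It assumes a chain of `nodes` processors in which the conventional pipelined broadcast pays one synchronization for every segment at every pipeline step, while the circuit-switched chain pays one synchronization per hop to set up the path and then streams the whole message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """Hypothetical link parameters in bus cycles (assumed, not from the paper)."""
    sync_cycles: int      # handshake/synchronization cost per transfer setup
    cycles_per_word: int  # bus occupancy per data word (1 = full bandwidth)
    hop_latency: int      # extra cycles for a streamed word to pass one relay node


def pipelined_broadcast_cycles(nodes: int, words: int, segments: int, m: LinkModel) -> int:
    """Conventional chain pipelined broadcast: each segment is a separate
    send/receive, so each of the (segments + nodes - 2) pipeline steps
    pays one synchronization plus the transfer of one segment."""
    if nodes < 2 or segments < 1 or words % segments:
        raise ValueError("need >= 2 nodes and words divisible by segments")
    seg_words = words // segments
    steps = segments + nodes - 2
    return steps * (m.sync_cycles + seg_words * m.cycles_per_word)


def circuit_chain_cycles(nodes: int, words: int, m: LinkModel) -> int:
    """Circuit-switched pipeline chain (assumed model): one synchronization
    per hop to establish the path, then the data streams through all relay
    nodes without further handshakes."""
    if nodes < 2:
        raise ValueError("need >= 2 nodes")
    hops = nodes - 1
    setup = hops * m.sync_cycles
    stream = words * m.cycles_per_word + (hops - 1) * m.hop_latency
    return setup + stream


if __name__ == "__main__":
    link = LinkModel(sync_cycles=20, cycles_per_word=1, hop_latency=2)
    nodes, words = 8, 1024
    # Pick the best segment count for the conventional algorithm.
    best_seg = min(
        (s for s in range(1, words + 1) if words % s == 0),
        key=lambda s: pipelined_broadcast_cycles(nodes, words, s, link),
    )
    t_pipe = pipelined_broadcast_cycles(nodes, words, best_seg, link)
    t_circ = circuit_chain_cycles(nodes, words, link)
    ideal = words * link.cycles_per_word  # one word per cycle on the bus
    print(f"pipelined (best {best_seg} segments): {t_pipe} cycles, "
          f"bus utilization {ideal / t_pipe:.2f}")
    print(f"circuit-switched chain: {t_circ} cycles, "
          f"bus utilization {ideal / t_circ:.2f}")
```

With these made-up parameters the conventional algorithm is best at 16 segments (1,848 cycles), while the circuit-switched chain needs 1,176 cycles, close to the 1,024-cycle ideal. The numbers are illustrative only and are not intended to reproduce the paper's measured 3.3x speedup; the point is that in the pipelined model the synchronization term grows with the number of segments, whereas in the circuit-switched model it is paid once per hop.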
Keywords
MPSoC; Message Passing; Distributed memory; Broadcast Collective operation;