Search | Korea Science

The Design of Hardware MPI Units for MPSoC (MPSoC를 위한 저비용 하드웨어 MPI 유닛 설계)

Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.36 no.1B
- /
- pp.86-92
- /
- 2011
In this paper, we propose a novel hardware MPI(Message Passing Interface) unit which supports message passing in multiprocessor system which use distributed memory architecture. MPI Hardware unit processes data synchronization, transmission and completion, and it supports processor non-blocking operation so it reduces overhead according to synchronization. Additionally, MPI hardware unit combines ready entry, request entry, reserve entry which save and manage the synchronized messages and performs the multiple outstanding issue and out of order completion. According to BFM(Bus Functional Model) simulation result, the performance is increased by 25% on many to many communication. After we designed MPI unit using HDL, with synopsys design compiler we synthesized, and for synthesis library we used MagnaChip $0.18{\mu}m$. And then we making prototype chip. The proposed message transmission interface hardware shows high performance for its increase in size. Thus, as we consider low-cost design and scalability, MPI hardware unit is useful in increasing overall performance of embedded MPSoC(Multi-Processor System-on-Chip).
https://doi.org/10.7840/KICS.2011.36B.1.86 인용 PDF KSCI

Dynamic On-Chip Network based on Clustering for MPSoC (동적 라우팅을 사용하는 클러스터 기반 MPSoC 구조)

Kim, Jang-Eok;Kim, Jae-Hwan;Ahn, Byung-Gyu;Sin, Bong-Sik;Chong, Jong-Wha
- Proceedings of the IEEK Conference
- /
- 2006.06a
- /
- pp.991-992
- /
- 2006
Multiprocessor system is efficient and high performance architecture to overcome a limitation of single core SoC. In this paper, we propose a multiprocessor SoC (MPSoC) architecture which provides the low complexity and the high performance. The dynamic routing scheme has a serious problem in which the complexity of routing increases exponentially. We solve this problem by making a cluster with several PEs (Processing Element). In inter-cluster network, we use deterministic routing scheme and in intra-cluster network, we use dynamic routing scheme. In order to control the hierarchical network, we propose efficient router architecture by using smart crossbar switch. We modeled 2-D mesh topology and used simulator based on C/C++. The results of this routing scheme show that our approach has less complexity and improved throughput as compared with the pure deterministic routing architecture and the pure dynamic routing architecture.
PDF

A System Level Network-on-chip Model with MLDesigner

Agarwal, Ankur;Shankar, Rabi;Pandya, A.S.;Lho, Young-Uhg
- Journal of information and communication convergence engineering
- /
- v.6 no.2
- /
- pp.122-128
- /
- 2008
Multiprocessor architectures and platforms, such as, a multiprocessor system on chip (MPSoC) recently introduced to extend the applicability of the Moore's law, depend upon concurrency and synchronization in both software and hardware to enhance design productivity and system performance. With the rapidly approaching billion transistors era, some of the main problem in deep sub-micron technologies characterized by gate lengths in the range of 60-90 nm will arise from non scalable wire delays, errors in signal integrity and non-synchronized communication. These problems may be addressed by the use of Network on Chip (NOC) architecture for future System-on-Chip (SoC). We have modeled a concurrent architecture for a customizable and scalable NOC in a system level modeling environment using MLDesigner (from MLD Inc.). Varying network loads under various traffic scenarios were applied to obtain realistic performance metrics. We provide the simulation results for latency as a function of the buffer size. We have abstracted the area results for NOC components from its FPGA implementation. Modeled NOC architecture supports three different levels of quality-of-service (QoS).
PDF KSCI

A Deadlock Free Router Design for Network-on-Chip Architecture (NOC 구조용 교착상태 없는 라우터 설계)

Agarwal, Ankur;Mustafa, Mehmet;Shiuku, Ravi;Pandya, A.S.;Lho, Young-Ugh
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.11 no.4
- /
- pp.696-706
- /
- 2007
Multiprocessor system on chip (MPSoC) platform has set a new innovative trend for the System on Chip (SoC) design. With the rapidly approaching billion transistors era, some of the main problem in deep sub-micron technologies characterized by gate lengths in the range of 60-90 nm will arise from non scalable wire delays, errors in signal integrity and un-synchronized communication. These problems may be addressed by the use of Network on Chip (NOC) architecture for future SoC. Most future SoCs will use network architecture and a packet based communication protocol for on chip communication. This paper presents an adaptive wormhole routing with proactive turn prohibition to guarantee deadlock free on chip communication for NOC architecture. It shows a simple muting architecture with five full-duplex, flit-wide communication channels. We provide simulation results for message latency and compare results with those of dimension ordered techniques operating at the same link rates.
https://doi.org/10.6109/jkiice.2007.11.4.696 인용 PDF KSCI

Voltage Selection Methodology for DVFS Overhead Minimization (동적 전압 주파수 스케일링 오버헤드 최소화를 위한 전압 선택 방법론)

Chang, Jin Kyu;Han, Tae Hee
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2015.10a
- /
- pp.854-857
- /
- 2015
As the number of devices integrated on system-on-chip(SoC) increases exponentially, energy reduction technology is essential. Dynamic Voltage and Frequency Scaling (DVFS) is a very effective technique for reducing power consumption. Since it requires complex voltage regulators and PLL circuits, DVFS tends to have significant overheads. In this paper, we propose a new voltage selection algorithm to minimize transition overhead for multiprocessor SoC (MPSoC). Simulation results show that proposed algorithm appears less energy consumption with transition overhead even though maintains performance.
PDF

Parallel SystemC Cosimulation using Virtual Synchronization (가상 동기화 기법을 이용한 SystemC 통합시뮬레이션의 병렬 수행)

Yi, Young-Min;Kwon, Seong-Nam;Ha, Soon-Hoi
- Journal of KIISE:Computer Systems and Theory
- /
- v.33 no.12
- /
- pp.867-879
- /
- 2006
This paper concerns fast and time accurate HW/SW cosimulation for MPSoC(Multi-Processor System-on-chip) architecture where multiple software and/or hardware components exist. It is becoming more and more common to use MPSoC architecture to design complex embedded systems. In cosimulation of such architecture, as the number of the component simulators participating in the cosimulation increases, the time synchronization overhead among simulators increases, thereby resulting in low overall cosimulation performance. Although SystemC cosimulation frameworks show high cosimulation performance, it is in inverse proportion to the number of simulators. In this paper, we extend the novel technique, called virtual synchronization, which boosts cosimulation speed by reducing time synchronization overhead: (1) SystemC simulation is supported seamlessly in the virtual synchronization framework without requiring the modification on SystemC kernel (2) Parallel execution of component simulators with virtual synchronization is supported. We compared the performance and accuracy of the proposed parallel SystemC cosimulation framework with MaxSim, a well-known commercial SystemC cosimulation framework, and the proposed one showed 11 times faster performance for H.263 decoder example, while the accuracy was maintained below 5%.
PDF KSCI

Bus Splitting Techniques for MPSoC to Reduce Bus Energy (MPSoC 플랫폼의 버스 에너지 절감을 위한 버스 분할 기법)

Chung Chun-Mok;Kim Jin-Hyo;Kim Ji-Hong
- Journal of KIISE:Computer Systems and Theory
- /
- v.33 no.9
- /
- pp.699-708
- /
- 2006
Bus splitting technique reduces bus energy by placing modules with frequent communications closely and using necessary bus segments in communications. But, previous bus splitting techniques can not be used in MPSoC platform, because it uses cache coherency protocol and all processors should be able to see the bus transactions. In this paper, we propose a bus splitting technique for MPSoC platform to reduce bus energy. The proposed technique divides a bus into several bus segments, some for private memory and others for shared memory. So, it minimizes the bus energy consumed in private memory accesses without producing cache coherency problem. We also propose a task allocation technique considering cache coherency protocol. It allocates tasks into processors according to the numbers of bus transactions and cache coherence protocol, and reduces the bus energy consumption during shared memory references. The experimental results from simulations say the bus splitting technique reduces maximal 83% of the bus energy consumption by private memory accesses. Also they show the task allocation technique reduces maximal 30% of bus energy consumed in shared memory references. We can expect the bus splitting technique and the task allocation technique can be used in multiprocessor platforms to reduce bus energy without interference with cache coherency protocol.
PDF KSCI

Separated Address/Data Network Design for Bus Protocol compatible Network-on-Chip (버스 프로토콜 호환 가능한 네트워크-온-칩에서의 분리된 주소/데이터 네트워크 설계)

Chung, Seungh Ah;Lee, Jae Hoon;Kim, Sang Heon;Lee, Jae Sung;Han, Tae Hee
- Journal of the Institute of Electronics and Information Engineers
- /
- v.53 no.4
- /
- pp.68-75
- /
- 2016
As the number of cores and IPs increase in multiprocessor system-on-chip (MPSoC), network-on-chip (NoC) has emerged as a promising novel interconnection architecture for its parallelism and scalability. However, minimization of the latency in NoC with legacy bus IPs must be addressed. In this paper, we focus on the latency minimization problem in NoC which accommodates legacy bus protocol based IPs considering the trade-offs between hop counts and path collisions. To resolve this problem, we propose separated address/data network for independent address and data phases of bus protocol. Compared to Mesh and irregular topologies generated by TopGen, experimental results show that average latency and execution time are reduced by 19.46% and 10.55%, respectively.
https://doi.org/10.5573/ieie.2016.53.4.068 인용 PDF KSCI

A Design of Pipeline Chain Algorithm Based on Circuit Switching for MPI Broadcast Communication System (MPI 브로드캐스트 통신을 위한 서킷 스위칭 기반의 파이프라인 체인 알고리즘 설계)

Yun, Heejun;Chung, Wonyoung;Lee, Yong-Surk
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.37B no.9
- /
- pp.795-805
- /
- 2012
This paper proposes an algorithm and a hardware architecture for a broadcast communication which has the worst bottleneck among multiprocessor using distributed memory architectures. In conventional system, The pipelined broadcast algorithm is an algorithm which takes advantage of maximum bandwidth of communication bus. But unnecessary synchronization process are repeated, because the pipelined broadcast sends the data divided into many parts. In this paper, the MPI unit for pipeline chain algorithm based on circuit switching removing the redundancy of synchronization process was designed, the proposed architecture was evaluated by modeling it with systemC. Consequently, the performance of the proposed architecture was highly improved for broadcast communication up to 3.3 times that of systems using conventional pipelined broadcast algorithm, it can almost take advantage of the maximum bandwidth of transmission bus. Then, it was implemented with VerilogHDL, synthesized with TSMC 0.18um library and implemented into a chip. The area of synthesis results occupied 4,700 gates(2 input NAND gate) and utilization of total area is 2.4%. The proposed architecture achieves improvement in total performance of MPSoC occupying relatively small area.
https://doi.org/10.7840/kics.2012.37B.9.795 인용 PDF KSCI

Software Pipeline-Based Partitioning Method with Trade-Off between Workload Balance and Communication Optimization

Huang, Kai;Xiu, Siwen;Yu, Min;Zhang, Xiaomeng;Yan, Rongjie;Yan, Xiaolang;Liu, Zhili
- ETRI Journal
- /
- v.37 no.3
- /
- pp.562-572
- /
- 2015
For a multiprocessor System-on-Chip (MPSoC) to achieve high performance via parallelism, we must consider how to partition a given application into different components and map the components onto multiple processors. In this paper, we propose a software pipeline-based partitioning method with cyclic dependent task management and communication optimization. During task partitioning, simultaneously considering computation load balance and communication optimization can cause interference, which leads to performance loss. To address this issue, we formulate their constraints and apply an integer linear programming approach to find an optimal partitioning result - one that requires a trade-off between these two factors. Experimental results on a reconfigurable MPSoC platform demonstrate the effectiveness of the proposed method, with 20% to 40% performance improvements compared to a traditional software pipeline-based partitioning method.
https://doi.org/10.4218/etrij.15.0114.0502 인용 PDF KSCI

Search Result 14, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)