• Title/Summary/Keyword: Chip multiprocessor

Search Result 42, Processing Time 0.022 seconds

Design and Implementation of an InfiniBand System Interconnect for High-Performance Cluster Systems (고성능 클러스터 시스템을 위한 인피니밴드 시스템 연결망의 설계 및 구현)

  • Mo, Sang-Man;Park, Kyung;Kim, Sung-Nam;Kim, Myung-Jun;Im, Ki-Wook
    • The KIPS Transactions:PartA
    • /
    • v.10A no.4
    • /
    • pp.389-396
    • /
    • 2003
  • InfiniBand technology is being accepted as the future system interconnect to serve as the high-end enterprise fabric for cluster computing. This paper presents the design and implementation of the InfiniBand system interconnect, focusing on an InfiniBand host channel adapter (HCA) based on dual ARM9 processor cores The HCA is an SoC tailed KinCA which connects a host node onto the InfiniBand network both in hardware and in software. Since the ARM9 processor core does not provide necessary features for multiprocessor configuration, novel inter-processor communication and interrupt mechanisms between the two processors were designed and embedded within the KinCA chip. Kinch was fabricated as a 564-pin enhanced BGA (Bail Grid Array) device using 0.18${\mu}{\textrm}{m}$ CMOS technology Mounted on host nodes, it provides 10 Gbps outbound and inbound channels for transmit and receive, respectively, resulting in a high-performance cluster system.

A Study on the Parallel Routing in Hybrid Optical Networks-on-Chip (하이브리드 광학 네트워크-온-칩에서 병렬 라우팅에 관한 연구)

  • Seo, Jung-Tack;Hwang, Yong-Joong;Han, Tae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.8
    • /
    • pp.25-32
    • /
    • 2011
  • Networks-on-chip (NoC) is emerging as a key technology to overcome severe bus traffics in ever-increasing complexity of the Multiprocessor systems-on-chip (MPSoC); however traditional electrical interconnection based NoC architecture would be faced with technical limits of bandwidth and power consumptions in the near future. In order to cope with these problems, a hybrid optical NoC architecture which use both electrical interconnects and optical interconnects together, has been widely investigated. In the hybrid optical NoCs, wormhole switching and simple deterministic X-Y routing are used for the electrical interconnections which is responsible for the setup of routing path and optical router to transmit optical data through optical interconnects. Optical NoC uses circuit switching method to send payload data by preset paths and routers. However, conventional hybrid optical NoC has a drawback that concurrent transmissions are not allowed. Therefore, performance improvement is limited. In this paper, we propose a new routing algorithm that uses circuit switching and adaptive algorithm for the electrical interconnections to transmit data using multiple paths simultaneously. We also propose an efficient method to prevent livelock problems. Experimental results show up to 60% throughput improvement compared to a hybrid optical NoC and 65% power reduction compared to an electrical NoC.

MPSoC Design Space Exploration Based on Static Analysis of Process Network Model (프로세스 네트워크 모델의 정적 분석에 기반을 둔 다중 프로세서 시스템 온 칩 설계 공간 탐색)

  • Ahn, Yong-Jin;Choi, Ki-Young
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.10
    • /
    • pp.7-16
    • /
    • 2007
  • In this paper, we introduce a new design environment for efficient multiprocessor system-on-chip design space exploration. The design environment takes a process network model as input system specification. The process network model has been widely used for modeling signal processing applications because of its excellent modeling power. However, it has limitation in predictability, which could cause severe problem for real time systems. This paper proposes a new approach that enables static analysis of a process network model by converting it to a hierarchical synchronous dataflow model. For efficient design space exploration in the early design step, mapping application to target architectures has been a crucial part for finding better solution. In this paper, we propose an efficient mapping algorithm. Our mapping algorithm supports both single bus architecture and multiple bus architecture. In the experiments, we show that the automatic conversion approach of the process network model for static analysis is performed successfully for several signal processing applications, and show the effectiveness of our mapping algorithm by comparing it with previous approaches.

Energy-aware EDZL Real-Time Scheduling on Multicore Platforms (멀티코어 플랫폼에서 에너지 효율적 EDZL 실시간 스케줄링)

  • Han, Sangchul
    • Journal of KIISE
    • /
    • v.43 no.3
    • /
    • pp.296-303
    • /
    • 2016
  • Mobile real-time systems with limited system resources and a limited power source need to fully utilize the system resources when the workload is heavy and reduce energy consumption when the workload is light. EDZL (Earliest Deadline until Zero Laxity), a multiprocessor real-time scheduling algorithm, can provide high system utilization, but little work has been done aimed at reducing its energy consumption. This paper tackles the problem of DVFS (Dynamic Voltage/Frequency Scaling) in EDZL scheduling. It proposes a technique to compute a uniform speed on full-chip DVFS platforms and individual speeds of tasks on per-core DVFS platforms. This technique, which is based on the EDZL schedulability test, is a simple but effective one for determining the speeds of tasks offline. We also show through simulation that the proposed technique is useful in reducing energy consumption.

Software Pipeline-Based Partitioning Method with Trade-Off between Workload Balance and Communication Optimization

  • Huang, Kai;Xiu, Siwen;Yu, Min;Zhang, Xiaomeng;Yan, Rongjie;Yan, Xiaolang;Liu, Zhili
    • ETRI Journal
    • /
    • v.37 no.3
    • /
    • pp.562-572
    • /
    • 2015
  • For a multiprocessor System-on-Chip (MPSoC) to achieve high performance via parallelism, we must consider how to partition a given application into different components and map the components onto multiple processors. In this paper, we propose a software pipeline-based partitioning method with cyclic dependent task management and communication optimization. During task partitioning, simultaneously considering computation load balance and communication optimization can cause interference, which leads to performance loss. To address this issue, we formulate their constraints and apply an integer linear programming approach to find an optimal partitioning result - one that requires a trade-off between these two factors. Experimental results on a reconfigurable MPSoC platform demonstrate the effectiveness of the proposed method, with 20% to 40% performance improvements compared to a traditional software pipeline-based partitioning method.

Bus Splitting Techniques for MPSoC to Reduce Bus Energy (MPSoC 플랫폼의 버스 에너지 절감을 위한 버스 분할 기법)

  • Chung Chun-Mok;Kim Jin-Hyo;Kim Ji-Hong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.699-708
    • /
    • 2006
  • Bus splitting technique reduces bus energy by placing modules with frequent communications closely and using necessary bus segments in communications. But, previous bus splitting techniques can not be used in MPSoC platform, because it uses cache coherency protocol and all processors should be able to see the bus transactions. In this paper, we propose a bus splitting technique for MPSoC platform to reduce bus energy. The proposed technique divides a bus into several bus segments, some for private memory and others for shared memory. So, it minimizes the bus energy consumed in private memory accesses without producing cache coherency problem. We also propose a task allocation technique considering cache coherency protocol. It allocates tasks into processors according to the numbers of bus transactions and cache coherence protocol, and reduces the bus energy consumption during shared memory references. The experimental results from simulations say the bus splitting technique reduces maximal 83% of the bus energy consumption by private memory accesses. Also they show the task allocation technique reduces maximal 30% of bus energy consumed in shared memory references. We can expect the bus splitting technique and the task allocation technique can be used in multiprocessor platforms to reduce bus energy without interference with cache coherency protocol.

Diffusion of software innovation: a Petri Net theory perspective (Petri Net 이론 관점에서 본 소프트웨어 혁신의 확산)

  • Han, Jiyeon;Ahn, Jongchang;Lee, Ook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.2
    • /
    • pp.858-867
    • /
    • 2013
  • Hardware and software field are developed by environment of MPSOC. Also it is still working with economic world and academic world. This study focus on software side and try to classify from parallel programming design world. It can be divided by three; Data, Tasks, and Data flow model. Then we used Petri Net to CUDA and HOPES programmer and found how much they understand parallel programming for each side. We focus on two sides and what is different between their experience. Petri Net is easy to descript parallel program or parallel design pattern for Task, Data, and Hybird. This research can explain how they know and how much they know about parallel programming.

The Design of MPI Hardware Unit for Enhanced Broadcast Communication (효율적인 브로드캐스트 통신을 지원하는 MPI 하드웨어 유닛 설계)

  • Yun, Hee-Jun;Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.11B
    • /
    • pp.1329-1338
    • /
    • 2011
  • This paper proposes an algorithm and hardware architecture for a broadcast communication which has the worst bottleneck among multiprocessor using distributed memory architectures. In conventional systems, collective communication is converted into point-to-point communications by MPI library cell without considering the state of communication port of each processing node which represents the processing node is in busy state or free state. If conflicting point-to-point communication occurs during broadcast communication, the transmitting speed for broadcast communication is decreased. Thus, this paper proposed an algorithm which determines the order of point-to-point communications for broadcast communication according to the state of each processing node. According to the state of each processing node, the proposed algorithm decreases total broadcast communication time by transmitting message preferentially to the processing node with communication port in free state. The proposed MPI unit for broadcast communication is evaluated by modeling it with systemC. In addition, it achieved a highly improved performance for broadcast communication up to 78% with 16 nodes. This result shows the proposed algorithm is useful to improving total performance of MPSoC.

A Design of Pipeline Chain Algorithm Based on Circuit Switching for MPI Broadcast Communication System (MPI 브로드캐스트 통신을 위한 서킷 스위칭 기반의 파이프라인 체인 알고리즘 설계)

  • Yun, Heejun;Chung, Wonyoung;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37B no.9
    • /
    • pp.795-805
    • /
    • 2012
  • This paper proposes an algorithm and a hardware architecture for a broadcast communication which has the worst bottleneck among multiprocessor using distributed memory architectures. In conventional system, The pipelined broadcast algorithm is an algorithm which takes advantage of maximum bandwidth of communication bus. But unnecessary synchronization process are repeated, because the pipelined broadcast sends the data divided into many parts. In this paper, the MPI unit for pipeline chain algorithm based on circuit switching removing the redundancy of synchronization process was designed, the proposed architecture was evaluated by modeling it with systemC. Consequently, the performance of the proposed architecture was highly improved for broadcast communication up to 3.3 times that of systems using conventional pipelined broadcast algorithm, it can almost take advantage of the maximum bandwidth of transmission bus. Then, it was implemented with VerilogHDL, synthesized with TSMC 0.18um library and implemented into a chip. The area of synthesis results occupied 4,700 gates(2 input NAND gate) and utilization of total area is 2.4%. The proposed architecture achieves improvement in total performance of MPSoC occupying relatively small area.

Programming Model for SODA-II: a Baseband Processor for Software Defined Radio Systems (SDR용 기저대역 프로세서를 위한 프로그래밍 모델)

  • Lee, Hyun-Seok;Yi, Joon-Hwan;Oh, Hyuk-Jun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.7
    • /
    • pp.78-86
    • /
    • 2010
  • This paper discusses the programming model of SODA-II that is a baseband processor for software defined radio (SDR) systems. Signal processing On-Demand Architecture Ⅱ (SODA-II) is an on-chip multiprocessor architecture consisting of four processor cores and each core has both an wide SIMD datapath and a scalar datapath. This architecture is appropriate for baseband processing that is a mixture of vector computations and scalar computations. The programming model of the SODA-II is based on C library routines. Because the library routines hide the details of complex SIMD datapath control procedures, end users can easily program the SODA-II without deep understanding on its architecture. In this paper, we discuss the details of library routines and how these routines are exploited in the implementation of baseband signal processing algorithms. As application examples, we show the implementation result of W-CDMA multipath searcher and OFDM demodulator on the SODA-II.