• Title/Abstract/Keywords: On-Chip Multiprocessor

40 search results (processing time: 0.03 s)

A System Level Network-on-chip Model with MLDesigner

  • Agarwal, Ankur;Shankar, Rabi;Pandya, A.S.;Lho, Young-Uhg
    • Journal of information and communication convergence engineering
    • /
    • Vol. 6, No. 2
    • /
    • pp.122-128
    • /
    • 2008
  • Multiprocessor architectures and platforms, such as the multiprocessor system on chip (MPSoC) recently introduced to extend the applicability of Moore's law, depend on concurrency and synchronization in both software and hardware to enhance design productivity and system performance. With the billion-transistor era rapidly approaching, some of the main problems in deep sub-micron technologies, characterized by gate lengths in the range of 60-90 nm, will arise from non-scalable wire delays, errors in signal integrity, and non-synchronized communication. These problems may be addressed by using a Network-on-Chip (NOC) architecture for future Systems-on-Chip (SoC). We have modeled a concurrent architecture for a customizable and scalable NOC in a system-level modeling environment using MLDesigner (from MLD Inc.). Varying network loads under various traffic scenarios were applied to obtain realistic performance metrics. We provide simulation results for latency as a function of buffer size. We have abstracted the area results for the NOC components from their FPGA implementation. The modeled NOC architecture supports three different levels of quality of service (QoS).

대규모 신경망 시뮬레이션을 위한 칩상 학습가능한 단일칩 다중 프로세서의 구현 (Design of a Single-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation)

  • 김종문;송윤선;김명원
    • 전자공학회논문지B
    • /
    • Vol. 33B, No. 2
    • /
    • pp.149-158
    • /
    • 1996
  • This paper describes the design and implementation of a digital neural chip and a parallel neural machine for simulating large-scale neural networks. The chip is a single-chip multiprocessor containing four digital neural processors (DNP-II) of the same architecture. Each DNP-II has its own program memory and data memory, and the chip operates as a MIMD (multiple-instruction, multiple-data) parallel processor. The DNP-II has an instruction set tailored to neural computation, which can be used to effectively simulate various neural network models, including on-chip learning. The DNP-II provides four-way data-driven communication, supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board, and an interface board. Each processor board consists of an 8*8 array of DNP-II (equivalently 2*2 neural chips). Each processor board can be built in configurations including linear array, 2-D mesh, and 2-D torus. This flexibility supports efficient mapping of neural network models onto the parallel structure. The neural system achieves a maximum performance of 40 GCPS (giga connections per second) with 16 processor boards.

Raptor의 정수처리기 설계 (Design of the Integer Processor Unit for RAPTOR)

  • 송윤섭;김도형
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 1998년도 추계종합학술대회 논문집
    • /
    • pp.763-766
    • /
    • 1998
  • This paper describes the microarchitecture of the integer processor unit of RAPTOR, an on-chip multiprocessor. The integer processor unit implements the 64-bit SPARC-V9 architecture and supports out-of-order instruction execution in hardware. The unit is designed to be compact so that multiple copies of it can be integrated with cache memories on a single chip. The design proceeded in a top-down manner. The hardware description and its verification were performed using Verilog-HDL.

칩 멀티쓰레딩 서버에서 OpenMP 프로그램의 성능과 확장성 (Performance and Scalability of OpenMP Programs on Chip-MultiThreading Server)

  • 이명호;김용규
    • 정보처리학회논문지A
    • /
    • Vol. 13A, No. 2
    • /
    • pp.137-146
    • /
    • 2006
  • With the recent release of processors featuring Chip-level MultiThreading (CMT) technology, shared-memory multiprocessor (SMP) servers based on them are becoming increasingly common. Because of its ease of use, OpenMP has become the standard for parallelizing applications for SMP systems. As the demand for ever-faster processing grows in high-performance computing (HPC) applications, improving the performance and scalability of HPC applications parallelized with OpenMP directives is becoming increasingly important. This paper studies the performance and scalability of SPEC OMPL, a standard benchmark suite for OpenMP consisting of HPC applications parallelized with OpenMP directives, on the Sun Fire E25K, a large-scale SMP server with CMT technology. The paper also evaluates the effectiveness of CMT technology on SPEC OMPL.

AMBA AHB와 AXI간 연동을 위한 Switch Wrapper의 설계 (A Switch Wrapper Design for an AMBA AXI On-Chip-Network)

  • 이정수;장지호;이호영;김준성
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2005년도 추계종합학술대회
    • /
    • pp.869-872
    • /
    • 2005
  • In this paper we present a switch wrapper for AMBA AXI, which is an efficient on-chip-network interface compared to bus-based interfaces in a multiprocessor SoC. AXI uses the idea of an NoC to meet the increasing demands on communication bandwidth within a single chip. The switch wrapper for AXI is located between an interconnection network and two IPs, connecting them together. It routes traffic to the interconnection network and performs protocol conversions to provide compatibility for IP reuse. The switch wrapper consists of a direct router, AHB-AXI converters, interface modules, and a controller module. We propose the design of an all-in-one switch wrapper.

Low-power heterogeneous uncore architecture for future 3D chip-multiprocessors

  • Dorostkar, Aniseh;Asad, Arghavan;Fathy, Mahmood;Jahed-Motlagh, Mohammad Reza;Mohammadi, Farah
    • ETRI Journal
    • /
    • Vol. 40, No. 6
    • /
    • pp.759-773
    • /
    • 2018
  • Uncore components such as on-chip memory systems and on-chip interconnects consume a large amount of energy in emerging embedded applications. Few studies have focused on next-generation analytical models for future chip-multiprocessors (CMPs) that simultaneously consider the impacts of the power consumption of core and uncore components. In this paper, we propose a convex-optimization approach to design heterogeneous uncore architectures for embedded CMPs. Our convex approach optimizes the number and placement of memory banks with different technologies on the memory layer. In parallel with hybrid memory architecting, optimizing the number and placement of through silicon vias as a viable solution in building three-dimensional (3D) CMPs is another important target of the proposed approach. Experimental results show that the proposed method outperforms 3D CMP designs with hybrid and traditional memory architectures in terms of both energy delay products (EDPs) and performance parameters. The proposed method improves the EDPs by an average of about 43% compared with SRAM design. In addition, it improves the throughput by about 7% compared with dynamic RAM (DRAM) design.
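
The energy-delay product (EDP) cited in this abstract is the product of the energy consumed and the execution time, so a lower value is better. The following is a minimal sketch of how a relative EDP improvement can be computed. The function names and the numeric values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy multiplied by delay (lower is better)."""
    return energy_joules * delay_seconds


def relative_improvement(baseline: float, proposed: float) -> float:
    """Fractional reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline


# Hypothetical values for illustration only; not taken from the paper.
baseline_edp = edp(energy_joules=2.0, delay_seconds=1.0)
proposed_edp = edp(energy_joules=1.5, delay_seconds=0.8)
improvement = relative_improvement(baseline_edp, proposed_edp)
print(f"EDP improvement: {improvement:.0%}")
```

Because EDP multiplies the two quantities, a design can improve it by reducing energy, delay, or both.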

NOC 구조용 교착상태 없는 라우터 설계 (A Deadlock Free Router Design for Network-on-Chip Architecture)

  • ;;;;노영욱
    • 한국정보통신학회논문지
    • /
    • Vol. 11, No. 4
    • /
    • pp.696-706
    • /
    • 2007
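  • Multiprocessor SoC (MPSoC) platforms bring several innovative new trends to SoC design. As the billion-transistor era rapidly approaches, the main problems in deep sub-micron technologies with gate lengths in the range of $60{\sim}90$ nm arise from non-scalable wire delays and from errors in signal integrity and non-synchronized communication. These problems can be addressed by using a NOC architecture for future SoCs. Most future SoCs will use a network architecture and a packet-based communication protocol for on-chip communication. This paper describes adaptive wormhole routing with proactive turn prohibition, which guarantees deadlock-free on-chip communication in NOC architectures. It also presents a simple routing architecture with five full-duplex, flit-wide communication channels. Simulation results for message latency are presented and compared with those of other techniques operating at the same connection rates.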

Distributed arbitration scheme for on-chip CDMA bus with dynamic codeword assignment

  • Nikolic, Tatjana R.;Nikolic, Goran S.;Djordjevic, Goran Lj.
    • ETRI Journal
    • /
    • Vol. 43, No. 3
    • /
    • pp.471-482
    • /
    • 2021
  • Several code-division multiple access (CDMA)-based interconnect schemes have been recently proposed as alternatives to the conventional time-division multiplexing bus in multicore systems-on-chip. CDMA systems with a dynamic assignment of spreading codewords are particularly attractive because of their potential for higher bandwidth efficiency compared with the systems in which the codewords are statically assigned to processing elements. In this paper, we propose a novel distributed arbitration scheme for dynamic CDMA-bus-based systems, which solves the complexity and scalability issues associated with commonly used centralized arbitration schemes. The proposed arbitration unit is decomposed into multiple simple arbitration elements, which are connected in a ring. The arbitration ring implements a token-passing algorithm, which both resolves destination conflicts and assigns the codewords to processing elements. Simulation results show that the throughput reduction in an optimally configured dynamic CDMA bus due to arbitration-related overheads does not exceed 5%.
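
The general idea of token-passing arbitration on a ring can be sketched as follows. This is a simplified illustration, not the scheme from the paper. It assumes that the token carries the pool of free codewords and the set of destinations already claimed, and that each arbitration element holds at most one pending request. The function `arbitrate_round` and its parameters are hypothetical names.

```python
from typing import Dict, List, Optional


def arbitrate_round(
    requests: List[Optional[int]],
    num_codewords: int,
    start: int = 0,
) -> Dict[int, int]:
    """Pass a token once around the ring and return {sender: codeword}.

    requests[i] is the destination that element i wants to send to,
    or None if element i is idle.
    """
    n = len(requests)
    free_codewords = list(range(num_codewords))  # state carried by the token
    claimed_destinations = set()                 # state carried by the token
    grants: Dict[int, int] = {}

    for hop in range(n):
        element = (start + hop) % n
        destination = requests[element]
        if destination is None:
            continue  # idle element forwards the token unchanged
        if destination in claimed_destinations:
            continue  # destination conflict: this sender waits for a later round
        if not free_codewords:
            break  # no codewords left to hand out in this round
        grants[element] = free_codewords.pop(0)
        claimed_destinations.add(destination)

    return grants


# Elements 0 and 2 both target destination 3; only the first one visited wins.
grants = arbitrate_round(requests=[3, 1, 3, None], num_codewords=2, start=0)
print(grants)
```

In this sketch, the `start` argument decides which element sees the token first. Rotating it between rounds is one possible way to avoid always favoring the same sender, though the abstract does not say whether the paper does this.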

동적 라우팅을 사용하는 클러스터 기반 MPSoC 구조 (Dynamic On-Chip Network based on Clustering for MPSoC)

  • 김장억;김재환;안병규;신봉식;정정화
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2006년도 하계종합학술대회
    • /
    • pp.991-992
    • /
    • 2006
  • A multiprocessor system is an efficient, high-performance architecture that overcomes the limitations of a single-core SoC. In this paper, we propose a multiprocessor SoC (MPSoC) architecture that provides low complexity and high performance. Dynamic routing has a serious problem in that the complexity of routing increases exponentially. We solve this problem by grouping several processing elements (PEs) into a cluster. We use a deterministic routing scheme in the inter-cluster network and a dynamic routing scheme in the intra-cluster network. To control the hierarchical network, we propose an efficient router architecture that uses a smart crossbar switch. We modeled a 2-D mesh topology and used a C/C++-based simulator. The results show that our approach has lower complexity and higher throughput than both the purely deterministic and the purely dynamic routing architectures.

Performance Oriented Docket-NoC (Dt-NoC) Scheme for Fast Communication in NoC

  • Vijayaraj, M.;Balamurugan, K.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • Vol. 16, No. 3
    • /
    • pp.359-366
    • /
    • 2016
  • Today's multi-core technology is advancing rapidly, with more and more Intellectual Property cores on a single chip. Network-on-Chip (NoC) is an emerging communication network design for SoC. Routing algorithms play an important role in efficient on-chip communication. This paper proposes a novel multicast routing technique called Docket NoC (Dt-NoC), which eliminates the need for routing tables and thus speeds up communication. This technique reduces the latency and computing power of the NoC. This work uses a CURVE restriction based algorithm to restrict a few CURVES during communication between source and destination, which protects the network from deadlock and livelock. Performance is evaluated using a cycle-accurate RTL simulator and Cadence TSMC 18 nm technology. Experimental results show that the Dt-NoC architecture consumes approximately 33.75%, 27.65%, and 24.85% less power than the Baseline XY, EnA, and OEnA architectures, respectively. Dt-NoC performs well compared with other routing algorithms such as the baseline XY, EnA, and OEnA distributed architectures in terms of latency, power, and throughput.
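
The baseline XY scheme that this abstract compares against is usually understood as dimension-order routing on a 2-D mesh: a packet first travels along the X dimension until it reaches the destination column, then along the Y dimension. The following is a minimal sketch under that assumption. The `(x, y)` coordinate convention and the function name `xy_route` are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in the mesh


def xy_route(source: Node, destination: Node) -> List[Node]:
    """Return the routers visited from source to destination, inclusive."""
    x, y = source
    dest_x, dest_y = destination
    path = [(x, y)]
    # Phase 1: move along the X dimension until the column matches.
    while x != dest_x:
        x += 1 if dest_x > x else -1
        path.append((x, y))
    # Phase 2: move along the Y dimension until the row matches.
    while y != dest_y:
        y += 1 if dest_y > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))
print(path)
```

Because every packet finishes its X hops before making any Y hop, the route is fully determined by the source and destination, so no routing table is needed for this baseline either.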