• Title/Summary/Keyword: Pipelined Processor

Search Result 75, Processing Time 0.024 seconds

VHDL Design for Out-of-Order Superscalar Processor of A Fully Pipelined Scheme (완전한 파이프라인 방식의 비순차실행 수퍼스칼라 프로세서의 VHDL 설계)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.21 no.1
    • /
    • pp.99-105
    • /
    • 2021
  • Today, a superscalar processor is the basic unit or an essential component of a multi-core processor, SoCs, and GPUs. Hence, a high-performance out-of-order superscalar processor must be adopted for these systems to maximize its performance. The superscalar processor fetches, issues, executes, and writes back multiple instructions per cycle by utilizing reorder buffers and reservation stations to dynamically schedule instructions in a pipelined scheme. In this paper, a fully pipelined out-of-order superscalar processor with speculative execution is designed with VHDL and verified with GHDL. As a result of the simulation, the program composed of ARM instructions is successfully performed.

A Design of 3D Graphics Lighting Processor for Mobile Applications (휴대 단말기용 3D Graphics Lighting Processor 설계)

  • Yang, Joon-Seok;Kim, Ki-Chul
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.837-840
    • /
    • 2005
  • This paper presents 3D graphics lighting processor based on vector processing using pipeline chaining. The lighting process of 3D graphics rendering contains many arithmetic operations and its complexity is very high. For high throughput, proposed processor uses pipelined functional units. To implement fully pipelined architecture, we have to use many functional units. Hence, the number of functional units is restricted. However, with the restricted number of pipelined functional units, the utilization of the units is reduced and a resource reservation problem is caused. To resolve these problems, the proposed architecture uses vector processing using pipeline chaining. Due to its pipeline chaining based architecture, it can perform 4.09M vertices per 1 second with 100MHz frequency. The proposed 3D graphics lighting processor is compatible with OpenGL ES API and the design is implemented and verified on FPGA.

  • PDF

A Design and Implementation of 32-bit RISC-V RV32IM Pipelined Processor in Embedded Systems (임베디드 환경에서의 32-bit RISC-V RV32IM 파이프라인 프로세서 설계 및 구현)

  • Subin Park;Yongwoo Kim
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.4
    • /
    • pp.81-86
    • /
    • 2023
  • Recently, demand for embedded systems requiring low power and high specifications has been increasing, and RISC-V processors are being widely applied. RISC-V, a RISC-based open instruction set architecture (ISA), has been developed and researched by UC Berkeley and other researchers since 2010. RV32I ISA is sufficient to support integer operations such as addition and subtraction instructions, but M-extension should be defined for multiplication and division instructions. This paper proposes an RV32I, RV32IM processor, and indicates benchmark performance scores compared to an existing processor. Additionally, A non-stalling method was proposed to support a 2-stage pipelined DSP multiplier to the 5-stage pipelined RV32IM processor. Proposed RV32I and RV32IM processors satisfied a maximum operating frequency of 50 MHz on Artix-7 FPGA. The performance of the proposed processors was verified using benchmark programs from Dhrystone and Coremark. As a result, the Coremark benchmark results of the proposed processor showed that it outperformed the existing RV32IM processor by 23.91%.

  • PDF

Design and Simulation for Out-of-Order Execution Processor of a Fully Pipelined Scheme (완전한 파이프라인 방식의 비순차실행 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.5
    • /
    • pp.143-149
    • /
    • 2020
  • Currently, a multi-core processor is mainly used as a central processing unit of a computer system, and a high-performance out-of-order processor is adopted as each core to maximize system performance. The early out-of-order execution processor with Tomasulo algorithm aimed at floating-point instructions, and it took several cycles to execute by the use of complex structures such as reorder buffer and reservation station. However, in order for the processor to properly utilize out-of-order execution and increase the throughput of instructions, it must operate in a fully pipelined manner. In this paper, a fully pipelined out-of-order processor with speculative execution is designed with VHDL and verified with GHDL. As a result of the simulation, a program composed of ARM instructions is successfully performed.

Design of 128 point pipelined FFT processor with 4-way structure (4-way 구조를 갖는 128 point 파이프라인 FFT 프로세서의 설계)

  • Lee, Sang-Min;Cho, Un-Sun;Lee, Seong-Joo;Kim, Jae-Seok
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.651-652
    • /
    • 2006
  • In this paper, 4-way data path 128 point pipelined FFT processor with 4-way structure is proposed. The proposed FFT processor has 4-way structure in order to meet data requirement of MB-OFDM system at 132MHz operating frequency. The FFT processor is based on R4MDC and extended to suit 4-way data path. The FFT processor is designed by Verilog HDL and the gate count is about 88k.

  • PDF

FPGA Design and Implementation of A Pipelined Out-of-Order Superscalar Processor (파이프라인식 비순차실행 수퍼스칼라 프로세서의 FPGA 설계 및 구현)

  • Jongbok Lee
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.3
    • /
    • pp.153-158
    • /
    • 2023
  • Domestically, the importance of system semiconductor design is increasing, and the balanced development with the high-end memory semiconductors should be promoted. Using Xilinx Vivado as a development enivronment tool, it reduces time and cost dramatically in implementing the processor on FPGA. In this paper, the VHDL language which provides record data structure for an efficient digital system design is used for designing a pipelined out-of-order superscalar processor. It has been simulated extensively, synthesized and implemented on FPGA and verified by Integrated Logic Analyzer. As a result, the pipelined out-of-order superscalar processor could be executed successfully.

Construction of an Automatic Generation System of Embedded Processor Cores (임베디드 프로세서 코어 자동생성 시스템의 구축)

  • Cho Jae-Bum;You Yong-Ho;Hwang Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.6A
    • /
    • pp.526-534
    • /
    • 2005
  • This paper presents the structure and function of the system which automatically generates embedded processor cores using the SMDL. Accepting processor description in the SDML, the proposed system generates the processor core, consisting of the pipelined datapath and memory modules together with their control unit. The generated cores support muti-cycle instructions for proper handling of memory accesses, and resolve pipeline hazards encountered in the pipelined processors. Experimental results show the functional accuracy of the generated cores.

Desing of A RISC-Processor's Control Unit (RISC 프로세서 제어부의 설계)

  • 홍인식;임인칠
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.7
    • /
    • pp.1005-1014
    • /
    • 1990
  • This paper proposes the control unit of a 32-bit high-performance RISC type microprocessor. This control unit controls the whole data path of target processor and on chip instruction/data caches in 4-stage pipelined scheme. For the improvement of speed, large parts of data path and control unit are designed by domino-CMOS and hard-wired circuit technology. First, in this paper, target processor's instruction set and data path are defined, and next, all signals needed to control the data path are analyzed. The decoder of control unit and clock generated logic block are implemented in DCAL(Dynamic CMOS Array Logic) with modified clock scheme for the purpose of speed up and supporting RISC processor's pipelined architecture efficiently.

  • PDF

8.3 Gbps pipelined LEA Crypto-Processor Supporting ECB/CTR Modes of operation (ECB/CTR 운영모드를 지원하는 8.3 Gbps 파이프라인 LEA 암호/복호 프로세서)

  • Sung, Mi-Ji;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.12
    • /
    • pp.2333-2340
    • /
    • 2016
  • A LEA (Lightweight Encryption Algorithm) crypto-processor was designed, which supports three master key lengths of 128/ 192/256-bit, ECB and CTR modes of operation. To achieve high throughput rate, the round transformation block was designed with 128 bits datapath and a pipelined structure of 16 stages. Encryption/decryption is carried out through 12/14/16 pipelined stages according to the master key length, and each pipelined stage performs round transformation twice. The key scheduler block was optimized to share hardware resources that are required for encryption, decryption, and three master key lengths. The round keys generated by key scheduler are stored in 32 round key registers, and are repeatedly used in round transformation until master key is updated. The pipelined LEA processor was verified by FPGA implementation, and the estimated performance is about 8.3 Gbps at the maximum clock frequency of 130 MHz.

Low Cost Hardware Engine of Atomic Pipeline Broadcast Based on Processing Node Status (프로세서 노드 상황을 고려하는 저비용 파이프라인 브로드캐스트 하드웨어 엔진)

  • Park, Jongsu
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.8
    • /
    • pp.1109-1112
    • /
    • 2020
  • This paper presents a low cost hardware message passing engine of enhanced atomic pipelined broadcast based on processing node status. In this algorithm, the previous atomic pipelined broadcast algorithm is modified to reduce the waiting time until next broadcast communication. For this, the processor change the transmission order of processing nodes based on the nodes' communication channel. Also, the hardware message passing engine architecture of the proposed algorithm is modified to be adopted to multi-core processor. The synthesized logic area of the proposed hardware message passing engine was reduced by about 16%, compared by the pre-existing hardware message passing engine.