• Title/Summary/Keyword: Verilog HDL

Hardware Design and Implementation of a Parallel Processor for High-Performance Multimedia Processing (고성능 멀티미디어 처리용 병렬프로세서 하드웨어 설계 및 구현)

  • Kim, Yong-Min;Hwang, Chul-Hee;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.5 / pp.1-11 / 2011
  • As mobile multimedia devices have become more widespread in recent years, the need for high-performance multimedia processors has grown. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and uses a 3-stage pipeline. Experimental results indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance. In addition, it outperforms the commercial high-performance TI C6416 DSP in both performance (1.4-31.4x better) and energy efficiency (5.9-8.1x better) at the same 130 nm technology and 720 MHz clock frequency. The proposed parallel processor was developed in Verilog HDL and verified with an FPGA prototype system.

AES-128/192/256 Rijndael Cryptoprocessor with On-the-fly Key Scheduler (On-the-fly 키 스케줄러를 갖는 AED-128/192/256 Rijndael 암호 프로세서)

  • Ahn, Ha-Kee;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.11 / pp.33-43 / 2002
  • This paper describes the design of a cryptographic processor that implements the AES (Advanced Encryption Standard) block cipher algorithm "Rijndael". To achieve a high throughput rate, a sub-pipeline stage is inserted into the round transformation block, so that two consecutive round functions operate simultaneously. For an area-efficient and low-power implementation, the round transformation block shares its hardware resources between encryption and decryption. An efficient on-the-fly key scheduler is devised to support the three master-key lengths of 128, 192, and 256 bits, and it generates the round keys in the first sub-pipeline stage of each round. The Verilog-HDL model of the cryptoprocessor was verified using a Xilinx FPGA board and test system. The core, synthesized with a 0.35-μm CMOS cell library, consists of about 25,000 gates. Simulation results show a throughput of about 520 Mbits/s at a 220-MHz clock frequency with a 2.5-V supply.
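As a rough behavioral reference (not the authors' Verilog), the Python sketch below generates AES round keys on demand for 128-, 192-, and 256-bit keys following the FIPS-197 key expansion. It keeps only the most recent Nk words in a sliding window, which is the storage saving an on-the-fly scheduler aims for, and derives the S-box arithmetically instead of from a table. Only the forward (encryption-order) schedule is covered; the paper's shared encryption/decryption datapath and sub-pipeline timing are not modeled, and all function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8) as a^254; 0 maps to 0 by AES convention."""
    result, power, e = 1, a, 254
    if a == 0:
        return 0
    while e:
        if e & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        e >>= 1
    return result


def sbox(a: int) -> int:
    """AES S-box: field inverse followed by the affine transform (rotations plus 0x63)."""
    b = gf_inv(a)
    s = b
    for shift in range(1, 5):
        s ^= ((b << shift) | (b >> (8 - shift))) & 0xFF
    return s ^ 0x63


def expand_key_words(key: bytes):
    """Yield the 4-byte words w[0], w[1], ... of the key schedule one at a time."""
    nk = len(key) // 4
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 16, 24 or 32 bytes")
    nr = nk + 6
    window = [list(key[4 * i:4 * i + 4]) for i in range(nk)]  # only the last nk words are stored
    rcon = 1
    for w in window:
        yield w
    for i in range(nk, 4 * (nr + 1)):
        temp = list(window[-1])                      # w[i-1]
        if i % nk == 0:
            temp = temp[1:] + temp[:1]               # RotWord
            temp = [sbox(x) for x in temp]           # SubWord
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = [sbox(x) for x in temp]           # extra SubWord for 256-bit keys
        new = [p ^ q for p, q in zip(window[0], temp)]  # w[i-nk] xor temp
        window = window[1:] + [new]
        yield new


def round_keys(key: bytes):
    """Group the word stream into 128-bit round keys, produced on demand."""
    words = []
    for w in expand_key_words(key):
        words.append(w)
        if len(words) == 4:
            yield bytes(sum(words, []))
            words = []
```

With the FIPS-197 Appendix A.1 key 2b7e151628aed2a6abf7158809cf4f3c, the generator yields 11 round keys and the last one is d014f9a8c9ee2589e13f0cc8b6630ca6, which makes a convenient reference vector for an HDL testbench.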

Design of HEVC Motion Estimation Engine with Search Window Data Reuse and Early Termination (탐색 영역 데이터의 재사용 및 조기중단이 가능한 HEVC 움직임 추정 엔진 설계)

  • Hur, Ahrum;Park, Taewook;Lee, Seongsoo
    • Journal of IKEEE / v.20 no.3 / pp.273-278 / 2016
  • In HEVC variable-block-size motion estimation, the same search window data are read repeatedly for each block size. This increases memory bandwidth and makes early termination difficult to exploit. In this paper, the largest block size and the smaller block sizes at the same positions are processed at the same time, which reduces memory bandwidth and computation by reusing search window data and computation results. For early termination, deciding to terminate by observing only the largest block size can degrade image quality, because the smaller blocks cannot all be terminated at the same point given their relative positions. This paper therefore changes the processing order of early termination so that the smaller block sizes are handled in turn. The motion estimation engine was described in Verilog HDL, then synthesized and verified in a 0.18-μm process technology. Its gate count and maximum operating frequency are 36,101 gates and 263.15 MHz, respectively.
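The data-reuse idea can be illustrated with a small Python model, a sketch under assumptions rather than the authors' design: for each candidate displacement, pixel differences are read once to form the SADs of the smallest (base) blocks, and every larger block's SAD is obtained by summing four smaller SADs instead of rereading the search window. The block sizes (4×4 and 8×8 in the usage), the padded-window layout, and the early-termination rule, which stops only when every block, not just the largest, already has a match at or below a threshold, are illustrative assumptions; the abstract does not give the paper's exact termination order.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]
BlockKey = Tuple[int, int, int]  # (block size, row offset, column offset) inside the current block


def base_sads(cur: Grid, window: Grid, oy: int, ox: int, base: int) -> Grid:
    """Read each pixel pair once and accumulate the SADs of all base x base sub-blocks.

    (oy, ox) is the top-left corner of the candidate block inside the search window.
    """
    n = len(cur)
    grid = n // base
    sads = [[0] * grid for _ in range(grid)]
    for y in range(n):
        for x in range(n):
            sads[y // base][x // base] += abs(cur[y][x] - window[y + oy][x + ox])
    return sads


def merge_sads(level: Grid, base: int) -> Dict[BlockKey, int]:
    """Derive every larger block's SAD by summing four smaller ones (no extra window reads).

    Assumes the block size is base * 2^m so the quad-tree merge ends at a single block.
    """
    result: Dict[BlockKey, int] = {}
    size = base
    while True:
        grid = len(level)
        for r in range(grid):
            for c in range(grid):
                result[(size, r * size, c * size)] = level[r][c]
        if grid == 1:
            return result
        level = [[level[2 * r][2 * c] + level[2 * r][2 * c + 1]
                  + level[2 * r + 1][2 * c] + level[2 * r + 1][2 * c + 1]
                  for c in range(grid // 2)] for r in range(grid // 2)]
        size *= 2


def motion_search(cur: Grid, window: Grid, search_range: int, base: int = 4, threshold: int = 0):
    """Full search over displacements in [-R, R]^2.

    window is the reference area: the co-located block plus R pixels on every side.
    Returns ({block: (best SAD, (dy, dx))}, number of candidates visited).
    """
    best: Dict[BlockKey, Tuple[int, Tuple[int, int]]] = {}
    visited = 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            visited += 1
            level = base_sads(cur, window, dy + search_range, dx + search_range, base)
            for blk, sad in merge_sads(level, base).items():
                if blk not in best or sad < best[blk][0]:
                    best[blk] = (sad, (dy, dx))
            # Assumed early-termination rule: stop only once every block size,
            # not just the largest, already has a match at or below the threshold.
            if all(s <= threshold for s, _ in best.values()):
                return best, visited
    return best, visited
```

Because all block sizes are scored from the same pixel reads, each candidate position touches the search window only once, which is the bandwidth saving the abstract describes.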

A module generator for variable-precision multiplier core with error compensation for low-power DSP applications (저전력 DSP 응용을 위한 오차보상을 갖는 가변 정밀도 승산기 코어 생성기)

  • Hwang, Seok-Ki;Lee, Jin-Woo;Shin, Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.2A / pp.129-136 / 2005
  • This paper describes a multiplier generator, VPM_Gen (Variable-Precision Multiplier Generator), which generates Verilog-HDL models of multiplier cores from a user-defined bit-width specification. The operand bit-widths are parameterized from 8 to 32 bits in 1-bit steps, and the product of the multiplier core can be truncated to 8 to 64 bits in 2-bit steps, so VPM_Gen can generate 3,455 multiplier cores. When the multiplier output is truncated, the circuits corresponding to the truncated part are eliminated, reducing gate count and power dissipation by about 40% and 30%, respectively, compared with a full-precision multiplier. As a result, an area-efficient and low-power multiplier core can be obtained. To minimize truncation error, an adaptive error-compensation method that takes the number of truncated bits into account is employed. The multiplier cores generated by VPM_Gen have been verified using a Xilinx FPGA board and a logic analyzer.
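A minimal Python model of a truncated multiplier with constant error compensation, assuming unsigned n-bit operands: partial-product bits in the lowest t columns are never generated (in hardware, their AND gates and adder cells disappear), and a correction equal to the rounded expected value of the dropped bits is added before the final shift. The expectation assumes uniformly random inputs, so each partial-product bit is 1 with probability 1/4. This is one common textbook scheme, not necessarily the adaptive method used by VPM_Gen, and the function names are hypothetical.

```python
def dropped_columns_expectation(n: int, t: int) -> int:
    """Rounded expected value of the partial-product bits in columns 0..t-1.

    Assumes uniformly random n-bit operands, so each bit a_i & b_j is 1 with probability 1/4.
    Column c of an n x n array holds min(c, 2n - 2 - c) + 1 bits.
    """
    total = sum((min(c, 2 * n - 2 - c) + 1) * (1 << c) for c in range(min(t, 2 * n - 1)))
    return (total + 2) // 4  # total / 4, rounded to the nearest integer


def truncated_product(a: int, b: int, n: int, t: int, compensate: bool = True) -> int:
    """Approximate (a * b) >> t without generating partial-product bits in columns < t."""
    kept = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(max(0, t - i), n):  # only bits with i + j >= t are generated
                if (b >> j) & 1:
                    kept += 1 << (i + j)
    correction = dropped_columns_expectation(n, t) if compensate else 0
    return (kept + correction) >> t
```

Without the correction the result is always biased low, because every removed bit is non-negative; the constant shifts the average error back toward zero at the cost of one small adder input.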

Low Power High Frequency Design for Data Transfer for RISC and CISC Architecture (RISC와 CISC 구조를 위한 저전력 고속 데이어 전송)

  • Agarwal Ankur;Pandya A. S.;Lho Young-Uhg
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.2 / pp.321-327 / 2006
  • This paper presents a low-power, high-frequency design of instructions using ad-hoc transistor-level techniques for full-custom and semi-custom ASIC (Application Specific Integrated Circuit) designs. The proposed design was verified at a high level using Verilog-HDL and simulated in ModelSim for logical correctness. It was then examined at the layout level in LASI using 0.25-μm technology and analyzed for timing characteristics in the Win-spice simulation environment. The results show a significant power reduction of up to 35% for general-purpose processors such as RISC or CISC. A significant reduction in propagation delay is also observed, which raises the frequency of the CPU's fetch and execute cycles and thus the overall operating frequency.

Hardware Design of AES Cryptography Module Operating as Coprocessor of Core-A Microprocessor (Core-A 마이크로프로세서의 코프로세서로 동작하는 AES 암호모듈의 하드웨어 설계)

  • Ha, Chang-Soo;Choi, Byeong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.12 / pp.2569-2578 / 2009
  • Core-A is a 32-bit embedded RISC microprocessor developed entirely in Korea by KAIST with the support of the Industrial Property Office. This paper analyzes the Core-A microprocessor architecture and proposes an efficient method for interfacing it with a coprocessor. To verify the proposed interfacing method, an AES cryptography processor with a 128-bit key and block size is used as the coprocessor. The coprocessor interface and the AES module are written in Verilog-HDL and verified using the ModelSim simulator. Excluding the AES module, the design consists of about 3,743 gates, and its maximum operating frequency is about 90 MHz in 0.35-μm CMOS technology. The proposed coprocessor interface architecture transfers data efficiently in both directions between Core-A and the coprocessor.

Conversion Method of 3D Point Cloud to Depth Image and Its Hardware Implementation (3차원 점군데이터의 깊이 영상 변환 방법 및 하드웨어 구현)

  • Jang, Kyounghoon;Jo, Gippeum;Kim, Geun-Jun;Kang, Bongsoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.10 / pp.2443-2450 / 2014
  • In motion recognition systems that use depth images, the depth image is converted into real-world 3D point cloud data so that algorithms can be applied efficiently. After the algorithm is applied, the output is converted back into a depth image in projective coordinates. However, this coordinate conversion introduces rounding errors, and the applied algorithm causes data loss. In this paper, we propose an efficient method, and its hardware implementation, for converting 3D point cloud data to a depth image without rounding error or data loss when the image size changes. The proposed system was developed using OpenCV and a Windows program and was tested in real time with a Kinect. The hardware was designed in Verilog-HDL and verified on a Xilinx Zynq-7000 FPGA board.
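The rounding problem can be reproduced with a pinhole camera model in Python. The focal length and principal point below are hypothetical Kinect-like values, and keeping the back-projected coordinates as exact rationals is just one way to make the round trip lossless, not necessarily the paper's method. When X and Y are instead rounded to whole millimetres, a pixel close to the camera can be re-projected onto its neighbour, leaving a hole at its original position. Resolving collisions by keeping the nearest depth (a z-buffer) is another assumption.

```python
from fractions import Fraction
from typing import List, Tuple

# Hypothetical Kinect-like intrinsics in pixels; real values come from calibration.
FX = FY = 525
CX, CY = 320, 240


def to_world(u: int, v: int, z: int, exact: bool = True) -> Tuple[Fraction, Fraction, int]:
    """Back-project pixel (u, v) with depth z (mm) to camera-space X, Y, Z (mm)."""
    x = Fraction((u - CX) * z, FX)
    y = Fraction((v - CY) * z, FY)
    if not exact:
        # Whole-millimetre storage, as a narrow fixed-point datapath might do.
        x, y = Fraction(round(x)), Fraction(round(y))
    return x, y, z


def to_depth_image(points, width: int, height: int) -> List[List[int]]:
    """Project points back to a depth image; 0 marks a hole."""
    img = [[0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue
        u = round(x * FX / z + CX)  # round() on a Fraction rounds half to even
        v = round(y * FY / z + CY)
        if 0 <= u < width and 0 <= v < height and (img[v][u] == 0 or z < img[v][u]):
            img[v][u] = z           # z-buffer: keep the nearest point on a collision
    return img
```

For example, pixel (322, 240) at 400 mm gives X = 800/525 ≈ 1.52 mm; rounding that to 2 mm projects back to u = 322.625, so the sample lands on column 323 and column 322 becomes a hole, while the exact version returns every pixel to its original position.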

Hardware Design of Efficient SAO for High Performance In-loop filters (고성능 루프내 필터를 위한 효율적인 SAO 하드웨어 설계)

  • Park, Seungyong;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.543-545 / 2017
  • This paper describes the design of an SAO hardware architecture for high-performance in-loop filters. SAO is a module of the in-loop filter that compensates for information lost through block-based image compression and quantization. However, SAO in HEVC requires long computation time because it operates on individual pixels. The SAO hardware architecture proposed in this paper therefore uses 4×4 block operations and a 2-stage pipeline for high-speed operation. The information-generation and offset-computation units for SAO are designed as parallel structures to minimize computation time. The proposed hardware architecture was designed in Verilog HDL and synthesized with TSMC 130 nm and 65 nm cell libraries. It achieved a maximum frequency of 476 MHz with 163k gates on the 130 nm process and 312.5 MHz with 193.6k gates on the 65 nm process.
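For reference, HEVC's SAO edge-offset mode classifies each sample by comparing it with two neighbours along one of four directions and adds the offset of the resulting category. The Python sketch below applies that rule to a single 4×4 block, matching the block granularity described above; the sixteen classifications are independent of each other, which is what makes parallel category-generation hardware possible. Band offset and offset signalling are omitted, picture-edge samples whose neighbours fall outside the picture are simply left unchanged, and the function names are hypothetical.

```python
from typing import Dict, List

# Neighbour offsets (dy, dx) for the four edge-offset classes.
EO_NEIGHBORS = {
    0: ((0, -1), (0, 1)),    # horizontal
    1: ((-1, 0), (1, 0)),    # vertical
    2: ((-1, -1), (1, 1)),   # 135-degree diagonal
    3: ((-1, 1), (1, -1)),   # 45-degree diagonal
}


def sign(v: int) -> int:
    return (v > 0) - (v < 0)


def edge_category(c: int, a: int, b: int) -> int:
    """0 = none, 1 = local minimum, 2 = concave edge, 3 = convex edge, 4 = local maximum."""
    return {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[sign(c - a) + sign(c - b)]


def sao_edge_offset_4x4(frame: List[List[int]], by: int, bx: int, eo_class: int,
                        offsets: Dict[int, int], bit_depth: int = 8) -> List[List[int]]:
    """Return the SAO-filtered 4x4 block whose top-left sample is frame[by][bx].

    Classification always reads the unfiltered (deblocked) frame, so the sixteen
    decisions can be made in parallel.
    """
    h, w = len(frame), len(frame[0])
    (dy0, dx0), (dy1, dx1) = EO_NEIGHBORS[eo_class]
    max_val = (1 << bit_depth) - 1
    out = [row[bx:bx + 4] for row in frame[by:by + 4]]
    for y in range(by, by + 4):
        for x in range(bx, bx + 4):
            ya, xa, yb, xb = y + dy0, x + dx0, y + dy1, x + dx1
            if not (0 <= ya < h and 0 <= yb < h and 0 <= xa < w and 0 <= xb < w):
                continue  # neighbour outside the picture: sample left unchanged
            cat = edge_category(frame[y][x], frame[ya][xa], frame[yb][xb])
            if cat:
                out[y - by][x - bx] = min(max(frame[y][x] + offsets[cat], 0), max_val)
    return out
```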

Lightweight Hardware Design of Elliptic Curve Diffie-Hellman Key Generator for IoT Devices (사물인터넷 기기를 위한 경량 Elliptic Curve Diffie-Hellman 키 생성기 하드웨어 설계)

  • Kanda, Guard;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.581-583 / 2017
  • Elliptic curve cryptography is a relatively recent form of cryptography based on point arithmetic on elliptic curves and the Elliptic Curve Discrete Logarithm Problem (ECDLP). This discrete logarithm problem makes keys easy to generate but practically impossible to reverse, which supports perfect forward secrecy and is valuable for privacy and protection. In this paper, we present a lightweight Elliptic Curve Diffie-Hellman (ECDH) key exchange generator that creates a 163-bit shared key, which can be used in the Elliptic Curve Integrated Encryption Scheme (ECIES) as well as for key agreement. The design uses a fast, compact multiplication algorithm and also implements the extended Euclidean algorithm. The proposed architecture was designed in Verilog HDL, synthesized with Xilinx Vivado 2016.3, and implemented on a Virtex-7 FPGA board.
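The key-agreement flow can be sketched in Python on a deliberately tiny curve, y² = x³ + 2x + 3 over GF(97) with base point (3, 6), chosen only for illustration and offering no security. Modular inversion uses the extended Euclidean algorithm, as in the described design, while scalar multiplication is plain left-to-right double-and-add, which may differ from the paper's fast multiplication algorithm; the authors' 163-bit curve is not reproduced here.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity

# Toy curve y^2 = x^3 + A*x + B over GF(P); illustration only, NOT secure.
P, A, B = 97, 2, 3
G = (3, 6)


def egcd_inverse(a: int, m: int) -> int:
    """Modular inverse via the extended Euclidean algorithm."""
    old_r, r = a % m, m
    old_s, s = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ZeroDivisionError("value is not invertible modulo m")
    return old_s % m


def on_curve(pt: Point) -> bool:
    return pt is None or (pt[1] ** 2 - pt[0] ** 3 - A * pt[0] - B) % P == 0


def point_add(p1: Point, p2: Point) -> Point:
    """Affine point addition; also handles doubling and the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * egcd_inverse(2 * y1, P) % P
    else:
        lam = (y2 - y1) * egcd_inverse(x2 - x1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k: int, point: Point) -> Point:
    """Left-to-right double-and-add computation of k * point."""
    result: Point = None
    for bit in bin(k)[2:]:
        result = point_add(result, result)
        if bit == "1":
            result = point_add(result, point)
    return result
```

Usage: each party publishes secret × G, and both arrive at the same shared point because a·(b·G) = b·(a·G); in a real system the x-coordinate of that point would then feed a key-derivation step.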

Low-area FFT Processor Structure using Common Sub-expression Sharing (Common Sub-expression Sharing을 사용한 저면적 FFT 프로세서 구조)

  • Jang, Young-Beom;Lee, Dong-Hoon
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.4 / pp.1867-1875 / 2011
  • In this paper, a low-area 256-point FFT structure is proposed. For a low-area implementation, the CSD (Canonic Signed Digit) multiplier method is chosen. Because CSD multipliers are applied to an FFT structure most efficiently when there are few distinct multiplication types, the radix-4² algorithm is chosen. In the proposed structure, the number of multiplication types in each multiplication block is first minimized, and CSD multipliers are then used to implement the multiplications. Furthermore, cell area in the CSD multiplier implementation is reduced further through common sub-expression sharing (CSS). The Verilog-HDL coding results show a 29.9% reduction in cell area for the complex multiplication part and a 12.54% reduction for the overall 256-point FFT structure compared with the conventional structure.
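The CSD and common sub-expression ideas can be shown in Python. A constant is recoded into signed digits with no two adjacent nonzeros, so a shift-and-add constant multiplier needs one adder or subtractor per extra nonzero digit. A two-term pattern y = x + s·(x << k) that recurs across several constants is then built once and shifted into place wherever it appears. The greedy single-pattern search and the example constants are illustrative assumptions, not the paper's twiddle factors or its sharing procedure, and all names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def to_csd(n: int) -> List[int]:
    """Canonic signed-digit recoding, LSB first, digits in {-1, 0, 1}, no adjacent nonzeros."""
    digits = []
    while n:
        if n & 1:
            d = 2 - (n % 4)  # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def terms(c: int) -> List[Tuple[int, int]]:
    """Nonzero CSD digits of c as (shift, sign) pairs."""
    return [(i, d) for i, d in enumerate(to_csd(c)) if d]


def csd_multiply(x: int, digits: List[int]) -> int:
    """Shift-and-add multiplication by a CSD-coded constant."""
    return sum(d * (x << i) for i, d in enumerate(digits) if d)


def best_pattern(constants: List[int]):
    """Most frequent two-term pattern (distance k, relative sign s) across all constants."""
    count = Counter()
    for c in constants:
        t = terms(c)
        for a in range(len(t)):
            for b in range(a + 1, len(t)):
                (i, di), (j, dj) = t[a], t[b]
                count[(j - i, di * dj)] += 1
    return count.most_common(1)[0] if count else None


def rewrite(c: int, k: int, s: int):
    """Greedily cover digit pairs matching y = x + s*(x << k); return (shared, plain) terms."""
    t = terms(c)
    used = [False] * len(t)
    shared, plain = [], []
    for a in range(len(t)):
        if used[a]:
            continue
        i, di = t[a]
        for b in range(a + 1, len(t)):
            j, dj = t[b]
            if not used[b] and j - i == k and di * dj == s:
                used[a] = used[b] = True
                shared.append((i, di))  # di * (y << i) covers both digits
                break
        if not used[a]:
            used[a] = True
            plain.append((i, di))
    return shared, plain


def multiply_shared(x: int, c: int, k: int, s: int) -> int:
    y = x + s * (x << k)  # the shared sub-expression, built once in hardware
    shared, plain = rewrite(c, k, s)
    return sum(d * (y << i) for i, d in shared) + sum(d * (x << i) for i, d in plain)


def adder_count(constants: List[int], pattern=None) -> int:
    """Adders/subtractors needed for a bank of constant multipliers, with or without sharing."""
    if pattern is None:
        return sum(max(len(terms(c)) - 1, 0) for c in constants)
    k, s = pattern
    total = 1  # one adder builds y
    for c in constants:
        shared, plain = rewrite(c, k, s)
        total += max(len(shared) + len(plain) - 1, 0)
    return total
```

For the constants 5, 21, and 85 the pattern x + (x << 2) occurs six times; building it once cuts the adder count from six to three in this model.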