• Title/Summary/Keyword: FPGA accelerator (FPGA 가속기)

Search Results: 60

Design of Open Vector Graphics Accelerator for Mobile Vector Graphics (모바일 벡터 그래픽을 위한 OpenVG 가속기 설계)

  • Kim, Young-Ouk;Roh, Young-Sup
    • Journal of Korea Multimedia Society / v.11 no.10 / pp.1460-1470 / 2008
  • As the performance of mobile systems has increased, vector graphics have been adopted to render dynamic menus, e-mail, and two-dimensional maps. This paper proposes a hardware accelerator for Open Vector Graphics (OpenVG), which is widely used for two-dimensional vector graphics. We analyze the OpenVG specification and partition it into several functions suitable for hardware implementation. The proposed accelerator is implemented on a field-programmable gate array (FPGA) board using a hardware description language (HDL) and is about four times faster than an Alex processor.

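OpenVG describes shapes as paths made of line and curve segments, and implementations commonly approximate curved segments with straight lines before rasterization. The sketch below illustrates one such stage as a candidate for hardware partitioning. The function name `flatten_quadratic` and the uniform-subdivision strategy are assumptions for illustration, not the partitioning used in the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def flatten_quadratic(p0: Point, p1: Point, p2: Point, segments: int) -> List[Point]:
    """Approximate a quadratic Bezier curve by `segments` straight line segments.

    Hypothetical helper: evaluates B(t) = (1-t)^2*p0 + 2t(1-t)*p1 + t^2*p2 at
    evenly spaced t and returns segments + 1 vertices (both endpoints included).
    """
    if segments < 1:
        raise ValueError("segments must be >= 1")
    points: List[Point] = []
    for i in range(segments + 1):
        t = i / segments
        a = (1 - t) * (1 - t)  # weight of the start point
        b = 2 * t * (1 - t)    # weight of the control point
        c = t * t              # weight of the end point
        x = a * p0[0] + b * p1[0] + c * p2[0]
        y = a * p0[1] + b * p1[1] + c * p2[1]
        points.append((x, y))
    return points


if __name__ == "__main__":
    for vertex in flatten_quadratic((0.0, 0.0), (1.0, 2.0), (2.0, 0.0), 4):
        print(vertex)
```

Each iteration needs only multiplies and adds with no data dependence between vertices, which is why a stage like this is a natural fit for a pipelined datapath.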

Study on Multiple sparse matrix-matrix multiplication hardware accelerator (다중 희소 행렬-행렬 곱셈 하드웨어 가속기 연구)

  • Tae-Hyoung Kim;Yeong-Pil Cho
    • Annual Conference of KIPS / 2024.05a / pp.47-50 / 2024
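  • A sparse matrix is a matrix in which most elements are zero. When sparse matrix-matrix multiplication is performed naively, the zero-valued elements are multiplied as well, which causes unnecessary operations. To address this problem, research on matrix compression algorithms and on reducing the number of partial sums in the multiplication is actively under way. However, current research concentrates mainly on single-matrix operations, which is inefficient because FPGAs (Field Programmable Gate Arrays) and application-specific accelerators cannot fully utilize their resources. This study proposes an architecture that performs multiple sparse matrix multiplications using all of the FPGA's resources.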
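The zero-skipping idea in the abstract can be illustrated in software with a row-wise (Gustavson-style) sparse product over a compressed format. This is a minimal sketch under stated assumptions: the list-of-(column, value) storage and the helpers `to_sparse`, `spgemm`, and `spgemm_batch` are hypothetical, and the batch helper simply loops over independent products rather than modelling how the proposed architecture distributes them across FPGA resources.

```python
from typing import Dict, List, Tuple

# Hypothetical compressed-row format: each row holds only its nonzero
# entries as (column, value) pairs.
SparseMatrix = List[List[Tuple[int, float]]]


def to_sparse(dense: List[List[float]]) -> SparseMatrix:
    """Compress a dense matrix by dropping zero entries."""
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]


def spgemm(a: SparseMatrix, b: SparseMatrix) -> Tuple[SparseMatrix, int]:
    """Row-wise sparse product C = A * B.

    Only nonzero pairs a[i][k] * b[k][j] are multiplied; the second return
    value counts those multiplications.
    """
    result: SparseMatrix = []
    mults = 0
    for row in a:
        acc: Dict[int, float] = {}  # partial sums for one row of C
        for k, a_ik in row:
            for j, b_kj in b[k]:
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
                mults += 1
        result.append(sorted((j, v) for j, v in acc.items() if v != 0))
    return result, mults


def spgemm_batch(pairs: List[Tuple[SparseMatrix, SparseMatrix]]) -> List[SparseMatrix]:
    """Multiply several independent matrix pairs (the 'multiple' case)."""
    return [spgemm(a, b)[0] for a, b in pairs]
```

For the 3×3 permutation-like example in the test, the sparse product performs 3 multiplications where the dense triple loop would perform 27.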

Design of Multipliers Optimized for CNN Inference Accelerators (CNN 추론 연산 가속기를 위한 곱셈기 최적화 설계)

  • Lee, Jae-Woo;Lee, Jaesung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.10 / pp.1403-1408 / 2021
  • FPGA-based AI processors have recently been studied actively. Deep convolutional neural networks (CNNs) are the basic computational structures executed by AI processors and require a very large number of multiplications. Considering that the multiplication coefficients used in CNN inference are all constants, and that on an FPGA it is easy to design a multiplier tailored to a specific coefficient, this paper proposes a methodology for optimizing the multiplier. The method uses two's complement and the distributive law to minimize the number of 1 bits in a multiplication coefficient, thereby reducing the number of stacked adders required. When this method was applied to an actual CNN implementation on an FPGA, logic usage was reduced by up to 30.2% and propagation delay by up to 22%. When implemented as an ASIC, hardware area was reduced by up to 35% and delay by up to 19.2%.
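The abstract's idea of rewriting a constant coefficient so that it has fewer nonzero bits can be illustrated with canonical signed digit (CSD) recoding. CSD is a standard technique in which digits take values in {-1, 0, +1}, so a run of ones becomes a single addition and subtraction. It is used here as a related stand-in, not necessarily the paper's exact method, and the function names are hypothetical.

```python
from typing import List


def csd_digits(c: int) -> List[int]:
    """Canonical signed digit recoding of integer c, least significant digit first.

    Every digit is -1, 0 or +1, no two adjacent digits are nonzero, and
    sum(d * 2**i) == c.
    """
    digits: List[int] = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def multiply_by_constant(x: int, digits: List[int]) -> int:
    """Multiply x by the constant encoded in `digits` using only shifts, adds and subtracts."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def adder_count(digits: List[int]) -> int:
    """Adders/subtractors needed: one fewer than the number of nonzero digits."""
    return max(sum(1 for d in digits if d != 0) - 1, 0)
```

For the coefficient 7, plain binary (111) needs two adders (x + 2x + 4x), whereas `csd_digits(7)` returns `[-1, 0, 0, 1]`, i.e. 8x - x, so `adder_count` gives 1.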

FPGA-Based Acceleration of Range Doppler Algorithm for Real-Time Synthetic Aperture Radar Imaging (실시간 SAR 영상 생성을 위한 Range Doppler 알고리즘의 FPGA 기반 가속화)

  • Jeong, Dongmin;Lee, Wookyung;Jung, Yunho
    • Journal of IKEEE / v.25 no.4 / pp.634-643 / 2021
  • In this paper, an FPGA-based acceleration scheme for the range Doppler algorithm (RDA) is proposed for real-time synthetic aperture radar (SAR) imaging. Hardware architectures are presented for a matched filter based on a systolic array and for a high-speed sinc interpolator that compensates for range cell migration (RCM). The proposed hardware was implemented and accelerated on a Xilinx Alveo FPGA. Experimental results for 4096×4096 SAR imaging showed that the FPGA-based implementation is twice as fast as a GPU-based design. The proposed design was also confirmed to require 60,247 CLB LUTs, 103,728 CLB registers, 20 block RAM tiles, and 592 DSPs at an operating frequency of 312 MHz.
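Range cell migration compensation in the RDA resamples each range line at fractional cell offsets, and this is where a sinc interpolator is used. The sketch below shows plain truncated-sinc interpolation in software. The 8-tap length, the unwindowed kernel, zero padding at the edges, and the names `sinc_interpolate` and `shift_line` are assumptions for illustration. They say nothing about the paper's hardware interpolator, and practical designs usually apply a window to the kernel.

```python
import math
from typing import List, Sequence


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def sinc_interpolate(samples: Sequence[float], position: float, taps: int = 8) -> float:
    """Estimate the signal at fractional index `position`.

    Uses a truncated (unwindowed) sinc kernel over `taps` neighbouring samples,
    taps // 2 on each side; samples outside the array are treated as zero.
    """
    base = math.floor(position)
    half = taps // 2
    total = 0.0
    for n in range(base - half + 1, base + half + 1):
        if 0 <= n < len(samples):
            total += samples[n] * sinc(position - n)
    return total


def shift_line(samples: Sequence[float], offset: float, taps: int = 8) -> List[float]:
    """Resample a range line at i + offset (hypothetical RCM-style shift)."""
    return [sinc_interpolate(samples, i + offset, taps) for i in range(len(samples))]
```

At integer positions the kernel reduces to a single nonzero tap, so the original samples are reproduced; fractional offsets blend neighbouring cells.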

Cascade CNN with CPU-FPGA Architecture for Real-time Face Detection (실시간 얼굴 검출을 위한 Cascade CNN의 CPU-FPGA 구조 연구)

  • Nam, Kwang-Min;Jeong, Yong-Jin
    • Journal of IKEEE / v.21 no.4 / pp.388-396 / 2017
  • Because face detection involves many variables such as pose, illumination, and occlusion, a high-performance detection system is required. Although CNNs excel at image classification, CNN operations require high-performance hardware resources, whereas small and mobile systems need low-cost, low-power environments. In this paper, therefore, a CPU-FPGA integrated system based on a three-stage cascade CNN architecture is designed using a small FPGA. An adaptive region of interest (ROI) is applied to reduce the number of CNN operations by using face information from the previous frame. A field-programmable gate array (FPGA) accelerates the CNN computations: the accelerator reads multiple feature maps at once and performs multiply-accumulate (MAC) operations in parallel for convolution. The system is implemented on an Altera Cyclone V FPGA with an embedded ARM Cortex-A9 and on-chip SRAM, and it runs at 30 FPS on HD-resolution input images. The CPU-FPGA integrated system achieved 8.5 times the power efficiency of a CPU-only system.
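The convolution layers in the cascade CNN reduce to multiply-accumulate operations summed over several input feature maps. The sketch below writes that computation as explicit loops. Treating the channel loop as the parallel MAC dimension is an assumption about how an accelerator that reads multiple feature maps at once might map it, not a description of the paper's datapath; `conv2d_multi_channel` is a hypothetical name.

```python
from typing import List

FeatureMap = List[List[float]]


def conv2d_multi_channel(inputs: List[FeatureMap], kernels: List[FeatureMap]) -> FeatureMap:
    """Valid 2-D convolution (cross-correlation, as in CNNs) summed over input channels.

    inputs[c] and kernels[c] are the c-th input feature map and kernel slice.
    The innermost loop over channels is the group of MACs that hardware reading
    several feature maps at once could perform in parallel.
    """
    channels = len(inputs)
    height, width = len(inputs[0]), len(inputs[0][0])
    k_h, k_w = len(kernels[0]), len(kernels[0][0])
    out: FeatureMap = []
    for y in range(height - k_h + 1):
        row = []
        for x in range(width - k_w + 1):
            acc = 0.0
            for ky in range(k_h):
                for kx in range(k_w):
                    for c in range(channels):  # candidate parallel MAC lanes
                        acc += inputs[c][y + ky][x + kx] * kernels[c][ky][kx]
            row.append(acc)
        out.append(row)
    return out
```

Restricting this loop nest to the adaptive ROI instead of the whole frame is what reduces the number of MACs per frame.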

VLSI Architecture of General-purpose Memory Controller for Multiple Processing (다수의 프로세싱 유닛 처리를 위한 범용 메모리 제어기의 구조)

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.12 / pp.2632-2640 / 2011
  • In this paper, we implement a memory controller that can serve multiple data processing blocks. Access to the controller is arbitrated by an internal arbiter that receives request signals from the masters and returns grant and data signals to them. The memory controller consists of a Master Interface, a Master Arbitrator, a Memory Interface, and a Memory Accelerator. It was designed in VHDL and verified using a Samsung memory model. Altera Quartus II was used for FPGA synthesis and verification, with a Cyclone II as the target device, and ModelSim was used for simulation.
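The abstract does not state which arbitration policy the internal arbiter uses. As a hedged illustration, the sketch below models a round-robin arbiter, a common choice for sharing one memory port among several masters. The class name `RoundRobinArbiter` and the one-grant-per-call model are assumptions.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grant one requesting master per cycle, rotating priority after each grant."""

    def __init__(self, num_masters: int) -> None:
        if num_masters < 1:
            raise ValueError("need at least one master")
        self.num_masters = num_masters
        self.last_grant = num_masters - 1  # so master 0 has priority first

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted master, or None if nobody requests."""
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_grant + offset) % self.num_masters
            if requests[candidate]:
                self.last_grant = candidate
                return candidate
        return None
```

Starting the search just after the last granted master guarantees that every requesting master is served within `num_masters` cycles, which prevents starvation.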

Real-time video data encryption system using FPGA-based crypto-accelerator in the Internet of Things environment (사물인터넷 환경에서 하드웨어(FPGA)기반 암호가속기 사용 실시간 영상 데이터 암호화 시스템)

  • Kim, Min-Jae;Lee, Jun-Ho;Kim, Ho-Won
    • Annual Conference of KIPS / 2022.05a / pp.15-17 / 2022
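  • As Internet of Things technology has spread, smart home appliances that can be accessed and controlled remotely have become increasingly common. Accordingly, cybercrimes that exploit security vulnerabilities in smart home appliances, such as personal-information leaks and privacy violations, are also increasing. Research on ensuring security on low-performance devices with lightweight cryptography is under way, but encrypting and decrypting video data at 4K/2160p or higher in real time on a low-performance device incurs high latency. In this study, we propose a real-time encryption/decryption system for large-volume video data that can be implemented even on low-performance devices by using a hardware-based cryptographic algorithm accelerator.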
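A rough calculation shows why real-time encryption of 4K video is demanding for a low-performance device. The sketch assumes uncompressed 24-bit RGB frames at 30 frames per second. The abstract does not state the pixel format, frame rate, or whether the video is compressed, so these numbers are illustrative only, and the function names are hypothetical.

```python
def required_throughput_bytes_per_s(width: int, height: int,
                                    bytes_per_pixel: int, fps: int) -> int:
    """Bytes per second an encryption engine must process to keep pace with raw video."""
    return width * height * bytes_per_pixel * fps


def frame_budget_ms(fps: int) -> float:
    """Time available to encrypt one frame before the next one arrives."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Assumed: 3840x2160 (4K/2160p), 24-bit RGB (3 bytes/pixel), 30 fps, uncompressed.
    rate = required_throughput_bytes_per_s(3840, 2160, 3, 30)
    print(f"{rate / 1e6:.1f} MB/s = {rate * 8 / 1e9:.2f} Gbit/s, "
          f"{frame_budget_ms(30):.1f} ms per frame")
    # prints: 746.5 MB/s = 5.97 Gbit/s, 33.3 ms per frame
```

Under these assumptions the cipher must sustain roughly 6 Gbit/s, and any per-frame encryption time above about 33 ms accumulates as latency.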

Radix-2 16-points FFT accelerator implementation using FPGA (FPGA 를 사용한 radix-2 16-points FFT 알고리즘 가속기 구현)

  • Gyu Sup Lee;Seong-Min Cho;Seung-Hyun Seo
    • Annual Conference of KIPS / 2023.05a / pp.23-25 / 2023
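  • This paper describes the fast and efficient implementation of the radix-2 Fast Fourier Transform (FFT) algorithm on an FPGA. Using a Zybo Z7-20 FPGA board, we compare an implementation that runs only on the Processing System (PS) with an FFT implementation that runs on the Programmable Logic (PL) and uses pipelining and parallel processing. We also analyze the computation time and resource efficiency of our implementation by comparing its results with those of similar papers.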
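A 16-point radix-2 FFT consists of log2(16) = 4 stages of 8 butterflies each, which is the structure a PL implementation can pipeline and parallelize. The software sketch below shows that structure (bit-reversal reordering followed by butterfly stages) and checks it against a direct DFT. It is a reference model under these assumptions, not the authors' Zybo Z7-20 implementation.

```python
import cmath
from typing import List, Sequence


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (e.g. 0b0001 -> 0b1000 for 4 bits)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_radix2(x: Sequence[complex]) -> List[complex]:
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Reorder inputs so each butterfly stage works on adjacent blocks.
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:  # log2(n) stages; 4 stages for a 16-point FFT
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                twiddle = cmath.exp(-2j * cmath.pi * k / size)
                even = data[start + k]
                odd = data[start + k + half] * twiddle
                data[start + k] = even + odd         # butterfly upper output
                data[start + k + half] = even - odd  # butterfly lower output
        size *= 2
    return data


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) DFT, used only as a reference to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

In hardware, the butterflies within one stage are independent and can run in parallel, while the four stages form natural pipeline boundaries.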

Design and Implementation of BNN-based Gait Pattern Analysis System Using IMU Sensor (관성 측정 센서를 활용한 이진 신경망 기반 걸음걸이 패턴 분석 시스템 설계 및 구현)

  • Na, Jinho;Ji, Gisan;Jung, Yunho
    • Journal of Advanced Navigation Technology / v.26 no.5 / pp.365-372 / 2022
  • Compared with the sensors mainly used in human activity recognition (HAR) systems, inertial measurement unit (IMU) sensors are small and light, so a lightweight system can be built at low cost. In this paper, we therefore propose a binary neural network (BNN)-based gait pattern analysis system using an IMU sensor, and present the design and implementation of an FPGA-based accelerator for the computation. Six gait signals are measured with the IMU sensor, and a spectrogram is extracted using a short-time Fourier transform. To obtain a lightweight system with high accuracy, a BNN-based structure was used for gait pattern classification, and it was designed as an FPGA hardware accelerator to speed up the binary neural network computation. The proposed gait pattern analysis system was implemented using 24,158 logic resources, 14,669 registers, and 13.687 KB of block memory; its operation completed within 1.5 ms at the maximum operating frequency of 62.35 MHz, confirming that real-time operation is possible.
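In a BNN, weights and activations are restricted to +1 and -1, so a dot product can be computed with a bitwise XNOR followed by a population count, which maps to cheap logic on an FPGA. The sketch shows this standard identity (dot = 2 × matches - length). The bit encoding (+1 as bit 1, -1 as bit 0) and the function names are assumptions; the abstract does not describe the accelerator's internal datapath.

```python
from typing import List


def pack_bits(values: List[int]) -> int:
    """Pack a +1/-1 vector into an integer, bit i set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("BNN values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits: int, b_bits: int, length: int) -> int:
    """Dot product of two packed +1/-1 vectors using XNOR and popcount."""
    mask = (1 << length) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - length
```

Each matching bit contributes +1 and each mismatch -1, so counting matches is enough to recover the full-precision dot product of the ±1 vectors.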

An Efficient FPGA Based TDC Accelerator for Deconvolutional Neural Networks (효율적인 DCNN 연산을 위한 FPGA 기반 TDC 가속기)

  • Jang, Hyerim;Moon, Byungin
    • Annual Conference of KIPS / 2021.05a / pp.457-458 / 2021
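  • Among deep learning algorithms, the DCNN (DeConvolutional Neural Network) performs well in various fields such as image upscaling, generation, and restoration. Because a DCNN can process large amounts of data in parallel, implementing it in hardware is worthwhile. In recent research on DCNN hardware architectures, the TDC (Transforming the Deconvolutional layer into the Convolutional layer) algorithm, which converts deconvolution filters into convolution filters, has been proposed to solve the overlapping-sum problem. However, because TDC is performed on a CPU (Central Processing Unit), the computation is difficult to optimize, and using external memory consumes additional power. This paper therefore proposes an FPGA-based TDC hardware architecture that can operate at low power. The proposed architecture uses few resources, so it can run at low power, and because it is designed as a parallel processing structure, it computes quickly.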
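The overlapping-sum problem and the TDC remedy can be seen in one dimension: a strided deconvolution that scatters overlapping partial sums gives the same output as `stride` ordinary convolutions, each using every `stride`-th kernel tap. This 1-D sketch is an illustration under that simplification; the paper targets 2-D layers on an FPGA, and the function names here are hypothetical.

```python
from typing import List


def transposed_conv1d(x: List[float], w: List[float], stride: int) -> List[float]:
    """Direct deconvolution: each input scatters a scaled kernel (overlapping sums)."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk  # several inputs write the same output
    return out


def tdc_conv1d(x: List[float], w: List[float], stride: int) -> List[float]:
    """Same result computed as `stride` ordinary convolutions, one per output phase."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for phase in range(stride):
        sub_kernel = w[phase::stride]  # kernel taps that land on this output phase
        for n in range(phase, len(out), stride):
            q = n // stride
            acc = 0.0
            for j, wj in enumerate(sub_kernel):
                if 0 <= q - j < len(x):
                    acc += x[q - j] * wj  # gather: each output written once
            out[n] = acc
    return out
```

Because each output phase becomes an independent convolution that writes every output exactly once, the phases can be computed in parallel without accumulating overlapping partial sums.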