• Title/Summary/Keyword: FPGA accelerator (FPGA 가속기)

Search Result 60, Processing Time 0.023 seconds

Radix-2 16 Points FFT Algorithm Accelerator Implementation Using FPGA (FPGA를 사용한 radix-2 16 points FFT 알고리즘 가속기 구현)

  • Gyu Sup Lee;Seong-Min Cho;Seung-Hyun Seo
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.1 / pp.11-19 / 2024
  • The growing use of the FFT in signal processing, cryptography, and other fields has highlighted the importance of optimizing it. In this paper, we propose an accelerator that processes the radix-2 16-point FFT algorithm faster and more efficiently than the FFT implementations of existing studies, using FPGA (Field Programmable Gate Array) hardware. Leveraging the hardware advantages of the FPGA, such as parallel processing and pipelining, we design and implement the FFT logic in the PL (Programmable Logic) part using Verilog. We also implement the FFT using only the Zynq processor in the PS (Processing System) part, and compare the computation times of the PL and PS implementations. Additionally, we demonstrate the efficiency of our implementation in terms of computation time and resource usage in comparison with related work.
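
As a software reference for the algorithm named in this abstract, here is a minimal sketch of a radix-2 decimation-in-time FFT in Python, checked against a direct DFT on 16 samples. It is not the authors' Verilog design; the function names and the test signal are illustrative assumptions.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size results with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used only as a correctness reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


# Hypothetical 16-sample test signal.
samples = [complex(i % 4, 0) for i in range(16)]
spectrum = fft_radix2(samples)
```

For 16 points the recursion unrolls into log2(16) = 4 stages of 8 butterflies each, which is the kind of structure that lends itself to the parallel and pipelined hardware the abstract describes.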

A Study on Real Time LDWS using FPGA Accelerator (Lane Detection Warning System) (FPGA 가속기를 활용한 실시간 차선 유지 시스템 개발에 관한 연구)

  • Chae-won Lee;Min-Ha Kim;Ji-Yun Han;Su-Been Hong;Soo-Kyung Shin
    • Annual Conference of KIPS / 2024.10a / pp.855-856 / 2024
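  • This study develops a system that detects lanes in real time and keeps the vehicle within them, using an FPGA accelerator. Lane detection uses a Sobel filter and the Hough transform, and the data processing speed needed for real-time operation is improved with the FPGA's PL logic and memory optimization techniques. The resulting attachable, easy-to-install LDWS enables a low level of autonomous driving.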
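
The two detection steps named in this abstract can be sketched in plain Python as follows. This is only a software illustration of a Sobel gradient followed by a Hough vote, not the authors' PL implementation; the function names, the `|Gx| + |Gy|` magnitude approximation, the threshold, and the synthetic image are all assumptions.

```python
import math


def sobel_magnitude(img):
    """Return |Gx| + |Gy| for each interior pixel of a 2-D list of ints."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def hough_peak(edges, threshold, n_theta=180):
    """Vote in (rho, theta) space; return the strongest line as (rho, theta_index)."""
    votes = {}
    for y, row in enumerate(edges):
        for x, v in enumerate(row):
            if v < threshold:
                continue  # not an edge pixel
            for t in range(n_theta):
                theta = math.pi * t / n_theta
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                votes[(rho, t)] = votes.get((rho, t), 0) + 1
    return max(votes, key=votes.get) if votes else None


# Synthetic 12x12 frame with a vertical dark-to-bright boundary.
img = [[255 if x >= 6 else 0 for x in range(12)] for y in range(12)]
edges = sobel_magnitude(img)
line = hough_peak(edges, threshold=500)
```

Both loops touch every pixel independently, which is why these stages are common candidates for offloading to programmable logic.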

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

  • Ahn, Saehyun;Kang, Suk-Ju
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.11a / pp.228-229 / 2020
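  • Deep learning has recently been widely used in many computer vision applications, such as object recognition, classification, and image generation. In particular, super-resolution has recently seen large performance gains from deep learning. The fast super-resolution convolutional neural network (FSRCNN) is a well-known deep learning-based super-resolution algorithm that uses low-resolution input features extracted by several convolutional layers to output a super-resolution image from a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural network accelerator designed for parallel computing efficiency. In particular, we designed an energy-efficient accelerator by converting the deconvolutional layer into a convolutional layer. We also propose Optimal-FSRCNN, which modifies the structure of FSRCNN to account for FPGA resources. The number of multipliers used is reduced by a factor of 2.4 compared with FSRCNN, while PSNR, the metric used to evaluate super-resolution performance, is similar to that of FSRCNN. With this, we implemented a network optimized for the FPGA and developed a real-time image processing technique that outputs UHD video from FHD input video.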
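
To illustrate why a deconvolutional layer can be expressed as a convolutional one at all, the sketch below shows the textbook 1-D equivalence: a strided transposed convolution equals an ordinary convolution over a zero-inserted, zero-padded input with the kernel reversed. The abstract does not say which conversion the authors use, so this should be read as an assumption-laden illustration rather than their method; all names are hypothetical.

```python
def transposed_conv1d(x, k, stride):
    """Direct transposed convolution: each input sample scatters a scaled kernel."""
    out = [0] * ((len(x) - 1) * stride + len(k))
    for i, xi in enumerate(x):
        for j, kj in enumerate(k):
            out[i * stride + j] += xi * kj
    return out


def conv_equivalent(x, k, stride):
    """Same result via zero insertion plus a plain sliding-window convolution."""
    up = [0] * ((len(x) - 1) * stride + 1)
    up[::stride] = x                  # insert stride-1 zeros between samples
    pad = [0] * (len(k) - 1)
    p = pad + up + pad                # zero-pad both ends
    kf = k[::-1]                      # reversed kernel
    return [sum(p[m + j] * kf[j] for j in range(len(k)))
            for m in range(len(p) - len(k) + 1)]


x = [1, 2, 3]
k = [1, 0, -1, 2]
direct = transposed_conv1d(x, k, stride=2)
via_conv = conv_equivalent(x, k, stride=2)
```

The zero-inserted form multiplies many zeros, so a hardware-oriented conversion would normally restructure the kernel to skip them; that refinement is not shown here.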


An Embedded FAST Hardware Accelerator for Image Feature Detection (영상 특징 추출을 위한 내장형 FAST 하드웨어 가속기)

  • Kim, Taek-Kyu
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.2 / pp.28-34 / 2012
  • Various feature extraction algorithms are widely applied in real-time image processing applications to extract significant features from images. Feature extraction algorithms are mostly combined with other image processing algorithms, chiefly for image tracking and recognition. The feature extraction function supplies feature information to the other image processing algorithms and is mainly implemented as a preprocessing stage. Image processing applications now face the need for embedded implementation with real-time processing. To satisfy this requirement, execution time must be reduced to improve performance. Reducing the execution time of the feature extraction function not only leaves more execution time for the other image processing algorithms but also helps satisfy the real-time requirement. This paper explains the FAST (Features from Accelerated Segment Test) algorithm of E. Rosten and presents an FPGA-based embedded hardware accelerator architecture. The proposed acceleration scheme can be implemented using approximately 2,217 flip-flops, 5,034 LUTs, 2,833 slices, and 18 block RAMs in the Xilinx Virtex-4 FPGA. In ModelSim-based simulation, the proposed hardware accelerator takes 3.06 ms to extract 954 features from an image of $640 \times 480$ pixels, which shows the cost effectiveness of the proposed scheme.
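
The segment test at the core of FAST can be sketched as below: a pixel is a corner candidate if enough contiguous pixels on a 16-pixel circle of radius 3 are all brighter or all darker than the center by more than a threshold. The arc length `n=9` and the threshold value are assumptions (the abstract does not state them), and this sketch omits non-maximum suppression.

```python
# 16 (dx, dy) offsets of a radius-3 Bresenham circle, in contiguous order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, threshold, n=9):
    """Segment test: True if n contiguous ring pixels are all brighter or all darker."""
    p = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):              # +1: brighter than center, -1: darker
        flags = [sign * (v - p) > threshold for v in ring]
        run = 0
        for f in flags + flags:       # doubled list handles wrap-around arcs
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False


flat = [[10] * 7 for _ in range(7)]
dot = [[10] * 7 for _ in range(7)]
dot[3][3] = 100                       # isolated bright pixel at the center
print(is_fast_corner(flat, 3, 3, threshold=20))  # False
print(is_fast_corner(dot, 3, 3, threshold=20))   # True
```

The test needs only comparisons and a run counter per pixel, which fits the simple arithmetic structure that makes FAST attractive for FPGA implementation.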

A Lightweight Hardware Accelerator for Public-Key Cryptography (공개키 암호 구현을 위한 경량 하드웨어 가속기)

  • Sung, Byung-Yoon;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.12 / pp.1609-1617 / 2019
  • This paper describes the design of a hardware accelerator for implementing public-key cryptographic protocols (PKCPs) based on Elliptic Curve Cryptography (ECC) and RSA. It supports five elliptic curves (ECs) over GF(p) and three RSA key lengths defined by NIST standards. It was designed to support four point operations over ECs and six modular arithmetic operations, making it suitable for hardware implementation of ECC- and RSA-based PKCPs. To achieve a small-area implementation, the finite field arithmetic circuit was designed with a 32-bit data path, and it adopts the word-based Montgomery multiplication algorithm, the Jacobian coordinate system for EC point operations, and Fermat's little theorem for the modular multiplicative inverse. The hardware operation was verified on an FPGA device by implementing the EC-DH key exchange protocol and RSA operations. It occupies 20,800 gate equivalents and 28 kbits of RAM at a 50 MHz clock frequency with a 180-nm CMOS cell library, and 1,503 slices and 2 BRAMs in a Virtex-5 FPGA device.
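
Two of the arithmetic techniques named in this abstract, Montgomery multiplication and inversion by Fermat's little theorem, can be sketched together as follows. Note the simplification: this uses a single-shot Montgomery reduction on Python big integers rather than the word-serial 32-bit variant the paper adopts, and the NIST P-256 prime is chosen here only as an example modulus. All function names are hypothetical.

```python
def montgomery_setup(p, word_bits=32):
    """Pick R = 2^r_bits (a multiple of word_bits, R > p) and return (r_bits, -p^-1 mod R)."""
    r_bits = word_bits * -(-p.bit_length() // word_bits)   # ceiling division
    r = 1 << r_bits
    return r_bits, (-pow(p, -1, r)) % r                    # needs Python 3.8+, p odd


def mont_mul(a, b, p, r_bits, p_neg_inv):
    """Montgomery product a * b * R^-1 mod p, for 0 <= a, b < p."""
    mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & mask) * p_neg_inv) & mask
    u = (t + m * p) >> r_bits          # exact division by R
    return u - p if u >= p else u


def mod_inverse_fermat(a, p):
    """a^(p-2) mod p (p prime) by square-and-multiply in the Montgomery domain."""
    r_bits, p_neg_inv = montgomery_setup(p)
    r2 = pow(1 << r_bits, 2, p)                        # R^2 mod p
    a_m = mont_mul(a % p, r2, p, r_bits, p_neg_inv)    # a * R mod p
    acc = mont_mul(1, r2, p, r_bits, p_neg_inv)        # R mod p, i.e. "1" in the domain
    for bit in bin(p - 2)[2:]:
        acc = mont_mul(acc, acc, p, r_bits, p_neg_inv)
        if bit == "1":
            acc = mont_mul(acc, a_m, p, r_bits, p_neg_inv)
    return mont_mul(acc, 1, p, r_bits, p_neg_inv)      # convert back out of the domain


# NIST P-256 field prime, used as an example modulus.
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1
inv = mod_inverse_fermat(12345, P256)
```

Using Fermat's theorem means inversion reuses the same multiplier as everything else, which is consistent with the small-area goal stated in the abstract.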

FPGA based Implementation of FAST and BRIEF algorithm for Object Recognition (객체인식을 위한 FAST와 BRIEF 알고리즘 기반 FPGA 설계)

  • Heo, Hoon;Lee, Kwang-Yeob
    • Journal of IKEEE / v.17 no.2 / pp.202-207 / 2013
  • This paper implements the conventional FAST and BRIEF algorithms in hardware on the Zynq-7000 SoC platform. Previous feature-based hardware accelerators are mostly implemented using the SIFT or SURF algorithm, which requires excessive internal memory and hardware cost. The proposed FAST & BRIEF accelerator reduces internal memory usage by approximately 57% and hardware cost by approximately 70% compared with conventional SIFT or SURF accelerators, and it processes 0.17 pixels per clock.
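
For readers unfamiliar with BRIEF, the sketch below shows the basic idea: a binary descriptor built from pairwise intensity comparisons around a keypoint, matched by Hamming distance. The 256-bit length, 15-pixel patch, uniform random sampling pattern, and omission of pre-smoothing are assumptions made for brevity, not details taken from the paper.

```python
import random


def make_pairs(n_bits=256, patch=15, seed=0):
    """Fixed random sampling pattern: n_bits pairs of (dx1, dy1, dx2, dy2) offsets."""
    rng = random.Random(seed)
    half = patch // 2
    return [tuple(rng.randint(-half, half) for _ in range(4)) for _ in range(n_bits)]


def brief_descriptor(img, x, y, pairs):
    """Pack one comparison result per pair into an integer bit string."""
    bits = 0
    for i, (dx1, dy1, dx2, dy2) in enumerate(pairs):
        if img[y + dy1][x + dx1] < img[y + dy2][x + dx2]:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")


# Hypothetical 32x32 test image and a keypoint at its center.
img = [[(x * 7 + y * 13) % 256 for x in range(32)] for y in range(32)]
pairs = make_pairs()
desc = brief_descriptor(img, 16, 16, pairs)
```

Because the descriptor is just comparisons and the match is XOR plus a bit count, it needs far less storage and logic than gradient-histogram descriptors, in line with the memory and cost reductions the abstract reports relative to SIFT and SURF.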

Design of FPGA Hardware Accelerator for Information Security System (정보보호 시스템을 위한 FPGA 기반 하드웨어 가속기 설계)

  • Cha, Jeong Woo;Kim, Chang Hoon
    • Journal of Korea Society of Industrial Information Systems / v.18 no.2 / pp.1-12 / 2013
  • Information security systems are implemented in software, in hardware (ASIC), or on FPGA devices. A software implementation offers high flexibility across various information security algorithms but is weak in speed, power, and safety. An ASIC implementation excels in speed and power but cannot support diverse security platforms because its functions are fixed. To resolve this trade-off, FPGA-based implementations have recently been pursued. The goal of this work is to design and develop an FPGA hardware accelerator for information security systems. It performs AES, SHA-256, and ECC and is controlled through an integrated interface. Furthermore, since the proposed information security system can satisfy various requirements and constraints, it can be applied to numerous information security applications, from low-cost applications to high-speed communication systems.

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

  • Ahn, Saehyun;Kang, Suk-Ju
    • Journal of Broadcast Engineering / v.26 no.2 / pp.167-174 / 2021
  • Recently, deep learning technology has been widely used in various computer vision applications, such as object recognition, classification, and image generation. In particular, deep learning-based super-resolution has achieved significant performance improvements. The fast super-resolution convolutional neural network (FSRCNN) is a well-known deep learning-based super-resolution algorithm whose output image is generated by a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural network accelerator that considers parallel computing efficiency. In addition, we propose Optimal-FSRCNN, which modifies the structure of FSRCNN. The number of multipliers is reduced by a factor of 3.47 compared with FSRCNN, while PSNR remains similar to that of FSRCNN. We developed a real-time image processing technology implemented on an FPGA.

A Threshold Controller for FAST Hardware Accelerator (FAST 하드웨어 가속기를 위한 임계값 제어기)

  • Kim, Taek-Kyu;Suh, Yong-Suk
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.11 / pp.187-192 / 2014
  • Various studies have been performed on extracting significant features from continuous images. The FAST algorithm has a simple arithmetic structure and makes it easy to extract features in real time. For this reason, FPGA-based hardware accelerators for the FAST algorithm have been implemented and widely applied. The hardware accelerator needs a threshold to extract features from images. The threshold influences not only the number of extracted features but also the total execution time. Therefore, threshold control is important for stabilizing the total execution time while extracting as many features as possible. To control the threshold, this paper proposes a PI controller. The function and performance of the proposed PI controller are verified using test images, and the PI control logic is designed for the Xilinx Virtex-4 FPGA. The proposed scheme can be implemented by adding 47 flip-flops, 146 LUTs, and 91 slices to the FAST hardware accelerator. This occupies only 2.1% of the flip-flops, 4.4% of the LUTs, and 4.5% of the slices, which can be regarded as a small hardware cost.
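
A PI loop of the kind this abstract describes might look like the sketch below, which nudges the FAST threshold each frame so that the feature count settles at a target. The gains, the target of 500 features, the starting threshold, and the linear "plant" model relating threshold to feature count are all hypothetical; the paper's actual control variable and tuning are not given in the abstract.

```python
class PIThresholdController:
    """Discrete PI loop that steers the FAST threshold toward a target feature count."""

    def __init__(self, target, kp, ki, base=20.0, t_min=1.0, t_max=255.0):
        self.target, self.kp, self.ki = target, kp, ki
        self.base, self.t_min, self.t_max = base, t_min, t_max
        self.integral = 0.0

    def update(self, feature_count):
        # Too many features -> positive error -> raise the threshold.
        error = feature_count - self.target
        self.integral += error
        raw = self.base + self.kp * error + self.ki * self.integral
        return min(self.t_max, max(self.t_min, raw))   # clamp to the valid range


def toy_feature_count(threshold):
    """Hypothetical plant: feature count falls linearly as the threshold rises."""
    return max(0.0, 2000.0 - 20.0 * threshold)


ctrl = PIThresholdController(target=500, kp=0.01, ki=0.02)
threshold = ctrl.base
for _ in range(100):                   # one update per simulated frame
    threshold = ctrl.update(toy_feature_count(threshold))
```

With these toy numbers the loop settles where the modeled count equals the target. A hardware version would use fixed-point arithmetic and an integer threshold, which this floating-point sketch does not model.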

Effective Application of PYNQ for FPGA-Based AI Acceleration: A Comparative Research with Petalinux (FPGA 기반 AI 가속에서 PYNQ의 효과적인 활용: Petalinux와의 비교)

  • Yu-min Kang;Han-yul Min;Chae-bin Lee
    • Annual Conference of KIPS / 2024.10a / pp.936-937 / 2024
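  • This paper compares the image processing speed of the Petalinux SDK and the PYNQ framework on an FPGA. The experiments use a self-built CNN accelerator on the FPGA with the YOLO v3 Tiny and Darknet-19 algorithms. Image processing took about 233.13 ms with the Petalinux SDK but about 2.55 ms with the PYNQ framework, which was therefore faster. On this basis, the paper highlights the potential and applicability of PYNQ and raises the need for further research.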