• Title/Summary/Keyword: FPGA 가속기

Search Result 60, Processing Time 0.025 seconds

Design and Implementation of Radar Signal Processing System for Vehicle Door Collision Prevention (차량 도어 충돌 방지용 레이다 신호처리 시스템 설계 및 구현)

  • Jeongwoo Han;Minsang Kim;Daehong Kim;Yunho Jung
    • Journal of IKEEE
    • /
    • v.28 no.3
    • /
    • pp.397-404
    • /
    • 2024
  • This paper presents the design and implementation results of a Raspberry-Pi-based embedded system with an FPGA accelerator that can detect and classify objects using an FMCW radar sensor for preventing door collision accidents in vehicles. The proposed system performs a radar sensor signal processing and a deep learning processing that classifies objects into bicycles, automobiles, and pedestrians. Since the CNN algorithm requires substantial computation and memory, it is not suitable for embedded systems. To address this, we implemented a lightweight deep learning model, BNN, optimized for embedded systems on an FPGA, and verified the results achieving a classification accuracy of 90.33% and an execution time of 20ms.

Microcode based Controller for Compact CNN Accelerators Aimed at Mobile Devices (모바일 디바이스를 위한 소형 CNN 가속기의 마이크로코드 기반 컨트롤러)

  • Na, Yong-Seok;Son, Hyun-Wook;Kim, Hyung-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.3
    • /
    • pp.355-366
    • /
    • 2022
  • This paper proposes a microcode-based neural network accelerator controller for artificial intelligence accelerators that can be reconstructed using a programmable architecture and provide the advantages of low-power and ultra-small chip size. In order for the target accelerator to support various neural network models, the neural network model can be converted into microcode through microcode compiler and mounted on accelerator to control the operators of the accelerator such as datapath and memory access. While the proposed controller and accelerator can run various CNN models, in this paper, we tested them using the YOLOv2-Tiny CNN model. Using a system clock of 200 MHz, the Controller and accelerator achieved an inference time of 137.9 ms/image for VOC 2012 dataset to detect object, 99.5ms/image for mask detection dataset to detect wearing mask. When implementing an accelerator equipped with the proposed controller as a silicon chip, the gate count is 618,388, which corresponds to 65.5% reduction in chip area compared with an accelerator employing a CPU-based controller (RISC-V).

A 3D graphic pipelines with an efficient clipping algorithm (효율적인 클리핑 기능을 갖는 3차원 그래픽 파이프라인 구조)

  • Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.8
    • /
    • pp.61-66
    • /
    • 2008
  • Recently, portable devices which require small area and low power consumption employ applications using 3D graphics such as 3D games and 3D graphical user interfaces. We propose an efficient clipping engine algorithm which is suitable in 3D graphics pipeline. The clipping operation is divided into two steps: one is the selection process in the transformation engine and the other is the pixel clipping process in the scan conversion unit. The clipping operation is possible with addition of simple comparator. The clipping for the Y-axis is achieved in the edge walk stage and that for the X and Z-axis is performed in the span processing. The proposed clipping algorithm reduces the operation cycles and the area of of 3D graphics pipelines. We designed a 3D graphics pipeline with the proposed clipping algorithm using Verilog-HDL and verifies the operation using an FPGA.

Lightweight FPGA Implementation of Symmetric Buffer-based Active Noise Canceller with On-Chip Convolution Acceleration Units (온칩 컨볼루션 가속기를 포함한 대칭적 버퍼 기반 액티브 노이즈 캔슬러의 경량화된 FPGA 구현)

  • Park, Seunghyun;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.11
    • /
    • pp.1713-1719
    • /
    • 2022
  • As the noise canceler with a small processing delay increases the sampling frequency, a better-quality output can be obtained. For a single buffer, processing delay occurs because it is impossible to write new data while the processor is processing the data. When synthesizing with anti-noise and output signal, this processing delay creates additional buffering overhead to match the phase. In this paper, we propose an accelerator structure that minimizes processing delay and increases processing speed by alternately performing read and write operations using the Symmetric Even-Odd-buffer. In addition, we compare the structural differences between the two methods of noise cancellation (Fast Fourier Transform noise cancellation and adaptive Least Mean Square algorithm). As a result, using an Symmetric Even-Odd-buffer the processing delay was reduced by 29.2% compared to a single buffer. The proposed Symmetric Even-Odd-buffer structure has the advantage that it can be applied to various canceling algorithms.

Design of Stand-alone AI Processor for Embedded System (독립운용이 가능한 임베디드 인공지능 프로세서 설계)

  • Cho, Kwon Neung;Choi, Do Young;Jeong, Young Woo;Lee, Seung Eun
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.600-602
    • /
    • 2021
  • With the development of the mobile industry and growing interest in artificial intelligence (AI) technology, a lot of research for AI processors which applicable to embedded systems is under study. When implementing AI to embedded systems, the design should be considered the restriction of resource and power consumption. Moreover, it is efficient to include a dedicated hardware accelerator in order to complement the low computational performance of the embedded system. In this paper, we propose an stand-alone embedded AI processor. The proposed AI processor includes a hardware accelerator that is dedicated to the distance-based AI algorithm and a general-purpose MCU that supports flexible programmability for application to various embedded systems. The AI processor was designed with Verilog HDL and verified by implementing on Field Programmable Gate Array (FPGA).

  • PDF

Development of High Stability Magnet Power Supply (고정밀 전자석 전원 장치 개발)

  • Park, Ki-Hyeon;Jeong, Seong-Hun;Kim, Myoung-Ho
    • Proceedings of the KIEE Conference
    • /
    • 2011.07a
    • /
    • pp.1201-1203
    • /
    • 2011
  • 이 논문은 가속기에 사용되는 전자석용 전원장치 개발에 대해서 기술하였다. 이 전원장치는 DSP TMS320F2808과 ADCs, FPGA를 이용한 디지털 신호처리 기술을 적용하여 5ppm 이하의 출력 전류 안정도를 가진다. 여기서는 전원장치의 입력 및 출력 필터, 비례적분 보상기 등에 대한 설계, 전원장치 모의실험 결과 및 제작 후 출력 전류의 안정도, 시스템의 주파수 특성 등 여러 성능의 측정 결과에 대하여 기술하였다.

  • PDF

A design of The Embedded 3n Graphics Rendering Processor for Portable Devices (휴대형기기에 적합한 내장형 3차원 그래픽 렌더링 처리기 설계)

  • 우현재;장태홍;이문기
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.11
    • /
    • pp.105-113
    • /
    • 2004
  • This paper proposes 3D graphics accelerator, especially rendering unit, for portable devices. The existing 3D architecture is not suitable for portable devices because of its huge size. To reduce the size, we use iterative architecture and fixed-point calculation. In this paper, we suggest the format of fixed-point comparing with the result images, and some special technique to control. Finally, it is implemented with FPGA and 0.25um ASIC technology respectively. The ASIC chip can execute 47.88M pixels per second. The size of ASIC chip is 4.9287mm*4.9847mm and the power consumption is 263.7mW with 50MHz operation frequency.

Design and Implementation of a Hardware Accelerator for Marine Object Detection based on a Binary Segmentation Algorithm for Ship Safety Navigation (선박안전 운항을 위한 이진 분할 알고리즘 기반 해상 객체 검출 하드웨어 가속기 설계 및 구현)

  • Lee, Hyo-Chan;Song, Hyun-hak;Lee, Sung-ju;Jeon, Ho-seok;Kim, Hyo-Sung;Im, Tae-ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.10
    • /
    • pp.1331-1340
    • /
    • 2020
  • Object detection in maritime means that the captain detects floating objects that has a risk of colliding with the ship using the computer automatically and as accurately as human eyes. In conventional ships, the presence and distance of objects are determined through radar waves. However, it cannot identify the shape and type. In contrast, with the development of AI, cameras help accurately identify obstacles on the sea route with excellent performance in detecting or recognizing objects. The computer must calculate high-volume pixels to analyze digital images. However, the CPU is specialized for sequential processing; the processing speed is very slow, and smooth service support or security is not guaranteed. Accordingly, this study developed maritime object detection software and implemented it with FPGA to accelerate the processing of large-scale computations. Additionally, the system implementation was improved through embedded boards and FPGA interface, achieving 30 times faster performance than the existing algorithm and a three-times faster entire system.

Implementation of Neural Network Accelerator for Rendering Noise Reduction on OpenCL (OpenCL을 이용한 랜더링 노이즈 제거를 위한 뉴럴 네트워크 가속기 구현)

  • Nam, Kihun
    • The Journal of the Convergence on Culture Technology
    • /
    • v.4 no.4
    • /
    • pp.373-377
    • /
    • 2018
  • In this paper, we propose an implementation of a neural network accelerator for reducing the rendering noise using OpenCL. Among the rendering algorithms, we selects a ray tracing to assure a high quality graphics. Ray tracing rendering uses ray to render, less use of the ray will result in noise. Ray used more will produce a higher quality image but will take operation time longer. To reduce operation time whiles using fewer rays, Learning Base Filtering algorithm using neural network was applied. it's not always produce optimize result. In this paper, a new approach to Matrix Multiplication that is based on General Matrix Multiplication for improved performance. The development environment, we used specialized in high speed parallel processing of OpenCL. The proposed architecture was verified using Kintex UltraScale XKU6909T-2FDFG1157C FPGA board. The time it takes to calculate the parameters is about 1.12 times fast than that of Verilog-HDL structure.

Implementation of a Window-Masking Method and the Soft-core Processor based TDD Switching Control SoC FPGA System (윈도 마스킹 기법과 Soft-core Processor 기반 TDD 스위칭 제어 SoC 시스템 FPGA 구현)

  • Hee-Jin Yang;Jeung-Sub Lee;Han-Sle Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.17 no.3
    • /
    • pp.166-175
    • /
    • 2024
  • In this paper, the Window-Masking Method and HAT (Hardware Attached Top) CPU SoM (System on Module) are used to improve the performance and reduce the weight of the MANET (Mobile Ad-hoc Network) network synchronization system using time division redundancy. We propose converting it into a RISC-V based soft-core MCU and mounting it on an FPGA, a hardware accelerator. It was also verified through experiment. In terms of performance, by applying the proposed technique, the synchronization acquisition range is from -50dBm to +10dBm to -60dBm to +10dBm, the lowest input level for synchronization is increased by 20% from -50dBm to -60dBm, and the detection delay (Latency) is 220ns. Reduced by 43% to 125ns. In terms of weight reduction, computing resources (48%), size (33%), and weight (27%) were reduced by an average of 36% by replacing with soft-core MCU.