Search | Korea Science

Toward Optimal FPGA Implementation of Deep Convolutional Neural Networks for Handwritten Hangul Character Recognition

Park, Hanwool;Yoo, Yechan;Park, Yoonjin;Lee, Changdae;Lee, Hakkyung;Kim, Injung;Yi, Kang
- Journal of Computing Science and Engineering
- /
- v.12 no.1
- /
- pp.24-35
- /
- 2018
Deep convolutional neural network (DCNN) is an advanced technology in image recognition. Because of extreme computing resource requirements, DCNN implementation with software alone cannot achieve real-time requirement. Therefore, the need to implement DCNN accelerator hardware is increasing. In this paper, we present a field programmable gate array (FPGA)-based hardware accelerator design of DCNN targeting handwritten Hangul character recognition application. Also, we present design optimization techniques in SDAccel environments for searching the optimal FPGA design space. The techniques we used include memory access optimization and computing unit parallelism, and data conversion. We achieved about 11.19 ms recognition time per character with Xilinx FPGA accelerator. Our design optimization was performed with Xilinx HLS and SDAccel environment targeting Kintex XCKU115 FPGA from Xilinx. Our design outperforms CPU in terms of energy efficiency (the number of samples per unit energy) by 5.88 times, and GPGPU in terms of energy efficiency by 5 times. We expect the research results will be an alternative to GPGPU solution for real-time applications, especially in data centers or server farms where energy consumption is a critical problem.
https://doi.org/10.5626/JCSE.2018.12.1.24 인용

Development of a smart wireless sensing unit using off-the-shelf FPGA hardware and programming products

Kapoor, Chetan;Graves-Abe, Troy L.;Pei, Jin-Song
- Smart Structures and Systems
- /
- v.3 no.1
- /
- pp.69-88
- /
- 2007
In this study, Field-Programmable Gate Arrays (FPGAs) are investigated as a practical solution to the challenge of designing an optimal platform for implementing algorithms in a wireless sensing unit for structuralhealth monitoring. Inherent advantages, such as tremendous processing power, coupled with reconfigurable and flexible architecture render FPGAs a prime candidate for the processing core in an optimal wireless sensor unit, especially when handling Digital Signal Processing (DSP) and system identification algorithms. This paper presents an effort to create a proof-of-concept unit, wherein an off-the-shelf FPGA development board, available at a price comparable to a microprocessor development board, was adopted. Data processing functions, including windowing, Fast Fourier Transform (FFT), and peak detection, were implemented in the FPGA using a Matlab Simulink-based high-level abstraction tool rather than hardware descriptive language. Simulations and laboratory tests were carried out to validate the design.
https://doi.org/10.12989/sss.2007.3.1.069 인용

FPGA-Based Low-Power and Low-Cost Portable Beamformer Design (FPGA 기반 저전력 및 저비용 휴대용 빔포머 설계)

Jeong, GabJoong;Park, CheolYoung
- Journal of Korea Society of Industrial Information Systems
- /
- v.24 no.1
- /
- pp.31-38
- /
- 2019
In this paper, we develop a beamforming front end platform with pipeline circuit configuration method that can apply various clinical diagnostic applications of ultrasound image technology. Hardware design targets compression applications as well as scalable applications where power, integration levels and replication possibilities are important. Firmware design was implemented to achieve optimal FPGA parallel processing level by constructing new IP and system-oriented design environment to accelerate design productivity with maximum productivity improvement using Vivado HLS tool, which is a next generation high level synthesis tool. Former supports the high-speed management function of scan data that can create an image area arbitrarily and can be appropriately corrected and supplemented when reconfiguring or changing system specifications in the future.
https://doi.org/10.9723/jksiis.2019.24.1.031 인용 PDF KSCI HTML

Hardware and Software Co-Design Platform for Energy-Efficient FPGA Accelerator Design (에너지 효율적인 FPGA 가속기 설계를 위한 하드웨어 및 소프트웨어 공동 설계 플랫폼)

Lee, Dongkyu;Park, Daejin
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.25 no.1
- /
- pp.20-26
- /
- 2021
Recent systems contain hardware and software components together for faster execution speed and less power consumption. In conventional hardware and software co-design, the ratio of software and hardware was divided by the designer's empirical knowledge. To find optimal results, designers iteratively reconfigure accelerators and applications and simulate it. Simulating iteratively while making design change is time-consuming. In this paper, we propose a hardware and software co-design platform for energy-efficient FPGA accelerator design. The proposed platform makes it easy for designers to find an appropriate hardware ratio by automatically generating application program code and hardware code by parameterizing the components of the accelerator. The co-design platform based on the Vitis unified software platform runs on a server with Xilinx Alveo U200 FPGA card. As a result of optimizing the multiplication accelerator for two matrices with 1000 rows, execution time was reduced by 90.7% and power consumption was reduced by 56.3%.
https://doi.org/10.6109/jkiice.2021.25.1.20 인용 PDF KSCI

A Hardware Allocation Algorithm for Optimal MUX-based FPGA Design (최적의 MUX-based FPGA 설계를 위한 하드웨어 할당 알고리듬)

인치호
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.26 no.7B
- /
- pp.996-1005
- /
- 2001
본 논문에서는 ASIC 벤더의 셀 라이브러리와 MUX-based FPGA에 있는 고정된 입력을 갖는 연결구조의 수를 최소화하는 하드웨어 할당 알고리듬을 제안한다. 제안된 할당 알고리듬은 연산자간을 연결하는 신호선이 반복적으로 이용되어 연결 신호선 수가 최소가 될 수 있도록 연산자를 할당한다. 연결 구조를 고려한 이분할 그래프에 가중치를 설정하고 변수와 레지스터간의 최대 가중치 매칭을 구함으로써 레지스터 할당을 수행한다. 또한 연결구조에 대한 멀티플렉서의 중복 입력을 제거하고 연산자에 연결된 멀티플렉서간의 입력을 교환하는 입력 정렬 과정으로 연결구조를 최소화한다. 벤치마크 실험을 통하여 제안된 알고리즘의 효용성을 보인다.
PDF

Design and Implementation of 256-Point Radix-4 100 Gbit/s FFT Algorithm into FPGA for High-Speed Applications

Polat, Gokhan;Ozturk, Sitki;Yakut, Mehmet
- ETRI Journal
- /
- v.37 no.4
- /
- pp.667-676
- /
- 2015
The third-party FFT IP cores available in today's markets do not provide the desired speed demands for optical communication. This study deals with the design and implementation of a 256-point Radix-4 100 Gbit/s FFT, where computational steps are reconsidered and optimized for high-speed applications, such as radar and fiber optics. Alternative methods for FFT implementation are investigated and Radix-4 is decided to be the optimal solution for our fully parallel FPGA application. The algorithms that we will implement during the development phase are to be tested on a Xilinx Virtex-6 FPGA platform. The proposed FFT core has a fully parallel architecture with a latency of nine clocks, and the target clock rate is 312.5 MHz.
https://doi.org/10.4218/etrij.15.0114.0678 인용 PDF KSCI

A Study on the Minimization of Layout Area for FPGA

Yi, Cheon-Hee
- Journal of the Semiconductor & Display Technology
- /
- v.9 no.2
- /
- pp.15-20
- /
- 2010
This paper deals with minimizing layout area of FPGA design. FPGAs are becoming increasingly important in the design of ASICs since they provide both large scale integration and user-programmability. This paper describes a method to obtain tight bound on the worst-case increase in area when drivers are introduced along many long wires in a layout. The area occupied by minimum-area embedding for a circuit can depend on the aspect ratio of the bounding rectangle of the layout. This paper presents a separator-based area-optimal embeddings for FPGA graphs in rectangles of several aspect ratios which solves the longest path problem in the constraint graph.
PDF KSCI

Design and Implementation of a GNSS Receiver Development Platform for Multi-band Signal Processing (다중대역 통합 신호처리 가능한 GNSS 수신기 개발 플랫폼 설계 및 구현)

Jinseok Kim;Sunyong Lee;Byeong Gyun Kim;Hung Seok Seo;Jongsun Ahn
- Journal of Positioning, Navigation, and Timing
- /
- v.13 no.2
- /
- pp.149-158
- /
- 2024
Global Navigation Satellite System (GNSS) receivers are becoming increasingly sophisticated, equipped with advanced features and precise specifications, thus demanding efficient and high-performance hardware platforms. This paper presents the design and implementation of a Field-Programmable Gate Array (FPGA)-based GNSS receiver development platform for multi-band signal processing. This platform utilizes a FPGA to provide a flexible and re-configurable hardware environment, enabling real-time signal processing, position determination, and handling of large-scale data. Integrated signal processing of L/S bands enhances the performance and functionality of GNSS receivers. Key components such as the RF frontend, signal processing modules, and power management are designed to ensure optimal signal reception and processing, supporting multiple GNSS. The developed hardware platform enables real-time signal processing and position determination, supporting multiple GNSS systems, thereby contributing to the advancement of GNSS development and research.
https://doi.org/10.11003/JPNT.2024.13.2.149 인용 PDF HTML

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

Ahn, Saehyun;Kang, Suk-Ju
- Journal of Broadcast Engineering
- /
- v.26 no.2
- /
- pp.167-174
- /
- 2021
Recently, deep learning technology is widely used in various computer vision applications, such as object recognition, classification, and image generation. In particular, the deep learning-based super-resolution has been gaining significant performance improvement. Fast super-resolution convolutional neural network (FSRCNN) is a well-known model as a deep learning-based super-resolution algorithm that output image is generated by a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural networks accelerator that considers parallel computing efficiency. In addition, the proposed method proposes Optimal-FSRCNN, which is modified the structure of FSRCNN. The number of multipliers is compressed by 3.47 times compared to FSRCNN. Moreover, PSNR has similar performance to FSRCNN. We developed a real-time image processing technology that implements on FPGA.
https://doi.org/10.5909/JBE.2021.26.2.167 인용 PDF KSCI KPUBS

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

Ahn, Saehyun;Kang, Suk-Ju
- Proceedings of the Korean Society of Broadcast Engineers Conference
- /
- 2020.11a
- /
- pp.228-229
- /
- 2020
최근 딥러닝 기술은 여러 컴퓨터 비전 응용 분야에서 많이 쓰이고 있다. 물체 인식, 분류 및 영상 생성 등을 예로 들 수 있다. 특히 초고해상도 변환 문제에서 최근 딥러닝을 사용하면서 큰 성능 개선을 얻고 있다. Fast super-resolution convolutional neural network (FSRCNN)은 딥러닝 기반 초고해상도 알고리즘으로 잘 알려져 있으며, 여러 개의 convolutional layer로 추출한 저 해상도의 입력 특징을 활용하여 deconvolutional layer에서 초고해상도의 영상을 출력하는 알고리즘이다. 본 논문에서는 병렬 연산 효율성을 고려한 FPGA 기반 convolutional neural networks 가속기를 제안한다. 특히 deconvolutional layer를 convolutional layer로 변환하는 방법을 통해서 에너지 효율적인 가속기를 설계했다. 또한 제안한 방법은 FPGA 리소스를 고려하여 FSRCNN의 구조를 변형한 Optimal-FSRCNN을 제안한다. 사용하는 곱셈기의 개수를 FSRCNN 대비 2.4 배 압축하였고, 초고해상도 변환 성능을 평가하는 지표인 PSNR은 FSRCNN과 비슷한 성능을 내고 있다. 이를 통해서 FPGA 에 최적화된 네트워크를 구현하여 FHD 입력 영상을 UHD 영상으로 출력하는 실시간 영상처리 기술을 개발했다.
PDF

Search Result 25, Processing Time 0.024 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)