  • Title/Summary/Keyword: Software Pipelining

FPGA-based Implementation of Fast Histogram Equalization for Image Enhancement (영상 품질 개선을 위한 FPGA 기반 고속 히스토그램 평활화 회로 구현)

  • Ryu, Sang-Moon
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.11 / pp.1377-1383 / 2019
  • Histogram equalization is the most frequently used algorithm for image enhancement, and a hardware implementation of it runs significantly faster than its software counterpart. The overall performance of an FPGA-based implementation can be improved by pipelining the design and by exploiting the multipliers and the many SRAM blocks embedded in recent FPGAs. This work proposes a fast histogram equalization circuit for 8-bit gray-level images. The proposed design contains a FIFO so that one image can be equalized while the histogram of the next image is being calculated. Because of this overlap in time, the embedded multipliers, and the pipelined design, the circuit equalizes nearly one pixel per clock cycle. A dual-parallel version of the design is almost twice as fast as the original.
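
For orientation, the underlying algorithm can be sketched in software as below. This is a minimal sketch of textbook histogram equalization, not the circuit described in the abstract: the function name `equalize_histogram`, the flat pixel list, and the integer rounding are all assumptions made for illustration, and the FIFO overlap and pipelining of the FPGA design are not modeled.

```python
def equalize_histogram(pixels, levels=256):
    """Equalize a flat list of integer gray levels in [0, levels).

    Hypothetical helper for illustration; the paper's hardware design
    is not reproduced here.
    """
    if not pixels:
        return []

    # Pass 1: count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of the histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    denom = total - cdf_min
    if denom == 0:
        # Only one gray level is present; nothing to spread out.
        return list(pixels)

    # Lookup table mapping each input level to its equalized level,
    # using rounded integer division (the kind of multiply/divide step
    # a hardware design could map onto embedded multipliers).
    scale = levels - 1
    lut = [max(0, ((c - cdf_min) * scale + denom // 2) // denom) for c in cdf]

    # Pass 2: remap every pixel through the lookup table.
    return [lut[p] for p in pixels]


print(equalize_histogram([52, 52, 52, 200]))  # -> [0, 0, 0, 255]
```

The two passes (build the histogram, then remap) are what makes the overlap described in the abstract attractive: the second pass for one image does not depend on the first pass for the next.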

LASPI: Hardware friendly LArge-scale stereo matching using Support Point Interpolation (LASPI: 지원점 보간법을 이용한 H/W 구현에 용이한 스테레오 매칭 방법)

  • Park, Sanghyun;Ghimire, Deepak;Kim, Jung-guk;Han, Youngki
    • Journal of KIISE / v.44 no.9 / pp.932-945 / 2017
  • This paper presents a new hardware and software architecture for a stereo vision processing system that covers rectification, disparity estimation, and visualization. The proposed method, named LArge-scale stereo matching using Support Point Interpolation (LASPI), excels at real-time computation of dense disparity maps from high-quality image regions that contain a high density of support points. When processing high-definition (HD) images in real time, LASPI does not degrade the quality of the disparity maps compared with existing stereo-matching methods such as Efficient LArge-scale Stereo matching (ELAS). LASPI is designed to achieve a high frame rate, accurate distance resolution, and low resource usage even in a resource-limited environment. These characteristics allow LASPI to be deployed in safety-critical applications such as obstacle recognition and distance detection systems for autonomous vehicles. The LASPI algorithm was implemented on a Field Programmable Gate Array (FPGA) to support parallel processing and 4-stage pipelining. Experiments verified that the resulting FPGA system (Xilinx Virtex-7 FPGA, 148.5 MHz clock) processes 30 HD (1280×720 pixels) frames per second in real time while generating disparity maps that are applicable to real vehicles.
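
As a rough sanity check on the figures quoted above, the clock rate and frame rate imply an average cycle budget per pixel. The calculation below is a back-of-the-envelope sketch only: it ignores blanking intervals and any memory stalls, and it says nothing about how the actual design spends those cycles.

```python
# Figures taken from the abstract.
CLOCK_HZ = 148.5e6            # 148.5 MHz system clock
WIDTH, HEIGHT, FPS = 1280, 720, 30

pixels_per_second = WIDTH * HEIGHT * FPS
cycles_per_pixel = CLOCK_HZ / pixels_per_second

print(pixels_per_second)            # -> 27648000
print(round(cycles_per_pixel, 2))   # -> 5.37
```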

Optimal Scheduling of SAD Algorithm on VLIW-Based High Performance DSP (VLIW 기반 고성능 DSP에서의 SAD 알고리즘 최적화 스케줄링)

  • Yu, Hui-Jae;Jung, Sou-Hwan;Chung, Sun-Tae
    • The Journal of the Korea Contents Association / v.7 no.12 / pp.262-272 / 2007
  • The SAD (Sum of Absolute Differences) algorithm is the most frequently executed routine in motion estimation, which is the most demanding process in motion picture encoding. To improve encoding performance on a VLIW processor, the SAD algorithm must therefore be implemented optimally on that processor. In this paper, we propose an optimal scheduling of the SAD algorithm with a conditional branch on a VLIW-based high-performance DSP. We first transform the nested loop of the SAD algorithm, which contains a conditional branch, into a single loop whose body is large enough to fully utilize the ILP capability of the VLIW DSP and whose conditional branch lets execution escape from the loop as early as possible. We then apply a modulo scheduling technique to the transformed loop. We test the proposed implementation on a TMS320C6713 and analyze its code size and processing time. The experiments show that the proposed SAD implementation has a small code size appropriate for embedded applications, and that an H.263 encoder using it performs better than H.263 encoders using other SAD implementations.
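
The loop transformation described above can be illustrated with a small software sketch. Everything here is an assumption made for illustration: the function name `sad_single_loop`, the `limit` parameter, and the reading of the conditional branch as an early exit once the partial sum reaches a known bound. The abstract does not give the exact exit condition, and modulo scheduling itself is a compiler-level technique that this Python sketch does not model.

```python
def sad_single_loop(cur, ref, width, height, limit):
    """Sum of absolute differences over two blocks, using one flat loop.

    cur, ref: 2-D lists (height rows of width values each).
    limit:    hypothetical bound, e.g. the best SAD found so far; the
              loop exits as soon as the partial sum reaches it.
    """
    total = 0
    # One loop over all width * height positions replaces the usual
    # nested row/column loops.
    for i in range(width * height):
        row, col = divmod(i, width)
        total += abs(cur[row][col] - ref[row][col])
        # Conditional branch: this candidate can no longer beat the
        # bound, so stop early.
        if total >= limit:
            break
    return total


cur = [[1, 2], [3, 4]]
ref = [[1, 1], [1, 1]]
print(sad_single_loop(cur, ref, 2, 2, limit=100))  # -> 6 (full sum)
print(sad_single_loop(cur, ref, 2, 2, limit=3))    # -> 3 (early exit)
```

In a motion-estimation search, the caller would typically pass the smallest SAD seen so far as `limit`, so that poor candidates are abandoned after only a few pixels.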