• Title/Summary/Keyword: pipelined memory

Search Result 79, Processing Time 0.024 seconds

Design and Verification of Pipelined Face Detection Hardware (파이프라인 구조의 얼굴 검출 하드웨어 설계 및 검증)

  • Kim, Shin-Ho;Jeong, Yong-Jin
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.10
    • /
    • pp.1247-1256
    • /
    • 2012
  • There are many filter based image processing algorithms and they usually require a huge amount of computations and memory accesses making it hard to attain a real-time performance, expecially in embedded applications. In this paper, we propose a pipelined hardware structure of the filter based face detection algorithm to show that the real time performance can be achieved by hardware design. In our design, the whole computation is divided into three pipeline stages: resizing the image (Resize), Transforming the image (ICT), and finding candidate area (Find Candidate). Each stage is optimized by considering the parallelism of the computation to reduce the number of cycles and utilizing the line memory to minimize the memory accesses. The resulting hardware uses 507 KB internal SRAM and occupies 9,039 LUTs when synthesized and configured on Xilinx Virtex5LX330 FPGA. It can operate at maximum 165MHz clock, giving the performance of 108 frame/sec, while detecting up to 20 faces.

A General Purpose DSP Architecture Using Instruction FIFO Memory (Instruction FIFO Memory를 이용한 범용 DSP 구조)

  • 박주현;김영민
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.3
    • /
    • pp.31-37
    • /
    • 1995
  • In this paper, we propose a programmable 16 bit DSP architecture using FIFO instruction memory. With this DSP architecture, System structure, BUS structure, instruction set ant and an assembler for system test are developed. The characteristic of this structure is that it simply fetches instructions not from RAM but from FIFO using shift operations. Accordingly, System can be designed regardless of RAM access time. One cycle is enough to execute an instruction, if instruction pipeline is operated. Another merit of this structure is that we can obtain the same effect as instruction pipelining without constructing a complex pipelined controller by decreasing the pipeline number.

  • PDF

Error correction codes to manage multiple bit upset in on-chip memories (온칩 메모리 내 다중 비트 이상에 대처하기 위한 오류 정정 부호)

  • Jun, Hoyoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.11
    • /
    • pp.1747-1750
    • /
    • 2022
  • As shrinking the semiconductor process into the deep sub-micron to achieve high-density, low power and high performance integrated circuits, MBU (multiple bit upset) by soft errors is one of the major challenge of on-chip memory systems. To address the MBU, single error correction, double error detection and double adjacent error correction (SEC-DED-DAEC) codes have been recently proposed. But these codes do not resolve mis-correction. We propose the SEC-DED-DAEC-TAED(triple adjacent error detection) code without mis-corrections. The generated H-matrix by the proposed heuristic algorithm to accomplish the proposed code is implemented as hardware and verified. The results show that there is no mis-correction in the proposed codes and the 2-stage pipelined decoder can be employed on-chip memory system.

A Simulator for a Five-stage Pipeline DSP core (5단계 파이프라인 DSP 코어를 위한 시뮬레이터의 설계)

  • 김문경;정우경
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1161-1164
    • /
    • 1998
  • We designed a DSP core simulator with C language, that is able to simulate 5-stage pipelined DSP core, named YS-DSP. It can emulate all 5 stage pipelines in the DSP core. It can also emulate memory access, exception processing, and DSP parallel processing. Each pipeline stage is implemented by combination of one or more functions to process parts of each stage. After modeling and validating the simulator, we can use it to verify and to complement the DSP core HDL model and to enhance its performance.

  • PDF

Implementation of DCT using Bit Slice Signal Processor (BIT SLICE SIGNAL PROCESSOR를 이용한 DCT의 구현)

  • Kim, Dong-L.;Go, Seok-B.;Paek, Seung-K.;Lee, Tae-S.;Min, Byong-G.
    • Proceedings of the KIEE Conference
    • /
    • 1987.07b
    • /
    • pp.1449-1453
    • /
    • 1987
  • A microprogrammable Bit Slice Sinal Processor for image processing is implemented. Processing speed is increased by the parallelism in horizontal microprogram using 120bits microcode, pipelined architecture, 2 bank memory switching that interfaces with the Host through DMA, a variable clock control, overflow checking H/W,look-up table method and cache memory. With this processor, a DCT algorithm which uses 2-D FFT is performed. The execution time for $512{\times}512{\times}8$ image is 12 sec when 16 bit operation is runned, and the recovered image has acceptable quality with MSE 0.276%.

  • PDF

Performance Optimization of Parallel Algorithms

  • Hudik, Martin;Hodon, Michal
    • Journal of Communications and Networks
    • /
    • v.16 no.4
    • /
    • pp.436-446
    • /
    • 2014
  • The high intensity of research and modeling in fields of mathematics, physics, biology and chemistry requires new computing resources. For the big computational complexity of such tasks computing time is large and costly. The most efficient way to increase efficiency is to adopt parallel principles. Purpose of this paper is to present the issue of parallel computing with emphasis on the analysis of parallel systems, the impact of communication delays on their efficiency and on overall execution time. Paper focuses is on finite algorithms for solving systems of linear equations, namely the matrix manipulation (Gauss elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, openMP), distributed-memory (message passing interface, MPI) and for their combination (MPI + openMP). The properties of the algorithms were analytically determined and they were experimentally verified. The conclusions are drawn for theory and practice.

A 8192-point pipelined FFT/IFFT processor using two-step convergent block floating-point scaling technique (2단계 수렴 블록 부동점 스케일링 기법을 이용한 8192점 파이프라인 FFT/IFFT 프로세서)

  • 이승기;양대성;신경욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.10C
    • /
    • pp.963-972
    • /
    • 2002
  • An 8192-point pipelined FFT/IFFT processor core is designed, which can be used in multi-carrier modulation systems such as DUf-based VDSL modem and OFDM-based DVB system. In order to improve the signal-to-quantization-noise ratio (SQNR) of FFT/IFFT results, two-step convergent block floating-point (TS_CBFP) scaling is employed. Since the proposed TS_CBFP scaling does not require additional buffer memory, it reduces memory as much as about 80% when compared with conventional CBFP methods, resulting in area-and power-efficient implementation. The SQNR of about 60-㏈ is achieved with 10-bit input, 14-bit internal data and twiddle factors, and 16-bit output. The core synthesized using 0.25-$\mu\textrm{m}$ CMOS library has about 76,300 gates, 390K bits RAM, and twiddle factor ROM of 39K bits. Simulation results show that it can safely operate up to 50-㎒ clock frequency at 2.5-V supply, resulting that a 8192-point FFT/IFFT can be computed every 164-${\mu}\textrm{s}$. It was verified by Xilinx FPGA implementation.

A 12b 10MS/s CMOS Pipelined ADC Using a Reference Scaling Technique (기준 전압 스케일링을 이용한 12비트 10MS/s CMOS 파이프라인 ADC)

  • Ahn, Gil-Cho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.11
    • /
    • pp.16-23
    • /
    • 2009
  • A 12b 10MS/s pipelined ADC with low DC gain amplifiers is presented. The pipelined ADC using a reference scaling technique is proposed to compensate the gain error in MDACs due to a low DC gain amplifier. To minimize the performance degradation of the ADC due to amplifier offset, the proposed offset trimming circuit is employed m the first-stage MDAC amplifier. Additional reset switches are used in all MDACs to reduce the memory effect caused by the low DC gain amplifier. The measured differential and integral non-linearities of the prototype ADC with 45dB DC gain amplifiers are less than 0.7LSB and 3.1LSB, respectively. The prototype ADC is fabricated in a $0.35{\mu}m$ CMOS process and achieves 62dB SNDR and 72dB SFDR with 2.4V supply and 10MHz sampling frequency while consuming 19mW power.

FPGA Implementation and Verification of A Pipelined 32-bit ARM Processor (파이프라인 방식의 32 비트 ARM 프로세서에 대한 FPGA 구현 및 검증)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.5
    • /
    • pp.105-110
    • /
    • 2022
  • Domestically, we are capable of designing high-end memory semiconductors, but not in processors, resulting in unbalance. Using Vivado as a development enivronment and implementing the processor on a Xilinx FPGA reduces time and cost dramatically. In this paper, the popular language VHDL which is widely used in Europe, universities, and research centers around the world for the digital system design is used for designing a pipelined 32-bit ARM processor, implemented on FPGA and verified by Integrated Logic Analyzer. As a result, the ARM processor implemented on FPGA could execute ARM instructions successfully.

An Implementation of Real-time Image Warping Using FPGA (FPGA를 이용한 실시간 영상 워핑 구현)

  • Ryoo, Jung Rae;Lee, Eun Sang;Doh, Tae-Yong
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.9 no.6
    • /
    • pp.335-344
    • /
    • 2014
  • As a kind of 2D spatial coordinate transform, image warping is a basic image processing technique utilized in various applications. Though image warping algorithm is composed of relatively simple operations such as memory accesses and computations of weighted average, real-time implementations on embedded vision systems suffer from limited computational power because the simple operations are iterated as many times as the number of pixels. This paper presents a real-time implementation of a look-up table(LUT)-based image warping using an FPGA. In order to ensure sufficient data transfer rate from memories storing mapping LUT and image data, appropriate memory devices are selected by analyzing memory access patterns in an LUT-based image warping using backward mapping. In addition, hardware structure of a parallel and pipelined architecture is proposed for fast computation of bilinear interpolation using fixed-point operations. Accuracy of the implemented hardware is verified using a synthesized test image, and an application to real-time lens distortion correction is exemplified.