• Title/Summary/Keyword: 파이프라인 구조

Search Result 474, Processing Time 0.028 seconds

Design of Stereo Image Match Processor for Real Time Stereo Matching (실시간 스테레오 정합을 위한 스테레오 영상 정합 프로세서 설계)

  • Kim, Yeon-Jae;Sim, Deok-Seon
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.37 no.2
    • /
    • pp.50-59
    • /
    • 2000
  • Stereo vision is a technique extracting depth information from stereo images, which are two images that view an object or a scene from different locations. The most important procedure in stereo vision, which is called stereo matching, is to find the same points in stereo images. It is difficult to match stereo images in real time because stereo matching requires heavy calculation. In this Paper we design a digital VLSI to Process stereo matching in real time, which we call stereo image match processor (SIMP). For implementation of real time stereo matching, sliding memory and minimum selection tree are presented. SIMP is designed with pipeline architecture and parallel processing. SIMP takes 64 gray level 64$\times$64 stereo images and yields 8 level 64 $\times$64 disparity map by 3 bit disparity and 12 bit address outputs. SIMP can process stereo images with process speed of 240 frames/sec.

  • PDF

Efficient Hardware Support: The Lock Mechanism without Retry (하드웨어 지원의 재시도 없는 잠금기법)

  • Kim Mee-Kyung;Hong Chul-Eui
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.9
    • /
    • pp.1582-1589
    • /
    • 2006
  • A lock mechanism is essential for synchronization on the multiprocessor systems. The conventional queuing lock has two bus traffics that are the initial and retry of the lock-read. %is paper proposes the new locking protocol, called WPV (Waiting Processor Variable) lock mechanism, which has only one lock-read bus traffic command. The WPV mechanism accesses the shared data in the initial lock-read phase that is held in the pipelined protocol until the shared data is transferred. The nv mechanism also uses the cache state lock mechanism to reduce the locking overhead and guarantees the FIFO lock operations in the multiple lock contentions. In this paper, we also derive the analytical model of WPV lock mechanism as well as conventional memory and cache queuing lock mechanisms. The simulation results on the WPV lock mechanism show that about 50% of access time is reduced comparing with the conventional queuing lock mechanism.

Implementation of H.264/AVC Deblocking Filter on 1-D CGRA (1-D CGRA에서의 H.264/AVC 디블록킹 필터 구현)

  • Song, Sehyun;Kim, Kichul
    • Journal of IKEEE
    • /
    • v.17 no.4
    • /
    • pp.418-427
    • /
    • 2013
  • In this paper, we propose a parallel deblocking filter algorithm for H.264/AVC video standard. The deblocking filter has different filter processes according to boundary strength (BS) and each filter process requires various conditional calculations. The order of filtering makes it difficult to parallelize deblocking filter calculations. The proposed deblocking filter algorithm is performed on PRAGRAM which is a 1-D coarse grained reconfigurable architecture (CGRA). Each filter calculation is accelerated using uni-directional pipelined architecture of PRAGRAM. The filter selection and the conditional calculations are efficiently performed using dynamic reconfiguration and conditional reconfiguration. The parallel deblocking filter algorithm uses 225 cycles to process a macroblock and it can process a full HD image at 150 MHz.

Study on Chip Design & Implementation of 32 Bit Floating Point Compatible DSP (32비트 부동소수점 호환 DSP의 설계 및 칩 구현에 관한 연구)

  • Woo, Jong-Sik;Seo, Jin-Keun;Lim, Jae-Young;Park, Ju-Sung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.11
    • /
    • pp.74-84
    • /
    • 2000
  • This paper deals with procedures for design and implementation of a DSP, which is compatible with TMS320C30 DSP. CBS(Cycle Based Simulator) is developed to study the architecture of the target DSP. The simulator gives us detailed information such as function block operation, control signal values, register condition, bus and memory values when a instruction is being carried out. RTL design is carried out by VHDL. Logic simulation and hardware emulation are employed to verify proper operation of the design. The DSP is fabricated with 0.6${\mu}m$ CMOS technology. The Chip has 450,000 gates complexity, $9{\times}9mm^2$ area, 20 MIPS operation speed. It is confirmed by running 109 instructions out of 114 instructions and 13 kinds of algorithm that the developed DSP has compatibility with TMS320C30.

  • PDF

Design of an Efficient Binary Arithmetic Encoder for H.264/AVC (H.264/AVC를 위한 효율적인 이진 산술 부호화기 설계)

  • Moon, Jeon-Hak;Kim, Yoon-Sup;Lee, Seong-Soo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.12
    • /
    • pp.66-72
    • /
    • 2009
  • This paper proposes an efficient binary arithmetic encoder for CABAC which is used one of the entropy coding methods for H.264/AVC. The present binary arithmetic encoding algorithm requires huge complexity of operation and data dependency of each step, which is difficult to be operated in fast. Therefore, renormalization exploits 2-stage pipeline architecture for efficient process of operation, which reduces huge complexity of operation and data dependency. Context model updater is implemented by using a simple expression instead of transIdxMPS table and merging transIdxLPS and rangeTabLPS tables, which decreases hardware size. Arithmetic calculator consists of regular mode, bypass mode and termination mode for appearance probability of binary value. It can operate in maximum speed. The proposed binary arithmetic encoder has 7282 gate counts in 0.18um standard cell library. And input symbol per cycle is about 1.

A Study on Cycle Based Simulator of a 32 bit floating point DSP (32비트 부동소수점 DSP의 Cycle Based Simulator에 관한 연구)

  • 우종식;양해용;안철홍;박주성
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.11
    • /
    • pp.31-38
    • /
    • 1998
  • This paper deals with CBS(Cycle Base Simulator) design of a 32 bit floating point DSP(Digital Signal Processor). The CBS has been developed for TMS320C30 compatible DSP and will be used to confirm the architecture, functions of sub-blocks, and control signals of the chip before the detailed logic design starts with VHDL. The outputs from CBS are used as important references at gate level design step because they give us control signals, output values of important blocks, values from internal buses and registers at each pipeline step, which are not available from the commercial simulator of DSP. In addition to core functions, it has various interfaces for efficient execution and convenient result display, CBS is verified through comparison with results from the commercial simulator for many application algorithms and its simulation speed is as fast as several tenth of that of logic simulation with VHDL. CBS in this work is for a specific DSP, but the concept may be applicable to other VLSI design.

  • PDF

An Implementation of the $5\times5$ CNN Hardware and the Pre.Post Processor ($5\times5$ CNN 하드웨어 및 전.후 처리기 구현)

  • Kim Seung-Soo;Jeon Heung-Woo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.5
    • /
    • pp.865-870
    • /
    • 2006
  • The cellular neural networks have shown a vast computing power for the image processing in spite of the simplicity of its structure. However, it is impossible to implement the CNN hardware which would require the same enormous amount of cells as that of the pixels involved in the practical large image. In this parer, the $5\times5$ CNN hardware and the pre post processor which can be used for processing the real large image with a time-multiplexing scheme are implemented. The implemented $5\times5$ CNN hardware and pre post processor is applied to the edge detection of $256\times256$ lena image to evaluate the performance. The total number of block. By the time-multiplexing process is about 4,000 blocks and to control pulses are needed to perform the pipelined operation or the each block. By the experimental resorts, the implemented $5\times5$ CNN hardware and pre post processor can be used to the real large image processing.

Design of Hash Processor for SHA-1, HAS-160, and Pseudo-Random Number Generator (SHA-1과 HAS-160과 의사 난수 발생기를 구현한 해쉬 프로세서 설계)

  • Jeon, Shin-Woo;Kim, Nam-Young;Jeong, Yong-Jin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.1C
    • /
    • pp.112-121
    • /
    • 2002
  • In this paper, we present a design of a hash processor for data security systems. Two standard hash algorithms, Sha-1(American) and HAS-1600(Korean), are implemented on a single hash engine to support real time processing of the algorithms. The hash processor can also be used as a PRNG(Pseudo-random number generator) by utilizing SHA-1 hash iterations, which is being used in the Intel software library. Because both SHA-1 and HAS-160 have the same step operation, we could reduce hardware complexity by sharing the computation unit. Due to precomputation of message variables and two-stage pipelined structure, the critical path of the processor was shortened and overall performance was increased. We estimate performance of the hash processor about 624 Mbps for SHA-1 and HAS-160, and 195 Mbps for pseudo-random number generation, both at 100 MHz clock, based on Samsung 0.5um CMOS standard cell library. To our knowledge, this gives the best performance for processing the hash algorithms.

Design and Implementation of Human-Detecting Radar System for Indoor Security Applications (실내 보안 응용을 위한 사람 감지 레이다 시스템의 설계 및 구현)

  • Jang, Daeho;Kim, Hyeon;Jung, Yunho
    • Journal of IKEEE
    • /
    • v.24 no.3
    • /
    • pp.783-790
    • /
    • 2020
  • In this paper, the human detecting radar system for indoor security applications is proposed, and its FPGA-based implementation results are presented. In order to minimize the complexity and memory requirements of the computation, the top half of the spectrogram was used to extract features, excluding the feature extraction techniques that require complex computation, feature extraction techniques were proposed considering classification performance and complexity. In addition, memory requirements were minimized by designing a pipeline structure without storing the entire spectrogram. Experiments on human, dog and robot cleaners were conducted for classification, and 96.2% accuracy performance was confirmed. The proposed system was implemented using Verilog-HDL, and we confirmed that a low-area design using 1140 logics and 6.5 Kb of memory was possible.

An Efficient Test Method for a Full-Custom Design of a High-Speed Binary Multiplier (풀커스텀 (full-custom) 고속 곱셈기 회로의 효율적인 테스트 방안)

  • Moon, San-Gook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2007.10a
    • /
    • pp.830-833
    • /
    • 2007
  • In this paper, we implemented a $17{\times}17b$ binary digital multiplier using radix-4 Booth;s algorithmand proposed an efficient testing methodology for the full-custom design. A two-stage pipeline architecture was applied to achieve higher throughput and 4:2 adders were used for regular layout structure in the Wallace tree partition. Several chips were fabricated using LG Semicon 0.6-um 3-Metal N-well CMOS technology. We did fault simulations efficiently using the proposed test method resulting in the reduction of the number of faulty nodes by 88%. The chip contains 9115 transistors and the core area occupies $1135^*1545$ mm2. The functional tests using ATS-2 tester showed that it can operate with 24 MHz clock at 5.0 V at room temperature.

  • PDF