• Title/Summary/Keyword: pipelined architecture

Search Result 176, Processing Time 0.025 seconds

Optimized Hardware Design of Deblocking Filter for H.264/AVC (H.264/AVC를 위한 디블록킹 필터의 최적화된 하드웨어 설계)

  • Jung, Youn-Jin;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.1
    • /
    • pp.20-27
    • /
    • 2010
  • This paper describes a design of 5-stage pipelined de-blocking filter with power reduction scheme and proposes a efficient memory architecture and filter order for high performance H.264/AVC Decoder. Generally the de-blocking filter removes block boundary artifacts and enhances image quality. Nevertheless filter has a few disadvantage that it requires a number of memory access and iterated operations because of filter operation for 4 time to one edge. So this paper proposes a optimized filter ordering and efficient hardware architecture for the reduction of memory access and total filter cycles. In proposed filter parallel processing is available because of structured 5-stage pipeline consisted of memory read, threshold decider, pre-calculation, filter operation and write back. Also it can reduce power consumption because it uses a clock gating scheme which disable unnecessary clock switching. Besides total number of filtering cycle is decreased by new filter order. The proposed filter is designed with Verilog-HDL and functionally verified with the whole H.264/AVC decoder using the Modelsim 6.2g simulator. Input vectors are QCIF images generated by JM9.4 standard encoder software. As a result of experiment, it shows that the filter can make about 20% total filter cycles reduction and it requires small transposition buffer size.

Design of an Adaptive Reed-Solomon Decoder with Varying Block Length (가변 블록길이를 갖는 적응형 리드솔로몬 복호기의 설계)

  • Song, Moon-Kyou;Kong, Min-Han
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.4C
    • /
    • pp.365-373
    • /
    • 2003
  • In this paper, we design a versatle RS decoder which can decode RS codes of any block length n as well as any message length k, based on a modified Euclid's algorithm (MEA). This unique feature is favorable for a shortened RS code of any block length it eliminates the need to insert zeros before decoding a shortened RS code. Furthermore, the value of error correcting capability t can be changed in real time at every codeword block. Thus, when a return channel is available, the error correcting capability can be adaptiverly altered according to channel state. The decoder permits 4-step pipelined processing : (1) syndrome calculation (2) MEA block (3) error magnitude calculation (4) decoder failure check. Each step is designed to form a structure suitable for decoding a RS code with varying block length. A new architecture is proposed for a MEA block in step (2) and an architecture of outputting in reversed order is employed for a polynomial evaluation in step (3). To maintain to throughput rate with less circuitry, the MEA block uses not only a multiplexing and recursive technique but also an overclocking technique. The adaptive RS decoder over GF($2^8$) with the maximal error correcting capability of 10 has been designed in VHDL, and successfully synthesized in a FPGA.

An 8b 240 MS/s 1.36 ㎟ 104 mW 0.18 um CMOS ADC for High-Performance Display Applications (고성능 디스플레이 응용을 위한 8b 240 MS/s 1.36 ㎟ 104 mW 0.18 um CMOS ADC)

  • In Kyung-Hoon;Kim Se-Won;Cho Young-Jae;Moon Kyoung-Jun;Jee Yong;Lee Seung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.1
    • /
    • pp.47-55
    • /
    • 2005
  • This work describes an 8b 240 MS/s CMOS ADC as one of embedded core cells for high-performance displays requiring low power and small size at high speed. The proposed ADC uses externally connected pins only for analog inputs, digital outputs, and supplies. The ADC employs (1) a two-step pipelined architecture to optimize power and chip size at the target sampling frequency of 240 MHz, (2) advanced bootstrapping techniques to achieve high signal bandwidth in the input SHA, and (3) RC filter-based on-chip I/V references to improve noise performance with a power-off function added for portable applications. The prototype ADC is implemented in a 0.18 um CMOS and simultaneously integrated in a DVD system with dual-mode inputs. The measured DNL and INL are within 0.49 LSB and 0.69 LSB, respectively. The prototype ADC shows the SFDR of 53 dB for a 10 MHz input sinewave at 240 MS/s while maintaining the SNDR exceeding 38 dB and the SFDR exceeding 50 dB for input frequencies up to the Nyquist frequency at 240 MS/s. The ADC consumes, 104 mW at 240 MS/s and the active die area is 1.36 ㎟.

Performance Analysis of Implementation on Image Processing Algorithm for Multi-Access Memory System Including 16 Processing Elements (16개의 처리기를 가진 다중접근기억장치를 위한 영상처리 알고리즘의 구현에 대한 성능평가)

  • Lee, You-Jin;Kim, Jea-Hee;Park, Jong-Won
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.3
    • /
    • pp.8-14
    • /
    • 2012
  • Improving the speed of image processing is in great demand according to spread of high quality visual media or massive image applications such as 3D TV or movies, AR(Augmented reality). SIMD computer attached to a host computer can accelerate various image processing and massive data operations. MAMS is a multi-access memory system which is, along with multiple processing elements(PEs), adequate for establishing a high performance pipelined SIMD machine. MAMS supports simultaneous access to pq data elements within a horizontal, a vertical, or a block subarray with a constant interval in an arbitrary position in an $M{\times}N$ array of data elements, where the number of memory modules(MMs), m, is a prime number greater than pq. MAMS-PP4 is the first realization of the MAMS architecture, which consists of four PEs in a single chip and five MMs. This paper presents implementation of image processing algorithms and performance analysis for MAMS-PP16 which consists of 16 PEs with 17 MMs in an extension or the prior work, MAMS-PP4. The newly designed MAMS-PP16 has a 64 bit instruction format and application specific instruction set. The author develops a simulator of the MAMS-PP16 system, which implemented algorithms can be executed on. Performance analysis has done with this simulator executing implemented algorithms of processing images. The result of performance analysis verifies consistent response of MAMS-PP16 through the pyramid operation in image processing algorithms comparing with a Pentium-based serial processor. Executing the pyramid operation in MAMS-PP16 results in consistent response of processing time while randomly response time in a serial processor.

A3V 10b 33 MHz Low Power CMOS A/D Converter for HDTV Applications (HDTV 응용을 위한 3V 10b 33MHz 저전력 CMOS A/D 변환기)

  • Lee, Kang-Jin;Lee, Seung-Hoon
    • Journal of IKEEE
    • /
    • v.2 no.2 s.3
    • /
    • pp.278-284
    • /
    • 1998
  • This paper describes a l0b CMOS A/D converter (ADC) for HDTV applications. The proposed ADC adopts a typical multi-step pipelined architecture. The proposed circuit design techniques are as fo1lows: A selective channel-length adjustment technique for a bias circuit minimizes the mismatch of the bias current due to the short channel effect by supply voltage variations. A power reduction technique for a high-speed two-stage operational amplifier decreases the power consumption of amplifiers with wide bandwidths by turning on and off bias currents in the suggested sequence. A typical capacitor scaling technique optimizes the chip area and power dissipation of the ADC. The proposed ADC is designed and fabricated in s 0.8 um double-poly double-metal n-well CMOS technology. The measured differential and integral nonlinearities of the prototype ADC show less than ${\pm}0.6LSB\;and\;{\pm}2.0LSB$, respectively. The typical ADC power consumption is 119 mW at 3 V with a 40 MHz sampling rate, and 320 mW at 5 V with a 50 MHz sampling rate.

  • PDF

A New Hardware Design for Generating Digital Holographic Video based on Natural Scene (실사기반 디지털 홀로그래픽 비디오의 실시간 생성을 위한 하드웨어의 설계)

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.11
    • /
    • pp.86-94
    • /
    • 2012
  • In this paper we propose a hardware architecture of high-speed CGH (computer generated hologram) generation processor, which particularly reduces the number of memory access times to avoid the bottle-neck in the memory access operation. For this, we use three main schemes. The first is pixel-by-pixel calculation rather than light source-by-source calculation. The second is parallel calculation scheme extracted by modifying the previous recursive calculation scheme. The last one is a fully pipelined calculation scheme and exactly structured timing scheduling by adjusting the hardware. The proposed hardware is structured to calculate a row of a CGH in parallel and each hologram pixel in a row is calculated independently. It consists of input interface, initial parameter calculator, hologram pixel calculators, line buffer, and memory controller. The implemented hardware to calculate a row of a $1,920{\times}1,080$ CGH in parallel uses 168,960 LUTs, 153,944 registers, and 19,212 DSP blocks in an Altera FPGA environment. It can stably operate at 198MHz. Because of the three schemes, the time to access the external memory is reduced to about 1/20,000 of the previous ones at the same calculation speed.