• Title/Summary/Keyword: pipelined architecture

Search Result 176, Processing Time 0.023 seconds

Hardware Design of Bilateral Filter Based on Window Division (윈도우 분할 기반 양방향 필터의 하드웨어 설계)

  • Hyun, Yongho;Park, Taegeun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.41 no.12
    • /
    • pp.1844-1850
    • /
    • 2016
  • The bilateral filter can reduce the noise while preserving details computing the filtering output at each pixels as the average of neighboring pixels. In this paper, we propose a real-time system based on window division. Overall performance is increased due to the parallel architectures which computes five rows in the kernel window simultaneously but with pipelined scheduling. We consider the tradeoff between the filter performance and the hardware cost and the bit allocation has been determined by PSNR analysis. The proposed architecture is designed with verilogHDL and synthesized using Dongbu Hitek 110nm standard cell library. The proposed architecture shows 416Mpixels/s (397fps) of throughput at 416MHz of operating frequency with 132K gates.

VLSI Design of H.263 Video Codec Based on Modular Architecture (모듈화된 구조에 기반한 H.263 비디오 코덱 VLSI의 설계)

  • Kim, Myung-Jin;Lee, Sang-Hee;Kim, Keun-Bae
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.39 no.5
    • /
    • pp.477-485
    • /
    • 2002
  • In this paper, we present an efficient hardware architecture for the H.263 video codec and its VLSI implementation. This architecture is based on the unified interface by which internal hardware engines and an internal RISC processor are connected one another. The unified interface enables the modular design of internal blocks, efficient hardware/software partitioning, and pipelined paralled operations. The developed VLSI supports the H.263 version 2 profile 3 @ level 10, and moreover, both the control protocol H.245 and the multiplexing protocol H.223. Therefore, it can be used for the complete ITU-T H.324 or 3GPP 3G 324M multimedia processor with the help of an external audio codec. Simultaneous encoding and decoding of QCIF format images at a rate greater than 15 frames per second is achieved at 40 MHz clock frequency.

40Gb/s Foward Error Correction Architecture for Optical Communication System (광통신 시스템을 위한 40Gb/s Forward Error Correction 구조 설계)

  • Lee, Seung-Beom;Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.2
    • /
    • pp.101-111
    • /
    • 2008
  • This paper introduces a high-speed Reed-Solomon(RS) decoder, which reduces the hardware complexity, and presents an RS decoder based FEC architecture which is used for 40Gb/s optical communication systems. We introduce new pipelined degree computationless modified Euclidean(pDCME) algorithm architecture, which has high throughput and low hardware complexity. The proposed 16 channel RS FEC architecture has two 8 channel RS FEC architectures, which has 8 syndrome computation block and shared single KES block. It can reduce the hardware complexity about 30% compared to the conventional 16 channel 3-parallel FEC architecture, which is 4 syndrome computation block and shared single KES block. The proposed RS FEC architecture has been designed and implemented with the $0.18-{\mu}m$ CMOS technology in a supply voltage of 1.8 V. The result show that total number of gate is 250K and it has a data processing rate of 5.1Gb/s at a clock frequency of 400MHz. The proposed area-efficient architecture can be readily applied to the next generation FEC devices for high-speed optical communications as well as wireless communications.

Hardware Design of High Performance HEVC Deblocking Filter for UHD Videos (UHD 영상을 위한 고성능 HEVC 디블록킹 필터 설계)

  • Park, Jaeha;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.1
    • /
    • pp.178-184
    • /
    • 2015
  • This paper proposes a hardware architecture for high performance Deblocking filter(DBF) in High Efficiency Video Coding for UHD(Ultra High Definition) videos. This proposed hardware architecture which has less processing time has a 4-stage pipelined architecture with two filters and parallel boundary strength module. Also, the proposed filter can be used in low-voltage design by using clock gating architecture in 4-stage pipeline. The segmented memory architecture solves the hazard issue that arises when single port SRAM is accessed. The proposed order of filtering shortens the delay time that arises when storing data into the single port SRAM at the pre-processing stage. The DBF hardware proposed in this paper was designed with Verilog HDL, and was implemented with 22k logic gates as a result of synthesis using TSMC 0.18um CMOS standard cell library. Furthermore, the dynamic frequency can process UHD 8k($7680{\times}4320$) samples@60fps using a frequency of 150MHz with an 8K resolution and maximum dynamic frequency is 285MHz. Result from analysis shows that the proposed DBF hardware architecture operation cycle for one process coding unit has improved by 32% over the previous one.

A Design of Low Power 16-bit ALU by Switched Capacitance Reduction (Switched Capacitance 감소를 통한 저전력 16비트 ALU 설계)

  • Ryu, Beom-Seon;Lee, Jung-Sok;Lee, Kie-Young;Cho, Tae-Won
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.1
    • /
    • pp.75-82
    • /
    • 2000
  • In this paper, a new low power 16-bit ALU has been designed, fabricated and tested at the transistor level. The designed ALU performs 16 instructions and has a two-stage pipelined architecture. For the reduction of switched capacitance, the ELM adder of the proposed ALU is inactive while the logical operation is performed and P(propagation) block has a dual bus architecture. A new efficient P and G(generation) blocks are also proposed for the above ALU architecture. ELM adder, double-edge triggered register and the combination of logic style are used for low power consumption as well. As a result of simulations, the proposed architecture shows better power efficient than conventional architecture$^{[1,2]}$ as the number of logic operation to be performed is increased over that of arithmetic to logic operation to be performed is 7 to 3, compared to conventional architecture. The proposed ALU was fabricated with 0.6${\mu}m$ single-poly triple-metal CMOS process. As a result of chip test, the maximum operating frequency is 53MHz and power consumption is 33mW at 50MHz, 3.3V.

  • PDF

An Implementation of Real-time Image Warping Using FPGA (FPGA를 이용한 실시간 영상 워핑 구현)

  • Ryoo, Jung Rae;Lee, Eun Sang;Doh, Tae-Yong
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.9 no.6
    • /
    • pp.335-344
    • /
    • 2014
  • As a kind of 2D spatial coordinate transform, image warping is a basic image processing technique utilized in various applications. Though image warping algorithm is composed of relatively simple operations such as memory accesses and computations of weighted average, real-time implementations on embedded vision systems suffer from limited computational power because the simple operations are iterated as many times as the number of pixels. This paper presents a real-time implementation of a look-up table(LUT)-based image warping using an FPGA. In order to ensure sufficient data transfer rate from memories storing mapping LUT and image data, appropriate memory devices are selected by analyzing memory access patterns in an LUT-based image warping using backward mapping. In addition, hardware structure of a parallel and pipelined architecture is proposed for fast computation of bilinear interpolation using fixed-point operations. Accuracy of the implemented hardware is verified using a synthesized test image, and an application to real-time lens distortion correction is exemplified.

Subband Affine Projection Adaptive Filter using Variable Step Size and Pipeline Transform (가변 적응상수와 파이프라인 변환을 이용한 부밴드 인접투사 적응필터)

  • Choi, Hun;Ha, Hong-Gon;Bae, Hyeon-Deok
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.46 no.1
    • /
    • pp.104-110
    • /
    • 2009
  • In this paper, we suggest a new technique which employ the pipelined architecture for the implementation of the SAP adaptive filter using variable step size. According as SAP adaptive filter is sufficiently decomposed, a simplified SAP adaptive filter can be derived, and the weights of adaptive sub-filters can be updated by a simple formular without a matrix inversion. The convergence speed and the steady state error of the simplified SAP adaptive filter are improved by using variable step size. For practical implementation, the simplified SAP adaptive sub-filters are transformed by the pipeline technique.

A 10-bit 20-MHz CMOS A/D converter (10-bit 20-MHz CMOS A/D 변환기)

  • 최희철;안길초;이승훈;강근순;이성호;최명준
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.33A no.4
    • /
    • pp.152-161
    • /
    • 1996
  • In tis work, a three-stage pipelined A/D converter (ADC) was implemented to obtain 10-bit resolution at a conversion rate of 20 msamples/s for video applications. The ADC consists of three identical stages employing a mid-rise coding technique. The interstage errors such as offsets and clock feedthrough are digitally corrected in digitral logic by one overlapped bit between stages. The proposed ADC is optimized by adopting a unit-capacitor array architecture in the MDAC to improve the differential nonlinearity and the yield. Reduced power dissipation has been achieve dby using low-power latched comparators. The prototype was fabricated in a 0.8$\mu$m p-well CMOS technology. The ADC dissipates 160 mW at a 20 MHz clock rate with a 5 V single supply voltage and occupies a die area of 7 mm$^{2}$(2.7 mm $\times$ 2.6mm) including bonding pads and stand-alone internal bias circuit. The typical differential and integral nonlinarities of the prototype are less than $\pm$ 0.6 LSB and $\pm$ 1 LSB, respectively.

  • PDF

Design and Implementation of 3DES crypto-algorithm with Pipeline Architecture (파이프라인 구조의 3DES 암호알고리즘의 설계 및 구현)

  • Lee Wan-Bok;Kim Jung-Tae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.2
    • /
    • pp.333-337
    • /
    • 2006
  • Symmetric block ciper algorithm consists of a chains of operations such as permutation and substitution. There exists four kinds of operation mode, CBC, ECB, CFB, and OFB depending on the operation paradigm. Since the final ciper text is obtained through the many rounds of operations, it consumes much time. This paper proposes a pipelined design methodology which can improve the speed of crypto operations in ECB mode. Because the operations of the many rounds are concatenated in serial and executed concurrently, the overall computation time can be reduced significantly. The experimental result shows that the method can speed up the performance more than ten times.

A Study on the Hardware Implementation of A 3${\times}$3 Window Weighted Median Filter Using Bit-Level Sorting Algorithm (비트 레벨 정렬 알고리즘을 이용한 3${\times}$3 윈도우 가중 메디언 필터의 하드웨어 구현에 관한 연구)

  • 이태욱;조상복
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.53 no.3
    • /
    • pp.197-205
    • /
    • 2004
  • In this paper, we studied on the hardware implementation of a 3${\times}$3 window weighted median filter using bit-level sorting algorithm. The weighted median filter is a generalization of the median filter that is able to preserve :,harp changes in signal and is very effective in removing impulse noise. It has been successfully applied in various areas such as digital signal and video/image processing. The weighted median filters are, for the most part, based on word-level sorting methods, which have more hardware and time complexity, However, the proposed bit-serial sorting algorithm uses weighted adder tree to overcome those disadvantages. It also offers a simple pipelined filter architecture that is highly regular with repeated modules and is very suitable for weighted median filtering. The algorithm was implemented by VHDL and graphical environment in MAX+PlusII of ALTERA. The simulation results indicate that the proposed design method is more efficient than the traditional ones.