• 제목/요약/키워드: Standard cell library

Search Result 196, Processing Time 0.025 seconds

A High Speed FFT Processor for OFDM Systems (OFDM 시스템을 위한 고속 FFT 프로세서)

  • 조병각;손병수;선우명훈
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.39 no.12
    • /
    • pp.513-519
    • /
    • 2002
  • This paper proposes a high-speed FFT processor for orthogonal frequency-division multiplexing(OFDM) systems. The Proposed architecture uses a single-memory architecture and uses a radix-4 algorithm for high speed. The proposed memory is partitioned into four banks for high-speed computation. It uses an in-place memory strategy that stores butterfly outputs in the same memory location used by butterfly inputs. Therefore, the memory size can be reduced. The SQNR of about 80dB is achieved with 20-bit input and 20-bit twiddle factors. The architecture has been modeled by VHDL and logic synthesis has been performed using the SamsungTM 0.5㎛ SOG cell library (KG80). The implemented FFT processor consists of 98,326 gates excluding memory. It has smaller hardware than existing pipeline FFT processors for more than 1024-point FFTs. The processor can operate at 42MHz and calculate a 256-point complex FFT in 6us. It satisfies tile required processing speed of 8.4㎲ in the HomePlug standard.

An LNS-based Low-power/Small-area FFT Processor for OFDM Systems (OFDM 시스템용 로그 수체계 기반의 저전력/저면적 FFT 프로세서)

  • Park, Sang-Deok;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.53-60
    • /
    • 2009
  • A low-power/small-area 128-point FFT processor is designed, which is based on logarithmic number system (LNS) and some design techniques to minimize both hardware complexity and arithmetic error. The complex-number multiplications and additions/subtractions for FFT computation are implemented with LNS adders and look-up table (LUT) rather than using conventional two's complement multipliers and adders. Our design reduces the gate counts by 21% and the memory size by 16% when compared to the conventional two's complement implementation. Also, the estimated power consumption is reduced by about 18%. The LNS-based FFT processor synthesized with 0.35 ${\mu}m$ CMOS standard cell library has 39,910 gates and 2,880 bits memory. It can compute a 128-point FIT in 2.13 ${\mu}s$ with 60 MHz@2.5V, and has the average SQNR of 40.7 dB.

Multi-mode Layered LDPC Decoder for IEEE 802.11n (IEEE 802.11n용 다중모드 layered LDPC 복호기)

  • Na, Young-Heon;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.11
    • /
    • pp.18-26
    • /
    • 2011
  • This paper describes a multi-mode LDPC decoder which supports three block lengths(648, 1296, 1944) and four code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n wireless LAN standard. To minimize hardware complexity, it adopts a block-serial (partially parallel) architecture based on the layered decoding scheme. A novel memory reduction technique devised using the min-sum decoding algorithm reduces the size of check-node memory by 47% as compared to conventional method. From fixed-point modeling and Matlab simulations for various bit-widths, decoding performance and optimal hardware parameters such as fixed-point bit-width are analyzed. The designed LDPC decoder is verified by FPGA implementation, and synthesized with a 0.18-${\mu}m$ CMOS cell library. It has 219,100 gates and 45,036 bits RAM, and the estimated throughput is about 164~212 Mbps at 50 MHz@2.5v.

Effective hardware design for DCT-based Intra prediction encoder (DCT 기반 인트라 예측 인코더를 위한 효율적인 하드웨어 설계)

  • Cha, Ki-Jong;Ryoo, Kwang-Ki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.4
    • /
    • pp.765-770
    • /
    • 2012
  • In this paper, we proposed an effective hardware structure using DCT-based inra-prediction mode selection to reduce computational complexity caused by intra mode decision. In this hardware structure, the input block is transformed at first and then analyzed to determine its texture directional tendency. the complexity has solved by performing intra prediction in only predicted edge direction. $4{\times}4$ DCT is calculated in one cycle using Multitransform_PE and Inta_pred_PE calculates one prediction mode in two cycles. Experimental results show that the proposed Intra prediction encoding needs only 517 cycles for one macroblock encoding. This architecture improves the performance by about 17% than previous designs. For hardware implementation, the proposed intra prediction encoder is implemented using Verilog HDL and synthesized with Megnachip $0.18{\mu}m$ standard cell library. The synthesis results show that the proposed architecture can run at 125MHz.

An efficient hardware implementation of 64-bit block cipher algorithm HIGHT (64비트 블록암호 알고리듬 HIGHT의 효율적인 하드웨어 구현)

  • Park, Hae-Won;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.9
    • /
    • pp.1993-1999
    • /
    • 2011
  • This paper describes a design of area-efficient/low-power cryptographic processor for HIGHT block cipher algorithm, which was approved as standard of cryptographic algorithm by KATS(Korean Agency for Technology and Standards) and ISO/IEC. The HIGHT algorithm, which is suitable for ubiquitous computing devices such as a sensor in USN or a RFID tag, encrypts a 64-bit data block with a 128-bit cipher key to make a 64-bit cipher text, and vice versa. For area-efficient and low-power implementation, we optimize round transform block and key scheduler to share hardware resources for encryption and decryption. The HIGHT64 core synthesized using a 0.35-${\mu}m$ CMOS cell library consists of 3,226 gates, and the estimated throughput is 150-Mbps with 80-MHz@2.5-V clock.

A LDPC decoder supporting multiple block lengths and code rates of IEEE 802.11n (다중 블록길이와 부호율을 지원하는 IEEE 802.11n용 LDPC 복호기)

  • Na, Young-Heon;Park, Hae-Won;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.6
    • /
    • pp.1355-1362
    • /
    • 2011
  • This paper describes a multi-mode LDPC decoder which supports three block lengths(648, 1296, 1944) and four code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n WLAN standard. Our LDPC decoder adopts a block-serial architecture based on min-sum algorithm and layered decoding scheme. A novel way to store check-node values and parity check matrix reduces the sizes of check-node memory and H-ROM. An efficient scheme for check-node memory addressing is used to achieve stall-free read/write operations. The designed LDPC decoder is verified by FPGA implementation, and synthesized with a $0.18-{\mu}m$ CMOS cell library. It has 219,100 gates and 45,036 bits RAM, and the estimated throughput is about 164~212 Mbps at 50 MHz@2.5v.

Hardware Implementation of a Fast Inter Prediction Engine for MPEG-4 AVC (MPEG-4 AVC를 위한 고속 인터 예측기의 하드웨어 구현)

  • Lim Young hun;Lee Dae joon;Jeong Yong jin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.3C
    • /
    • pp.102-111
    • /
    • 2005
  • In this paper, we propose an advanced hardware architecture for the fast inter prediction engine of the video coding standard MPEG-4 AVC. We describe the algorithm and derive the hardware architecture emphasizing and real time operation of the quarter_pel based motion estimation. The fast inter prediction engine is composed of block segmentation, motion estimation, motion compensation, and the fast quarter_pel calculator. The proposed architecture has been verified by ARM-interfaced emulation board using Excalibur & Virtex2 FPGA, and also by synthesis on Samsung 0.18 um CMOS technology. The synthesis result shows that the proposed hardware can operate at 62.5MHz. In this case, it can process about 88 QCIF video frames per second. The hardware is being used as a core module when implementing a complete MPEG-4 AVC video encoder chip for real-time multimedia application.

Efficient DSP Architecture for Viterbi Algorithm (비터비 알고리즘의 효율적인 연산을 위한 DSP 구조 설계)

  • Park Weon heum;Sunwoo Myung hoon;Oh Seong keun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.3A
    • /
    • pp.217-225
    • /
    • 2005
  • This paper presents specialized DSP instructions and their architecture for the Viterbi algorithm used in various wireless communication standards. The proposed architecture can significantly reduce the Trace Back (TB) latency. The proposed instructions perform the Add Compare Select (ACS) and TB operations in parallel and the architecture has special hardware, called the Offset Calculation Unit (OCU), which automatically calculates data addresses for the trellis butterfly computations. Logic synthesis has been Performed using the Samsung SEC 0.18 μm standard cell library. OCU consists of 1,460 gates and the maximum delay of OCU is about 5.75 ns. The BER performance of the ACS-TB parallel method increases about 0.00022dB at 6dB Eb/No compared with the typical TB method, which is negligible. When the constraint length K is 5, the proposed DSP architecture can reduce the decoding cycles about 17% compared with the Carmel DSP and about 45% compared with 7MS320c15x.

Low Power Cryptographic Design based on Circuit Size Reduction (회로 크기 축소를 기반으로 하는 저 전력 암호 설계)

  • You, Young-Gap;Kim, Seung-Youl;Kim, Yong-Dae;Park, Jin-Sub
    • The Journal of the Korea Contents Association
    • /
    • v.7 no.2
    • /
    • pp.92-99
    • /
    • 2007
  • This paper presented a low power design of a 32bit block cypher processor reduced from the original 128bit architecture. The primary purpose of this research is to evaluate physical implementation results rather than theoretical aspects. The data path and diffusion function of the processor were reduced to accommodate the smaller hardware size. As a running example demonstrating the design approach, we employed a modified ARIA algorithm having four S-boxes. The proposed 32bit ARIA processor comprises 13,893 gates which is 68.25% smaller than the original 128bit structure. The design was synthesized and verified based on the standard cell library of the MagnaChip's 0.35um CMOS Process. A transistor level power simulation shows that the power consumption of the proposed processor reduced to 61.4mW, which is 9.7% of the original 128bit design. The low power design of the block cypher Processor would be essential for improving security of battery-less wireless sensor networks or RFID.

Low-power Hardware Design of Deblocking Filter in HEVC In-loop Filter for Mobile System (모바일 시스템을 위한 저전력 HEVC 루프 내 필터의 디블록킹 필터 하드웨어 설계)

  • Park, Seungyong;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.3
    • /
    • pp.585-593
    • /
    • 2017
  • In this paper, we propose a deblocking filter hardware architecture for low-power HEVC (High-Efficiency Video Coding) in-loop for mobile systems. HEVC performs image compression on a block-by-block basis, resulting in blockage of the image due to quantization error. The deblocking filter is used to remove the blocking phenomenon in the image. Currently, UHD video service is supported in various mobile systems, but power consumption is high. The proposed low-power deblocking filter hardware structure minimizes the power consumption by blocking the clock to the internal module when the filter is not applied. It also has four parallel filter structures for high throughput at low operating frequencies and each filter is implemented in a four-stage pipeline. The proposed deblocking filter hardware structure is designed with Verilog HDL and synthesized using TSMC 65nm CMOS standard cell library, resulting in about 52.13K gates. In addition, real-time processing of 8K@84fps video is possible at 110MHz operating frequency, and operation power is 6.7mW.