• Title/Summary/Keyword: Standard cell library

Search Result 196, Processing Time 0.031 seconds

Design of SIMD-DSP/PPU for a High-Performance Embedded Microprocessor (고성능 내장형 마이크로프로세서를 위한 SIMD-DSP/FPU의 설계)

  • 정우경;홍인표;이용주;이용석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.4C
    • /
    • pp.388-397
    • /
    • 2002
  • We designed a SIMD-DSP/FPU that can efficiently improve multimedia processing performance when integrated into high-performance embedded microprocessors. We proposed partitioned architectures and new schemes for several functional units to reduce chip area. Sharing functional units reduces the area of FPU significantly. The proposed architecture is modeled in HDL and synthesized with a 0.35$\mu\textrm{m}$ standard cell library. The chip area is estimated to be about 100,000 equivalent gates. The designed unit can run at higher than 50MHz clock frequency of CPU core under the worst-case operating conditions.

A study on a CMOS analog cell-library design-A CMOS on-chip current reference circuit (CMOS 아날로그 셀 라이브레이 설계에 관한 연구-CMOS 온-칩 전류 레퍼런스 회로)

  • 김민규;이승훈;임신일
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.33A no.4
    • /
    • pp.136-141
    • /
    • 1996
  • In this paper, a new CMOS on-chip current reference circit for memory, operational amplifiers, comparators, and data converters is proposed. The reference current is almost independent of temeprature and power-supply variations. In the proposed circuit, the current component with a positive temeprature coefficient cancels that with a negative temperature coefficient each other. While conventional curretn and voltage reference circuits require BiCMOS or bipolar process, the presented circuit can be integrated on a single chip with other digiral and analog circits using a standard CMOS process and an extra mask is not needed. The prototype is fabricated employing th esamsung 1.0um p-well double-poly double-metal CMOS process and the chip area is 300um${\times}$135 um. The proposed reference current circuit shows the temperature coefficient of 380 ppm/.deg. C with the temperature changes form 30$^{\circ}C$ to 80$^{\circ}C$, and the output variation of $\pm$ 1.4% with the supply voltage changes from 4.5 V to 5.5 V.

  • PDF

Design and Implementation of the SoC for Terrestrial DMB Receiver (지상파 DMB 수신용 SoC 설계 및 구현)

  • Koo, Bon-Tae;Lee, Ju-Hyeon;Choe, Min-Seok;Lee, Seok-Ho;Kim, Jin-Gyu;Kim, Seong-Min;Park, Gi-Hyeok;Kim, Deok-Hwan;Gwon, Yeong-Su;Eom, Nak-Ung
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.669-670
    • /
    • 2006
  • This paper describes the functions and design technology of the T-DMB (Terrestrial Digital Multimedia Broadcasting) receiver. T-DMB is a novel broadcasting media that can provide high-quality video and audio services. In this paper, we will describe the VLSI implementation of RF, Baseband and Multimedia Chip for T-DMB Receiver. The designed DMB SoC has low power consumption and has been implemented using a standard-cell library in 0.18um CMOS technology.

  • PDF

A VLSI Design for Digital Pre-distortion with Pipelined CORDIC Processors

  • Park, Jong Kang;Moon, Jun Young;Kim, Kyunghoon;Yang, Youngoo;Kim, Jong Tae
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.6
    • /
    • pp.718-727
    • /
    • 2014
  • In a wireless communications system, a predistorter is often used to compensate for the nonlinear distortions that result from operating a power amplifier near the saturation region, thereby improving system performance and increasing the spectral efficiency for the communication channels. This paper presents a new VLSI design for the polynomial digital predistorter (DPD). The proposed DPD uses a Coordinate Rotation Digital Computing (CORDIC) processor and a PD process with a fully-pipelined architecture. Due to its simple and regular structure, it can be a competitive design when compared to existing polynomial-type and approximated DPDs. Implementing a fifth-order distorter with the proposed design requires only 43,000 logic gates in a $0.35{\mu}m$ CMOS standard cell library.

Combination of Gate Sizing and Buffer Insertion Methods to Reduce Glitch Power Dissipation (글리치 전력소모감소를 위한 게이트 사이징과 버퍼삽입 혼합기섭)

  • Kim, Seong-Jae;Lee, Hyeong-U;Kim, Ju-Ho
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.8
    • /
    • pp.406-413
    • /
    • 2001
  • 본 논문은 CMOS 디지털 회로에서 글리치(glitch)에 의해 발생하는 전력소모를 줄이기 위한 효율적인 휴리스틱 알고리즘을 제시한다. 제안된 알고리즘은 사이징되는 게이트의 위치와 양에 따라 게이트 사이징을 세 가지 type으로 분류한다. 또한 버퍼삽입은 삽입되는 버퍼의 위치에 따라서 두 가지 type으로 분류한다. 글리치 제거 효과를 극대화하기 위해서 비용과 이득의 상관관계를 고려하여 하나의 최적화 과정 안에서 세 가지 type의 게이트 사이징과 두 가지 type의 버퍼삽입을 혼합한다. 제안된 알고리즘은 0.5$\mu\textrm{m}$ 표준 셀 라이브러리(standard cell library)를 이용한 LGSynth91 벤치마크 회로에 대한 테스트 결과 효율성을 검증하였다. 실험결과는 평균적으로 69.98%의 글리치 감소와 28.69%의 전력감소를 얻을 수 있었으며 이것은 독립적으로 적용된 게이트 사이징과 버퍼 삽입 알고리즘에 의한 것 보다 좋은 결과이다.

  • PDF

Design of Pipelined Floating-Point Arithmetic Unit for Mobile 3D Graphics Applications

  • Choi, Byeong-Yoon;Ha, Chang-Soo;Lee, Jong-Hyoung;Salclc, Zoran;Lee, Duck-Myung
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.6
    • /
    • pp.816-827
    • /
    • 2008
  • In this paper, two-stage pipelined floating-point arithmetic unit (FP-AU) is designed. The FP-AU processor supports seventeen operations to apply 3D graphics processor and has area-efficient and low-latency architecture that makes use of modified dual-path computation scheme, new normalization circuit, and modified compound adder based on flagged prefix adder. The FP-AU has about 4-ns delay time at logic synthesis condition using $0.18{\mu}m$ CMOS standard cell library and consists of about 5,930 gates. Because it has 250 MFLOPS execution rate and supports saturated arithmetic including a number of graphics-oriented operations, it is applicable to mobile 3D graphics accelerator efficiently.

  • PDF

A Design of Multi-Standard LDPC Decoder for WiMAX/WLAN (WiMAX/WLAN용 다중표준 LDPC 복호기 설계)

  • Seo, Jin-Ho;Park, Hae-Won;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.2
    • /
    • pp.363-371
    • /
    • 2013
  • This paper describes a multi-standard LDPC decoder which supports 19 block lengths(576~2304) and 6 code rates(1/2, 2/3A, 2/3B, 3/4A, 3/4B, 5/6) of IEEE 802.16e mobile WiMAX standard and 3 block lengths(648, 1296, 1944) and 4 code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n WLAN standard. To minimize hardware complexity, it adopts a block-serial (partially parallel) architecture based on the layered decoding scheme. A DFU(decoding function unit) based on sign-magnitude arithmetic is used for hardware reduction. The designed LDPC decoder is verified by FPGA implementation, and synthesized with a 0.13-${\mu}m$ CMOS cell library. It has 312,000 gates and 70,000 bits RAM. The estimated throughput is about 79~210 Mbps at 100 MHz@1.8v.

Design of Intra Prediction Circuit for HEVC and H.264 Multi-decoder Supporting UHD Images (UHD 영상을 지원하는 HEVC 및 H.264 멀티 디코더 용 인트라 예측 회로 설계)

  • Yu, Sanghyun;Cho, Kyeongsoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.12
    • /
    • pp.50-56
    • /
    • 2016
  • This paper proposes the architecture and design of intra prediction circuit for a multi-decoder supporting UHD images. The proposed circuit supports not only the latest video compression standard HEVC but also H.264. In addition to the basic function of performing intra prediction, this circuit has the capability of performing the reference sample filter operation defined in the H.264 standard, and the smoothing and strong sample filter operations defined in the HEVC standard. We reduced the circuit size by sharing the circuit blocks for common operations and internal storage, and improved the circuit performance by parallel processing. The proposed circuit was described at RTL using Verilog HDL and its functionality was verified by using NC-Verilog of Cadence. The RTL circuit was synthesized by using Design Compiler of Synopsys and 130nm standard cell library. The synthesized gate-level circuit consists of 69,694 gates and processes 100 ~ 280 frames per second for 4K-UHD HEVC images at the maximum operation frequency of 157MHz.

Study of Optimization for High Performance Adders (고성능 가산기의 최적화 연구)

  • 허석원;김문경;이용주;이용석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.5A
    • /
    • pp.554-565
    • /
    • 2004
  • In this paper, we implement single cycle and multi cycle adders. We can compare area and time by using the implemented adders. The size of adders is 64, 128, 256-bits. The architecture of hybrid adders is that the carry-out of small adder groups can be interconnected by utilizing n carry propagate unit. The size of small adder groups is selected in three formats - 4, 8, 16-bits. These adders were implemented with Verilog HDL with top-down methodology, and they were verified by behavioral model. The verified models were synthesized with a Samsung 0,35(um), 3.3(V) CMOS standard cell library while a using Synopsys Design Compiler. All adders were synthesized with group or ungroup. The optimized adder for a Crypto-processor included Smart Card IC is that a 64-bit RCA based on 16-bit CLA. All small adder groups in this optimized adder were synthesized with group. This adder can operate at a clock speed of 198 MHz and has about 961 gates. All adders can execute operations in this won case conditions of 2.7 V, 85 $^{\circ}C$.

Design of an Optimal RSA Crypto-processor for Embedded Systems (내장형 시스템을 위한 최적화된 RSA 암호화 프로세서 설계)

  • 허석원;김문경;이용석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.4A
    • /
    • pp.447-457
    • /
    • 2004
  • This paper proposes a RSA crypto-processor for embedded systems. The architecture of the RSA crypto-processor should be used relying on Big Montgomery algorithm, and is supported by configurable bit size. The RSA crypto-processor includes a RSA control signal generator, an optimal Big Montgomery processor(adder, multiplier). We use diverse arithmetic unit (adder, multiplier) algorithm. After we compared the various results, we selected the optimal arithmetic unit which can be connected with ARM core-processor. The RSA crypto-processor was implemented with Verilog HDL with top-down methodology, and it was verified by C language and Cadence Verilog-XL. The verified models were synthesized with a Hynix 0.25${\mu}{\textrm}{m}$, CMOS standard cell library while using Synopsys Design Compiler. The RSA crypto-processor can operate at a clock speed of 51 MHz in this worst case conditions of 2.7V, 10$0^{\circ}C$ and has about 36,639 gates.