• Title/Summary/Keyword: Standard cell library

Search Result 196, Processing Time 0.021 seconds

A Study on the Design of a RISC core with DSP Support (DSP기능을 강화한 RISC 프로세서 core의 ASIC 설계 연구)

  • 김문경;정우경;이용석;이광엽
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.11C
    • /
    • pp.148-156
    • /
    • 2001
  • This paper proposed embedded application-specific microprocessor(YS-RDSP) whose structure has an additional DSP processor on chip. The YS-RDSP can execute maximum four instructions in parallel. To make program size shorter, 16-bit and 32-bit instruction lengths are supported in YS-RDSP. The YS-RDSP provides programmability. controllability, DSP processing ability, and includes eight-kilobyte on-chip ROM and eight-kilobyte RAM. System controller on the chip gives three power-down modes for low-power operation, and SLEEP instruction changes operation statue of CPU core and peripherals. YS-RDSP processor was implemented with Verilog HDL on top-down methodology, and it was improved and verified by cycle-based simulator written in C-language. The verified model was synthesized with 0.7um, 3.3V CMOS standard cell library, and the layout size was 10.7mm78.4mm which was implemented by using automatic P&R software.

  • PDF

Code Rate 1/2, 2304-b LDPC Decoder for IEEE 802.16e WiMAX (IEEE 802.16e WiMAX용 부호율 1/2, 2304-비트 LDPC 복호기)

  • Kim, Hae-Ju;Shin, Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.4A
    • /
    • pp.414-422
    • /
    • 2011
  • This paper describes a design of low-density parity-check(LDPC) decoder supporting block length 2,304-bit and code rate 1/2 of IEEE 802.16e mobile WiMAX standard. The designed LDPC decoder employs the min-sum algorithm and partially parallel layered-decoding architecture which processes a sub-matrix of $96{\times}96$ in parallel. By exploiting the properties of the min-sum algorithm, a new memory reduction technique is proposed, which reduces check node memory by 46% compared to conventional method. Functional verification results show that it has average bit-error-rate(BER) of $4.34{\times}10^{-5}$ for AWGN channel with Fb/No=2.1dB. Our LDPC decoder synthesized with a $0.18{\mu}m$ CMOS cell library has 174,181 gates and 52,992 bits memory, and the estimated throughput is about 417 Mbps at 100-MHz@l.8-V.

Design of Low-Power and Low-Complexity MIMO-OFDM Baseband Processor for High Speed WLAN Systems (고속 무선 LAN 시스템을 위한 저전력/저면적 MIMO-OFDM 기저대역 프로세서 설계)

  • Im, Jun-Ha;Cho, Mi-Suk;Jung, Yun-Ho;Kim, Jae-Seok
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.11C
    • /
    • pp.940-948
    • /
    • 2008
  • This paper presents a low-power, low-complexity design and implementation results of a high speed multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) wireless LAN (WLAN) baseband processor. The proposed processor is composed of the physical layer convergence procedure (PLCP) processor and physical medium dependent (PMD) processor, which have been optimized to have low-power and reduced-complexity architecture. It was designed in a hardware description language (HDL) and synthesized to gate-level circuits using 0.18um CMOS standard cell library. As a result, the proposed TX-PLCP processor reduced the power consumption by as much as 81% over the bit-level operation architecture. Also, the proposed MIMO symbol detector reduced the hardware complexity by 18% over the conventional SQRD-based architecture with division circuits and square root operations.

A VLSI Architecture of Systolic Array for FET Computation (고속 퓨리어 변환 연산용 VLSI 시스토릭 어레이 아키텍춰)

  • 신경욱;최병윤;이문기
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.9
    • /
    • pp.1115-1124
    • /
    • 1988
  • A two-dimensional systolic array for fast Fourier transform, which has a regular and recursive VLSI architecture is presented. The array is constructed with identical processing elements (PE) in mesh type, and due to its modularity, it can be expanded to an arbitrary size. A processing element consists of two data routing units, a butterfly arithmetic unit and a simple control unit. The array computes FFT through three procedures` I/O pipelining, data shuffling and butterfly arithmetic. By utilizing parallelism, pipelining and local communication geometry during data movement, the two-dimensional systolic array eliminates global and irregular commutation problems, which have been a limiting factor in VLSI implementation of FFT processor. The systolic array executes a half butterfly arithmetic based on a distributed arithmetic that can carry out multiplication with only adders. Also, the systolic array provides 100% PE activity, i.e., none of the PEs are idle at any time. A chip for half butterfly arithmetic, which consists of two BLC adders and registers, has been fabricated using a 3-um single metal P-well CMOS technology. With the half butterfly arithmetic execution time of about 500 ns which has been obtained b critical path delay simulation, totla FFT execution time for 1024 points is estimated about 16.6 us at clock frequency of 20MHz. A one-PE chip expnsible to anly size of array is being fabricated using a 2-um, double metal, P-well CMOS process. The chip was layouted using standard cell library and macrocell of BLC adder with the aid of auto-routing software. It consists of around 6000 transistors and 68 I/O pads on 3.4x2.8mm\ulcornerarea. A built-i self-testing circuit, BILBO (Built-In Logic Block Observation), was employed at the expense of 3% hardware overhead.

  • PDF

MPW Chip Implementation and Verification of High-performance Vector Inner Product Calculation Circuit for SVM-based Object Recognition (SVM 기반 사물 인식을 위한 고성능 벡터 내적 연산 회로의 MPW 칩 구현 및 검증)

  • Shin, Jaeho;Kim, Soojin;Cho, Kyeongsoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.11
    • /
    • pp.124-129
    • /
    • 2013
  • This paper proposes a high-performance vector inner product calculation circuit for real-time object recognition based on SVM algorithm. SVM algorithm shows a higher detection rate than other object recognition algorithms. However, it requires a huge amount of computational efforts. Since vector inner product calculation is one of the major operations of SVM algorithm, it is important to implement a high-performance vector inner product calculation circuit for real-time object recognition capability. The proposed circuit adopts the pipeline architecture with six stages to increase the operating speed and makes it possible to recognize objects in real time based on SVM. The proposed circuit was described in Verilog HDL at RTL. For silicon verification, an MPW chip was fabricated using TSMC 180nm standard cell library. The operation of the implemented MPW chip was verified on the test board with test application software developed for the chip verification.

Parallel Architecture Design of H.264/AVC CAVLC for UD Video Realtime Processing (UD(Ultra Definition) 동영상 실시간 처리를 위한 H.264/AVC CAVLC 병렬 아키텍처 설계)

  • Ko, Byung Soo;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.5
    • /
    • pp.112-120
    • /
    • 2013
  • In this paper, we propose high-performance H.264/AVC CAVLC encoder for UD video real time processing. Statistical values are obtained in one cycle through the parallel arithmetic and logical operations, using non-zero bit stream which represents zero coefficient or non-zero coefficient. To encode codeword per one cycle, we remove recursive operation in level encoding through parallel comparison for coefficient and escape value. In oder to implement high-speed circuit, proposed CAVLC encoder is designed in two-stage {statical scan, codeword encoding} pipeline. Reducing the encoding table, the arithmetic unit is used to encode non-coefficient and to calculate the codeword. The proposed architecture was simulated in 0.13um standard cell library. The gate count is 33.4Kgates. The architecture can support Ultra Definition Video ($3840{\times}2160$) at 100 frames per second by running at 100MHz.

Design of a Realtime Stereo Vision System using Adaptive Support-weight (적응적 영역 가중치를 이용한 실시간 스테레오 비전 시스템 설계)

  • Ryu, Donghoon;Park, Taegeun
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.11
    • /
    • pp.90-98
    • /
    • 2013
  • The stereo system based on local matching is very popular due to its algorithmic simplicity, however it is limited to apply to various applications because it shows poor quality with low matching rates. In this paper, we propose and design a realtime stereo system based on an adaptive support-weight and the system shows low error rates and realtime performance. Generally, in the adaptive support-weight algorithm the intermediate computing results can not be reused to reduce the number of computations. In this research we modify the scheduling to reuse the intermediate results for the better performance by processing rows and columns separately. The nonlinear functions such as exponential or arc tangent have been designed with piecewise linear and step functions by empirical simulations and error analysis. The proposed architecture is composed of 9 processing elements for realtime performance. The proposed stereo system has been designed and synthesized using Donbu Hitek 0.18um standard cell library and can run up to 350Mhz operation frequency (33 frames per second) with 424K gates.

ECC Processor Supporting NIST Elliptic Curves over GF(2m) (GF(2m) 상의 NIST 타원곡선을 지원하는 ECC 프로세서)

  • Lee, Sang-Hyun;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.190-192
    • /
    • 2018
  • This paper describes a design of an elliptic curve cryptography (ECC) processor that supports five pseudo-random curves and five Koblitz curves over binary field defined by the NIST standard. The ECC processor adopts the Lopez-Dahab projective coordinate system so that scalar multiplication is computed with modular multiplier and XORs. A word-based Montgomery multiplier of $32-b{\times}32-b$ was designed to implement ECCs of various key lengths using fixed-size hardware. The hardware operation of the ECC processor was verified by FPGA implementation. The ECC processor synthesized using a 0.18-um CMOS cell library occupies 10,674 gate equivalents (GEs) and 9 Kbits RAM at 100 MHz, and the estimated maximum clock frequency is 154 MHz.

  • PDF

Hardware Implementation of Chaotic System for Security of JPEG2000 (JPEG2000의 보안을 위한 카오스 시스템의 하드웨어 구현)

  • Seo Young-Ho
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.12C
    • /
    • pp.1193-1200
    • /
    • 2005
  • In this paper, we proposed an image hiding method which decreases the amount of calculation encrypting partial data rather than the whole image data using a discrete wavelet transform and a linear scalar quantization which have been adopted as the main technique in JPEG2000 standard and then implemented the proposed algorithm to hardware. A chaotic system was used instead of encryption algorithms to reduce further amount of calculation. It uses a method of random changing method using the chaotic system of the data in a selected subband. For ciphering the quantization index it uses a novel image encryption algorithm of cyclical shifting to the right or left direction and encrypts two quantization assignment method (Top-down coding and Reflection coding), made change of data less. The experiments have been performed with the proposed methods implemented in software for about 500 images. The hardware encryption system was synthesized to find the gate-level circuit with the Samsung $0.35{\mu}m$ Phantom-cell library and timing simulation was performed, which resulted in the stable operation in the frequency above 100MHz.

Design of a GFAU(Galois Field Arithmetic Unit) in (GF(2m)에서의 사칙연산을 수행하는 GFAU의 설계GF(2m))

  • Kim, Moon-Gyung;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.2A
    • /
    • pp.80-85
    • /
    • 2003
  • This paper proposes Galois Field Arithmetic Unit(GFAU) whose structure does addition, multiplication and division in GF(2m). GFAU can execute maximum two additions, or two multiplications, or one addition and one multiplication. The base architecture of this GFAU is a divider based on modified Euclid's algorithm. The divider was modified to enable multiplication and addition, and the modified divider with the control logic became GFAU. The GFAU for GF(2193) was implemented with Verilog HDL with top-down methodology, and it was improved and verified by a cycle-based simulator written in C-language. The verified model was synthesized with Samsung 0.35um, 3.3V CMOS standard cell library, and it operates at 104.7MHz in the worst case of 3.0V, 85$^{\circ}C$, and it has about 25,889 gates.