• Title/Summary/Keyword: CMOS VLSI

Search Result 200, Processing Time 0.025 seconds

Experimental Characterization-Based Signal Integrity Verification of Sub-Micron VLSI Interconnects

  • Eo, Yung-Seon;Park, Young-Jun;Kim, Yong-Ju;Jeong, Ju-Young;Kwon, Oh-Kyong
    • Journal of Electrical Engineering and information Science
    • /
    • v.2 no.5
    • /
    • pp.17-26
    • /
    • 1997
  • Interconnect characterization on a wafer level was performed. Test patterns for single, two-coupled, and triple-coupled lines ere designed by using 0.5$\mu\textrm{m}$ CMOS process. Then interconnect capacitances and resistances were experimentally extracted by using tow port network measurements, Particularly to eliminate parasitic effects, the Y-parameter de-embedding was performed with specially designed de-embedding patterns. Also, for the purpose of comparisons, capacitance matrices were calculated by using the existing CAD model and field-solver-based commercial simulator, METAL and MEDICI. This work experimentally verifies that existing CAD models or parameter extraction may have large deviation from real values. The signal transient simulation with the experimental data and other methodologies such as field-solver-based simulation and existing model was performed. as expected, the significantly affect on the signal delay and crosstalk. The signal delay due to interconnects dominates the sub-micron-based a gate delay (e.g., inverter). Particularly, coupling capacitance deviation is so large (about more than 45% in the worst case) that signal integrity cannot e guaranteed with the existing methodologies. The characterization methodologies of this paper can be very usefully employed for the signal integrity verification or he electrical design rule establishments of IC interconnects in the industry.

  • PDF

Efficient QCA 2-to-4 Enable Decoder Design Based on 4-Universal Gate (4-유니버셜 게이트 기반 효율적인 QCA 2-to-4 인에이블 디코더 설계)

  • Kim, Tae-Woo;Ryu, Jung Hyuk;Jo, Jeong Hoon;Park, Jong Hyuk
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.10a
    • /
    • pp.5-7
    • /
    • 2018
  • VLSI(Very large scale integration) 기술을 통한 트랜지스터의 소형화를 통해 CMOS 집적 회로의 성능은 지속적으로 발전해 왔다. 이와 같은 기술 발전에 따라 집적 회로를 구성하는 디지털 논리 요소 또한 진화를 하고 있다. 디코더는 부호화된 정보를 다시 부호화되기 전으로 되돌아가는 처리를 하는 디지털 논리 요소이며 컴퓨터 설계에서 많이 사용되는 핵심 요소이다. 본 논문에서는 양자점 셀룰라 오토마타(Quantum Cellular-Automata, QCA)를 사용하여 인에이블 입력을 가진 2-to-4 디코더를 제안하였다. 4-입력 유니버설 게이트의 하나의 입력을 1로 고정시켜 3-입력 NOR 게이트로 사용하며, 입력 값 X와 입력 값 Y의 중복된 배선 수를 감소시키고 한 배선으로 두 게이트에 입력을 연결하여 디코더의 배선 수와 배선 교차부를 최소화한다. 제안안하는 4-to-2 인에이블 디코더는 기존 디코더보다 셀의 개수와 클럭수를 감소시켜 디코더의 성능을 더 효율적으로 향상시켰다. 이를 통해 고속 회로 설계에 활용 및 높은 성능을 기대 할 수 있으며 QCA 연구에 기여할 수 있을 것으로 전망 한다.

On the Hardware Complexity of Tree Expansion in MIMO Detection

  • Kong, Byeong Yong;Lee, Youngjoo;Yoo, Hoyoung
    • Journal of Semiconductor Engineering
    • /
    • v.2 no.3
    • /
    • pp.136-141
    • /
    • 2021
  • This paper analyzes the tree expansion for multiple-input multiple-output (MIMO) detection in the viewpoint of hardware implementation. The tree expansion is to calculate path metrics of child nodes performed in every visit to a node while traversing the detection tree. Accordingly, the tree-expansion unit (TEU), which is responsible for such a task, has been an essential component in a MIMO detector. Despite the paramount importance, the analyses on the TEUs in the literature are not thorough enough. Accordingly, we further investigate the hardware complexity of the TEUs to suggest a guideline for selection. In this paper, we focus on a pair of major ways to implement the TEU: 1) a full parallel realization; 2) a transformation of the formulae followed by common subexpression elimination (CSE). For a logical comparison, the numbers of multipliers and adders are first enumerated. To evaluate them in a more practical manner, the TEUs are implemented in a 65-nm CMOS process, and their propagation delays, gate counts, and power consumptions were measured explicitly. Considering the target specification of a MIMO system and the implementation results comprehensively, one can choose which architecture to adopt in realizing a detector.

An Efficient Hardware Implementation of AES Rijndael Block Cipher Algorithm (AES Rijndael 블록 암호 알고리듬의 효율적인 하드웨어 구현)

  • 안하기;신경욱
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.53-64
    • /
    • 2002
  • This paper describes a design of cryptographic processor that implements the AES (Advanced Encryption Standard) block cipher algorithm, "Rijndael". An iterative looping architecture using a single round block is adopted to minimize the hardware required. To achieve high throughput rate, a sub-pipeline stage is added by dividing the round function into two blocks, resulting that the second half of current round function and the first half of next round function are being simultaneously operated. The round block is implemented using 32-bit data path, so each sub-pipeline stage is executed for four clock cycles. The S-box, which is the dominant element of the round block in terms of required hardware resources, is designed using arithmetic circuit computing multiplicative inverse in GF($2^8$) rather than look-up table method, so that encryption and decryption can share the S-boxes. The round keys are generated by on-the-fly key scheduler. The crypto-processor designed in Verilog-HDL and synthesized using 0.25-$\mu\textrm{m}$ CMOS cell library consists of about 23,000 gates. Simulation results show that the critical path delay is about 8-ns and it can operate up to 120-MHz clock Sequency at 2.5-V supply. The designed core was verified using Xilinx FPGA board and test system.

Efficient programmable power-of-two scaler for the three-moduli set {2n+p, 2n - 1, 2n+1 - 1}

  • Taheri, MohammadReza;Navi, Keivan;Molahosseini, Amir Sabbagh
    • ETRI Journal
    • /
    • v.42 no.4
    • /
    • pp.596-607
    • /
    • 2020
  • Scaling is an important operation because of the iterative nature of arithmetic processes in digital signal processors (DSPs). In residue number system (RNS)-based DSPs, scaling represents a performance bottleneck based on the complexity of intermodulo operations. To design an efficient RNS scaler for special moduli sets, a body of literature has been dedicated to the study of the well-known moduli sets {2n - 1, 2n, 2n + 1} and {2n, 2n - 1, 2n+1 - 1}, and their extension in vertical or horizontal forms. In this study, we propose an efficient programmable RNS scaler for the arithmetic-friendly moduli set {2n+p, 2n - 1, 2n+1 - 1}. The proposed algorithm yields high speed and energy-efficient realization of an RNS programmable scaler based on the effective exploitation of the mixed-radix representation, parallelism, and a hardware sharing technique. Experimental results obtained for a 130 nm CMOS ASIC technology demonstrate the superiority of the proposed programmable scaler compared to the only available and highly effective hybrid programmable scaler for an identical moduli set. The proposed scaler provides 43.28% less power consumption, 33.27% faster execution, and 28.55% more area saving on average compared to the hybrid programmable scaler.

Hybrid FFT processor design using Parallel PD adder circuit (병렬 PD가산회로를 이용한 Hybrid FFT 연산기 설계)

  • 김성대;최전균;안점영;송홍복
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2000.10a
    • /
    • pp.499-503
    • /
    • 2000
  • The use of Multiple-Valued FFT(Fast fourier Transform) is extended from binary to multiple-valued logic(MVL) circuits. A multiple-valued FFT circuit can be implemented using current-mode CMOS techniques, reducing the transitor, wires count between devices to half compared to that of a binary implementation. For adder processing in FFT, We give the number representation using such redundant digit sets are called redundant positive-digit number representation and a Redundant set uses the carry-propagation-free addition method. As the designed Multiple-valued FFT internally using PD(positive digit) adder with the digit set 0,1,2,3 has attractive features on speed, regularity of the structure and reduced complexities of active elements and interconnections. for the mutiplier processing, we give Multiple-valued LUT(Look up table)to facilitate simple mathmatical operations on the stored digits. Finally, Multiple-valued 8point FFT operation is used as an example in this paper to illuatrates how a multiple-valued FFT can be beneficial.

  • PDF

High-Speed Reed-Solomon Decoder Using New Degree Computationless Modified Euclid´s Algorithm (새로운 DCME 알고리즘을 사용한 고속 Reed-Solomon 복호기)

  • 백재현;선우명훈
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.6
    • /
    • pp.459-468
    • /
    • 2003
  • This paper proposes a novel low-cost and high-speed Reed-Solomon (RS) decoder based on a new degree computationless modified Euclid´s (DCME) algorithm. This architecture has quite low hardware complexity compared with conventional modified Euclid´s (ME) architectures, since it can remove completely the degree computation and comparison circuits. The architecture employing a systolic away requires only the latency of 2t clock cycles to solve the key equation without initial latency. In addition, the DCME architecture using 3t+2 basic cells has regularity and scalability since it uses only one processing element. The RS decoder has been synthesized using the 0.25${\mu}{\textrm}{m}$. Faraday CMOS standard cell library and operates at 200MHz and its data rate suppots up to 1.6Gbps. For tile (255, 239, 8) RS code, the gate counts of the DCME architecture and the whole RS decoder excluding FIFO memory are only 21,760 and 42,213, respectively. The proposed RS decoder can reduce the total fate count at least 23% and the total latency at least 10% compared with conventional ME architectures.

Design of Architecture of Programmable Stack-based Video Processor with VHDL (VHDL을 이용한 프로그램 가능한 스택 기반 영상 프로세서 구조 설계)

  • 박주현;김영민
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.4
    • /
    • pp.31-43
    • /
    • 1999
  • The main goal of this paper is to design a high performance SVP(Stack based Video Processor) for network applications. The SVP is a comprehensive scheme; 'better' in the sense that it is an optimal selection of previously proposed enhancements of a stack machine and a video processor. This can process effectively object-based video data using a S-RISC(Stack-based Reduced Instruction Set Computer) with a semi -general-purpose architecture having a stack buffer for OOP(Object-Oriented Programming) with many small procedures at running programs. And it includes a vector processor that can improve the MPEG coding speed. The vector processor in the SVP can execute advanced mode motion compensation, motion prediction by half pixel and SA-DCT(Shape Adaptive-Discrete Cosine Transform) of MPEG-4. Absolutors and halfers in the vector processor make this architecture extensive to a encoder. We also designed a VLSI stack-oriented video processor using the proposed architecture of stack-oriented video decoding. It was designed with O.5$\mu\textrm{m}$ 3LM standard-cell technology, and has 110K logic gates and 12 Kbits SRAM internal buffer. The operating frequency is 50MHz. This executes algorithms of video decoding for QCIF 15fps(frame per second), maximum rate of VLBV(Very Low Bitrate Video) in MPEG-4.

  • PDF

Pipelining of orthogonal Double-Rotation Digital Lattice Filters for High-Speed and Low-Power Implementation (고속 및 저파워 실현을 위한 직교 이중 회전 디지털 격자 필터의 파이프라인화)

  • 정진균;엄경배
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.12
    • /
    • pp.2409-2417
    • /
    • 1994
  • The ODR(orthogonal double-rotation) digital lattice filters have desirable properties for VLSI implementation such as local connection, regularity and pipelinability. These filters are also known to exhibit good numerical behavior for finite precision implementation. Although these filters can be pipelined by the cut-set localization procedure, it should be noted that the maximum sample rate obtained by this technique is limited by the feedback computations. In this paper, a pipelining method for the ODR digital lattice filter is proposed, by which the sample rate can be increased at any desired level. it is also shown that the low-power CMOS digital implementation of ODR digital lattice filters can be done successfully using our pipelining method. The pipelining method is based on the properties of the Schur algoithm, constrained filter design methods, and the polyphase decomposition technique.

  • PDF

Hardware Approach to Fuzzy Inference―ASIC and RISC―

  • Watanabe, Hiroyuki
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1993.06a
    • /
    • pp.975-976
    • /
    • 1993
  • This talk presents the overview of the author's research and development activities on fuzzy inference hardware. We involved it with two distinct approaches. The first approach is to use application specific integrated circuits (ASIC) technology. The fuzzy inference method is directly implemented in silicon. The second approach, which is in its preliminary stage, is to use more conventional microprocessor architecture. Here, we use a quantitative technique used by designer of reduced instruction set computer (RISC) to modify an architecture of a microprocessor. In the ASIC approach, we implemented the most widely used fuzzy inference mechanism directly on silicon. The mechanism is beaded on a max-min compositional rule of inference, and Mandami's method of fuzzy implication. The two VLSI fuzzy inference chips are designed, fabricated, and fully tested. Both used a full-custom CMOS technology. The second and more claborate chip was designed at the University of North Carolina(U C) in cooperation with MCNC. Both VLSI chips had muliple datapaths for rule digital fuzzy inference chips had multiple datapaths for rule evaluation, and they executed multiple fuzzy if-then rules in parallel. The AT & T chip is the first digital fuzzy inference chip in the world. It ran with a 20 MHz clock cycle and achieved an approximately 80.000 Fuzzy Logical inferences Per Second (FLIPS). It stored and executed 16 fuzzy if-then rules. Since it was designed as a proof of concept prototype chip, it had minimal amount of peripheral logic for system integration. UNC/MCNC chip consists of 688,131 transistors of which 476,160 are used for RAM memory. It ran with a 10 MHz clock cycle. The chip has a 3-staged pipeline and initiates a computation of new inference every 64 cycle. This chip achieved an approximately 160,000 FLIPS. The new architecture have the following important improvements from the AT & T chip: Programmable rule set memory (RAM). On-chip fuzzification operation by a table lookup method. On-chip defuzzification operation by a centroid method. Reconfigurable architecture for processing two rule formats. RAM/datapath redundancy for higher yield It can store and execute 51 if-then rule of the following format: IF A and B and C and D Then Do E, and Then Do F. With this format, the chip takes four inputs and produces two outputs. By software reconfiguration, it can store and execute 102 if-then rules of the following simpler format using the same datapath: IF A and B Then Do E. With this format the chip takes two inputs and produces one outputs. We have built two VME-bus board systems based on this chip for Oak Ridge National Laboratory (ORNL). The board is now installed in a robot at ORNL. Researchers uses this board for experiment in autonomous robot navigation. The Fuzzy Logic system board places the Fuzzy chip into a VMEbus environment. High level C language functions hide the operational details of the board from the applications programme . The programmer treats rule memories and fuzzification function memories as local structures passed as parameters to the C functions. ASIC fuzzy inference hardware is extremely fast, but they are limited in generality. Many aspects of the design are limited or fixed. We have proposed to designing a are limited or fixed. We have proposed to designing a fuzzy information processor as an application specific processor using a quantitative approach. The quantitative approach was developed by RISC designers. In effect, we are interested in evaluating the effectiveness of a specialized RISC processor for fuzzy information processing. As the first step, we measured the possible speed-up of a fuzzy inference program based on if-then rules by an introduction of specialized instructions, i.e., min and max instructions. The minimum and maximum operations are heavily used in fuzzy logic applications as fuzzy intersection and union. We performed measurements using a MIPS R3000 as a base micropro essor. The initial result is encouraging. We can achieve as high as a 2.5 increase in inference speed if the R3000 had min and max instructions. Also, they are useful for speeding up other fuzzy operations such as bounded product and bounded sum. The embedded processor's main task is to control some device or process. It usually runs a single or a embedded processer to create an embedded processor for fuzzy control is very effective. Table I shows the measured speed of the inference by a MIPS R3000 microprocessor, a fictitious MIPS R3000 microprocessor with min and max instructions, and a UNC/MCNC ASIC fuzzy inference chip. The software that used on microprocessors is a simulator of the ASIC chip. The first row is the computation time in seconds of 6000 inferences using 51 rules where each fuzzy set is represented by an array of 64 elements. The second row is the time required to perform a single inference. The last row is the fuzzy logical inferences per second (FLIPS) measured for ach device. There is a large gap in run time between the ASIC and software approaches even if we resort to a specialized fuzzy microprocessor. As for design time and cost, these two approaches represent two extremes. An ASIC approach is extremely expensive. It is, therefore, an important research topic to design a specialized computing architecture for fuzzy applications that falls between these two extremes both in run time and design time/cost. TABLEI INFERENCE TIME BY 51 RULES {{{{Time }}{{MIPS R3000 }}{{ASIC }}{{Regular }}{{With min/mix }}{{6000 inference 1 inference FLIPS }}{{125s 20.8ms 48 }}{{49s 8.2ms 122 }}{{0.0038s 6.4㎲ 156,250 }} }}

  • PDF