• Title/Summary/Keyword: Standard cell library

Search Result 196, Processing Time 0.024 seconds

Hardware Design of High Performance HEVC Deblocking Filter for UHD Videos (UHD 영상을 위한 고성능 HEVC 디블록킹 필터 설계)

  • Park, Jaeha;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.1
    • /
    • pp.178-184
    • /
    • 2015
  • This paper proposes a hardware architecture for high performance Deblocking filter(DBF) in High Efficiency Video Coding for UHD(Ultra High Definition) videos. This proposed hardware architecture which has less processing time has a 4-stage pipelined architecture with two filters and parallel boundary strength module. Also, the proposed filter can be used in low-voltage design by using clock gating architecture in 4-stage pipeline. The segmented memory architecture solves the hazard issue that arises when single port SRAM is accessed. The proposed order of filtering shortens the delay time that arises when storing data into the single port SRAM at the pre-processing stage. The DBF hardware proposed in this paper was designed with Verilog HDL, and was implemented with 22k logic gates as a result of synthesis using TSMC 0.18um CMOS standard cell library. Furthermore, the dynamic frequency can process UHD 8k($7680{\times}4320$) samples@60fps using a frequency of 150MHz with an 8K resolution and maximum dynamic frequency is 285MHz. Result from analysis shows that the proposed DBF hardware architecture operation cycle for one process coding unit has improved by 32% over the previous one.

A single-memory based FFT/IFFT core generator for OFDM modulation/demodulation (OFDM 변복조를 위한 단일 메모리 구조의 FFT/IFFT 코어 생성기)

  • Yeem, Chang-Wan;Jeon, Heung-Woo;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.253-256
    • /
    • 2009
  • This paper describes a core generator (FFT_Core_Gen) which generates Verilog HDL models of 8 different FFT/IFFT cores with $N=64{\times}2^k$($0{\leq}k{\leq}7$ for OFDM-based communication systems. The generated FFT/IFFT cores are based on in-place single memory architecture, and use a hybrid structure of radix-4 and radix-2 DIF algorithm to accommodate various FFT lengths. To achieve both memory reduction and the improved SQNR, a conditional scaling technique is adopted, which conditionally scales the intermediate results of each computational stage, and the internal data and twiddle factor has 14 bits. The generated FFT/IFFT cores have the SQNR of 58-dB for N=8,192 and 63-dB for N=64. The cores synthesized with a $0.35-{\mu}m$ CMOS standard cell library can operate with 75-MHz@3.3-V, and a 8,192-point FFT can be computed in $762.7-{\mu}s$, thus the cores satisfy the specifications of wireless LAN, DMB, and DVB systems.

  • PDF

Fast RSA Montgomery Multiplier and Its Hardware Architecture (고속 RSA 하드웨어 곱셈 연산과 하드웨어 구조)

  • Chang, Nam-Su;Lim, Dae-Sung;Ji, Sung-Yeon;Yoon, Suk-Bong;Kim, Chang-Han
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.17 no.1
    • /
    • pp.11-20
    • /
    • 2007
  • A fast Montgomery multiplication occupies important to the design of RSA cryptosystem. Montgomery multiplication consists of two addition, which calculates using CSA or RBA. In terms of CSA, the multiplier is implemented using 4-2 CSA o. 5-2 CSA. In terms of RBA, the multiplier is designed based on redundant binary system. In [1], A new redundant binary adder that performs the addition between two binary signed-digit numbers and apply to Montgomery multiplier was proposed. In this paper, we reconstruct the logic structure of the RBA in [1] for reducing time and space complexity. Especially, the proposed RB multiplier has no coupler like the RBA in [1]. And the proposed RB multiplier is suited to binary exponentiation as modified input and output forms. We simulate to the proposed NRBA using gates provided from SAMSUNG STD130 $0.18{\mu}m$ 1.8V CMOS Standard Cell Library. The result is smaller by 18.5%, 6.3% and faster by 25.24%, 14% than 4-2 CSA, existing RBA, respectively. And Especially, the result is smaller by 44.3% and faster by 2.8% than the RBA in [1].

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.81-93
    • /
    • 2000
  • In a methodological approach to improve the processing performance of modulo exponentiation which is the primary arithmetic in RSA crypto algorithm, we present a new RSA hardware architecture based on high-radix modulo multiplication and CRT(Chinese Remainder Theorem). By implementing the modulo multiplier using radix-16 arithmetic, we reduced the number of PE(Processing Element)s by quarter comparing to the binary arithmetic scheme. This leads to having the number of clock cycles and the delay of pipelining flip-flops be reduced by quarter respectively. Because the receiver knows p and q, factors of N, it is possible to apply the CRT to the decryption process. To use CRT, we made two s/2-bit multipliers operating in parallel at decryption, which accomplished 4 times faster performance than when not using the CRT. In encryption phase, the two s/2-bit multipliers can be connected to make a s-bit linear multiplier for the s-bit arithmetic operation. We limited the encryption exponent size up to 17-bit to maintain high speed, We implemented a linear array modulo multiplier by projecting horizontally the DG of Montgomery algorithm. The H/W proposed here performs encryption with 15Mbps bit-rate and decryption with 1.22Mbps, when estimated with reference to Samsung 0.5um CMOS Standard Cell Library, which is the fastest among the publications at present.

Implementation of RSA modular exponentiator using Division Chain (나눗셈 체인을 이용한 RSA 모듈로 멱승기의 구현)

  • 김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.21-34
    • /
    • 2002
  • In this paper we propos a new hardware architecture of modular exponentiation using a division chain method which has been proposed in (2). Modular exponentiation using the division chain is performed by receding an exponent E as a mixed form of multiplication and addition with divisors d=2 or $d=2^I +1$ and respective remainders r. This calculates the modular exponentiation in about $1.4log_2$E multiplications on average which is much less iterations than $2log_2$E of conventional Binary Method. We designed a linear systolic array multiplier with pipelining and used a horizontal projection on its data dependence graph. So, for k-bit key, two k-bit data frames can be inputted simultaneously and two modular multipliers, each consisting of k/2+3 PE(Processing Element)s, can operate in parallel to accomplish 100% throughput. We propose a new encoding scheme to represent divisors and remainders of the division chain to keep regularity of the data path. When it is synthesized to ASIC using Samsung 0.5 um CMOS standard cell library, the critical path delay is 4.24ns, and resulting performance is estimated to be abort 140 Kbps for a 1024-bit data frame at 200Mhz clock In decryption process, the speed can be enhanced to 560kbps by using CRT(Chinese Remainder Theorem). Futhermore, to satisfy real time requirements we can choose small public exponent E, such as 3,17 or $2^{16} +1$, in encryption and verification process. in which case the performance can reach 7.3Mbps.

A Novel Redundant Binary Montgomery Multiplier and Hardware Architecture (새로운 잉여 이진 Montgomery 곱셈기와 하드웨어 구조)

  • Lim Dae-Sung;Chang Nam-Su;Ji Sung-Yeon;Kim Sung-Kyoung;Lee Sang-Jin;Koo Bon-Seok
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.16 no.4
    • /
    • pp.33-41
    • /
    • 2006
  • RSA cryptosystem is of great use in systems such as IC card, mobile system, WPKI, electronic cash, SET, SSL and so on. RSA is performed through modular exponentiation. It is well known that the Montgomery multiplier is efficient in general. The critical path delay of the Montgomery multiplier depends on an addition of three operands, the problem that is taken over carry-propagation makes big influence at an efficiency of Montgomery Multiplier. Recently, the use of the Carry Save Adder(CSA) which has no carry propagation has worked McIvor et al. proposed a couple of Montgomery multiplication for an ideal exponentiation, the one and the other are made of 3 steps and 2 steps of CSA respectively. The latter one is more efficient than the first one in terms of the time complexity. In this paper, for faster operation than the latter one we use binary signed-digit(SD) number system which has no carry-propagation. We propose a new redundant binary adder(RBA) that performs the addition between two binary SD numbers and apply to Montgomery multiplier. Instead of the binary SD addition rule using in existing RBAs, we propose a new addition rule. And, we construct and simulate to the proposed adder using gates provided from SAMSUNG STD130 $0.18{\mu}m$ 1.8V CMOS Standard Cell Library. The result is faster by a minimum 12.46% in terms of the time complexity than McIvor's 2 method and existing RBAs.