• Title/Summary/Keyword: clock scheme

Search Result 213, Processing Time 0.032 seconds

A High-Performance ECC Processor Supporting NIST P-521 Elliptic Curve (NIST P-521 타원곡선을 지원하는 고성능 ECC 프로세서)

  • Yang, Hyeon-Jun;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.4
    • /
    • pp.548-555
    • /
    • 2022
  • This paper describes the hardware implementation of elliptic curve cryptography (ECC) used as a core operation in elliptic curve digital signature algorithm (ECDSA). The ECC processor supports eight operation modes (four point operations, four modular operations) on the NIST P-521 curve. In order to minimize computation complexity required for point scalar multiplication (PSM), the radix-4 Booth encoding scheme and modified Jacobian coordinate system were adopted, which was based on the complexity analysis for five PSM algorithms and four different coordinate systems. Modular multiplication was implemented using a modified 3-Way Toom-Cook multiplication and a modified fast reduction algorithm. The ECC processor was implemented on xczu7ev FPGA device to verify hardware operation. Hardware resources of 101,921 LUTs, 18,357 flip-flops and 101 DSP blocks were used, and it was evaluated that about 370 PSM operations per second were achieved at a maximum operation clock frequency of 45 MHz.

Branch Prediction Latency Hiding Scheme using Branch Pre-Prediction and Modified BTB (분기 선예측과 개선된 BTB 구조를 사용한 분기 예측 지연시간 은폐 기법)

  • Kim, Ju-Hwan;Kwak, Jong-Wook;Jhon, Chu-Shik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.10
    • /
    • pp.1-10
    • /
    • 2009
  • Precise branch predictor has a profound impact on system performance in modern processor architectures. Recent works show that prediction latency as well as prediction accuracy has a critical impact on overall system performance as well. However, prediction latency tends to be overlooked. In this paper, we propose Branch Pre-Prediction policy to tolerate branch prediction latency. The proposed solution allows that branch predictor can proceed its prediction without any information from the fetch engine, separating the prediction engine from fetch stage. In addition, we propose newly modified BTE structure to support our solution. The simulation result shows that proposed solution can hide most prediction latency with still providing the same level of prediction accuracy. Furthermore, the proposed solution shows even better performance than the ideal case, that is the predictor which always takes a single cycle prediction latency. In our experiments, IPC improvement is up to 11.92% and 5.15% in average, compared to conventional predictor system.

A Frequency Synthesizer for MB-OFDM UWB with Fine Resolution VCO Tuning Scheme (고 해상도 VCO 튜닝 기법을 이용한 MB-OFDM UWB용 주파수 합성기)

  • Park, Joon-Sung;Nam, Chul;Kim, Young-Shin;Pu, Young-Gun;Hur, Jeong;Lee, Kang-Yoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.117-124
    • /
    • 2009
  • This paper describes a 3 to 5 GHz frequency synthesizer for MB-OFDM (Multi-Band OFDM) UWB (Ultra- Wideband) application using 0.13 ${\mu}m$ CMOS process. The frequency synthesizer operates in the band group 1 whose center frequencies are 3432 MHz 3960 MHz, and 4488 MHz. To cover the overall frequencies of group 1, an efficient frequency planning minimizing a number of blocks and the power consumption are proposed. And, a high-frequency VCO and LO Mixer architecture are also presented in this paper. A new mixed coarse tuning scheme that utilizes the MIM capacitance, the varactor arrays, and the DAC is proposed to expand the VCO tuning range. The frequency synthesizer can also provide the clock for the ADC in baseband modem. So, the PLL for the ADC in the baseband modem can be removed with this frequency synthesizer. The single PLL and two SSB-mixers consume 60 mW from a 1.2 sV supply. The VCO tuning range is 1.2 GHz. The simulated phase noise of the VCO is -112 dBc/Hz at 1 MHz offset. The die area is 2 ${\times}$ 2mm$^2$.

A Study on Efficient Cell Queueing and Scheduling Algorithms for Multimedia Support in ATM Switches (ATM 교환기에서 멀티미디어 트래픽 지원을 위한 효율적인 셀 큐잉 및 스케줄링 알고리즘에 관한 연구)

  • Park, Jin-Su;Lee, Sung-Won;Kim, Young-Beom
    • Journal of IKEEE
    • /
    • v.5 no.1 s.8
    • /
    • pp.100-110
    • /
    • 2001
  • In this paper, we investigated several buffer management schemes for the design of shared-memory type ATM switches, which can enhance the utilization of switch resources and can support quality-of-service (QoS) functionalities. Our results show that dynamic threshold (DT) scheme demonstrate a moderate degree of robustness close to pushout(PO) scheme, which is known to be impractical in the perspective of hardware implementation, under various traffic conditions such as traffic loads, burstyness of incoming traffic, and load non-uniformity across output ports. Next, we considered buffer management strategies to support QoS functions, which utilize parameter values obtained via connection admission control (CAC) procedures to set tile threshold values. Through simulations, we showed that the buffer management schemes adopted behave well in the sense that they can protect regulated traffic from unregulated cell traffic in allocating buffer space. In particular, it was observed that dynamic partitioning is superior in terms of QoS support than virtual partitioning.

  • PDF

Design and Hardware Implementation of High-Speed Variable-Length RSA Cryptosystem (가변길이 고속 RSA 암호시스템의 설계 및 하드웨어 구현)

  • 박진영;서영호;김동욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.9C
    • /
    • pp.861-870
    • /
    • 2002
  • In this paper, with targeting on the drawback of RSA of operation speed, a new 1024-bit RSA cryptosystem has been proposed and implemented in hardware to increase the operational speed and perform the variable-length encryption. The proposed cryptosystem mainly consists of the modular exponentiation part and the modular multiplication part. For the modular exponentiation, the RL-binary method, which performs squaring and modular multiplying in parallel, was improved, and then applied. And 4-stage CSA structure and radix-4 booth algorithm were applied to enhance the variable-length operation and reduce the number of partial product in modular multiplication arithmetic. The proposed RSA cryptosystem which can calculate at most 1024 bits at a tittle was mapped into the integrated circuit using the Hynix Phantom Cell Library for Hynix 0.35㎛ 2-Poly 4-Metal CMOS process. Also, the result of software implementation, which had been programmed prior to the hardware research, has been used to verify the operation of the hardware system. The size of the result from the hardware implementation was about 190k gate count and the operational clock frequency was 150㎒. By considering a variable-length of modulus number, the baud rate of the proposed scheme is one and half times faster than the previous works. Therefore, the proposed high speed variable-length RSA cryptosystem should be able to be used in various information security system which requires high speed operation.

New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm (Radix-2 MBA 기반 병렬 MAC의 VLSI 구조)

  • Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.4
    • /
    • pp.94-104
    • /
    • 2008
  • In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high speed multiplication and accumulation arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator which has the largest delay in MAC was removed and its function was included into CSA, the overall performance becomes to be elevated. The proposed CSA tree uses 1's complement-based radix-2 modified booth algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of operands. The CSA propagates the carries by the least significant bits of the partial products and generates the least significant bits in advance for decreasing the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits not the output of the final adder for improving the performance by optimizing the efficiency of pipeline scheme. The proposed architecture was synthesized with $250{\mu}m,\;180{\mu}m,\;130{\mu}m$ and 90nm standard CMOS library after designing it. We analyzed the results such as hardware resource, delay, and pipeline which are based on the theoretical and experimental estimation. We used Sakurai's alpha power low for the delay modeling. The proposed MAC has the superior properties to the standard design in many ways and its performance is twice as much than the previous research in the similar clock frequency.

Design of Digital PLL with Asymmetry Compensator in High Speed DVD Systems (고속 DVD 시스템에서 비대칭 신호 보정기와 결합한 Digital PLL 설계)

  • 김판수;고석준;최형진;이정현
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.12A
    • /
    • pp.2000-2011
    • /
    • 2001
  • In this Paper, we convert conventional low speed(1x, 6x) DVD systems designed by analog PLL(Phase Locked Loop) into digital PLL to operate at high speed systems flexibly, and present optimal DPLL model in high speed(20x) DVD systems. Especially, we focused on the design of DPLL that can overcome channel effects such as bulk delay, sampling clock frequency offset and asymmetry phenomenon in high speed DVD systems. First, the modified Early-Late timing error detector as digital timing recovery scheme is proposed. And the four-sampled compensation algorithm using zero crossing point as asymmetry compensator is designed to achieve high speed operation and strong reliability. We show that the proposed timing recovery algorithm provides enhanced performances in jitter valiance and SNR margin by 4 times and 3dB respectively. Also, the new four-sampled zero crossing asymmetry compensation algorithm provides 34% improvement of jitter performance, 50% reduction of compensation time and 2.0dB gain of SNR compared with other algorithms. Finally, the proposed systems combined with asymmetry compensator and DPLL are shown to provide improved performance of about 0.4dB, 2dB over the existing schemes by BER evaluation.

  • PDF

Radix-4 Trellis Parallel Architecture and Trace Back Viterbi Decoder with Backward State Transition Control (Radix-4 트렐리스 병렬구조 및 역방향 상태천이의 제어에 의한 역추적 비터비 디코더)

  • 정차근
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.40 no.5
    • /
    • pp.397-409
    • /
    • 2003
  • This paper describes an implementation of radix-4 trellis parallel architecture and backward state transition control trace back Viterbi decoder, and presents the application results to high speed wireless LAN. The radix-4 parallelized architecture Vietrbi decoder can not only improve the throughput with simple structure, but also have small processing delay time and overhead circuit compared to M-step trellis architecture one. Based on these features, this paper addresses a novel Viterbi decoder which is composed of branch metric computation, architecture of ACS and trace back decoding by sequential control of backward state transition for the implementation of radix-4 trellis parallelized structure. With the proposed architecture, the decoding of variable code rate due to puncturing the base code can easily be implemented by the unified Viterbi decoder. Moreover, any additional circuit and/or peripheral control logic are not required in the proposed decoder architecture. The trace back decoding scheme with backward state transition control can carry out the sequential decoding according to ACS cycle clock without additional circuit for survivor memory control. In order to evaluate the usefulness, the proposed method is applied to channel CODEC of the IEEE 802.11a high speed wireless LAN, and HDL coding simulation results are presented.

Hardware Implementation of Real-Time Blind Watermarking by Substituting Bitplanes of Wavelet DC Coefficients (웨이블릿 DC 계수의 비트평면 치환방법에 의한 실시간 블라인드 워터마킹 및 하드웨어 구현)

  • 서영호;김동욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.3C
    • /
    • pp.398-407
    • /
    • 2004
  • In this paper, a blind watermarking method which is suitable to the video compression using 2-D discrete wavelet transform was proposed and implemented into the hardware using VHDL(VHSIC Hardware Description Language). The goal of the proposed watermarking algorithm is the authentication about the manipulation of the watermark embedded image and the detection of the error positions. Considering the compressed video image, the proposed watermarking scheme is unrelated to the quantization and is able to concurrently embed or extract the watermark. We experimentally verified that the lowest frequency subband(LL4) is not sensitive to the change in the spatial domain, so LL4 subband was selected for the mark space. And the combination of the bitplanes which has the properties of both the minimum degradation of the image and the robustness was chosen as the embedded Point in the mark space in LL4 subband. Since we know the watermark embedded positions and the watermark is embedded by not varying the value but changing the value, the watermark can be extracted without the original image. Also, for the security when exposing the watermark embedded position, we embed the encrypted watermark by the block cipher. The proposed watermark algorithm shows the robustness against the general image manipulation and is easily transplanted into the image or video compressor with the minimal changing in the structure. The designed hardware has 4037 LABs(24%) and 85 ESBs(3%) in APEX20KC EP20K400CF672C7 FPGA of Altera and stably operates in 82MHz clock frequency.

Low-Power $32bit\times32bit$ Multiplier Design for Deep Submicron Technologies beyond 130nm (130nm 이하의 초미세 공정을 위한 저전력 32비트$\times$32비트 곱셈기 설계)

  • Jang Yong-Ju;Lee Seong-Soo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.6 s.348
    • /
    • pp.47-52
    • /
    • 2006
  • This paper proposes a novel low-power $32bit\times32bit$ multiplier for deep submicron technologies beyond 130nm. As technology becomes small, static power due to leakage current significantly increases, and it becomes comparable to dynamic power. Recently, shutdown method based on MTCMOS is widely used to reduce both dynamic and static power. However, it suffers from severe power line noise when restoring whole large-size functional block. Therefore, the proposed multiplier mitigates this noise by shutting down and waking up sequentially along with pipeline stage. Fabricated chip measurement results in $0.35{\mu}m$ technology and gate-transition-level simulation results in 130nm and 90nm technologies show that it consumes $66{\mu}W,\;13{\mu}W,\;and\;6{\mu}W$ in idle mode, respectively, and it reduces power consumption to $0.04%\sim0.08%$ of active mode. As technology becomes small, power reduction efficiency degrades in the conventional clock gating scheme, but the proposed multiplier does not.