• Title/Summary/Keyword: clock cycles

Search Result 148, Processing Time 0.023 seconds

(Design of New Architecture for Simultaneously Computing Multiplication and Squaring over $GF(2^m)$ based on Cellular Automata) ($GF(2^m)$상에서 셀룰러 오토마타를 이용한 곱셈/제곱 동시 연산기 설계)

  • Gu, Gyo-Min;Ha, Gyeong-Ju;Kim, Hyeon-Seong;Yu, Gi-Yeong
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.39 no.3
    • /
    • pp.211-219
    • /
    • 2002
  • In this paper, a new architecture that can simultaneously process modular multiplication and squaring on GF(2$^{m}$ ) in m clock cycles by using the cellular automata is presented. This can be used efficiently for the design of the modular exponentiation on the finite field which is the basic computation in most public key crypto systems such as Diffie-Hellman key exchange, EIGamal, etc. Also, the cellular automata architecture is simple, regular, modular, cascadable and therefore, can be utilized efficiently for the implementation of VLSI.

Low Complexity Digit-Parallel/Bit-Serial Polynomial Basis Multiplier (저복잡도 디지트병렬/비트직렬 다항식기저 곱셈기)

  • Cho, Yong-Suk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.4C
    • /
    • pp.337-342
    • /
    • 2010
  • In this paper, a new architecture for digit-parallel/bit-serial GF($2^m$) multiplier with low complexity is proposed. The proposed multiplier operates in polynomial basis of GF($2^m$) and produces multiplication results at a rate of one per D clock cycles, where D is the selected digit size. The digit-parallel/bit-serial multiplier is faster than bit-serial ones but with lower area complexity than bit-parallel ones. The most significant feature of the digit-parallel/bit-serial architecture is that a trade-off between hardware complexity and delay time can be achieved. But the traditional digit-parallel/bit-serial multiplier needs extra hardware for high speed. In this paper a new low complexity efficient digit-parallel/bit-serial multiplier is presented.

Design of DSP Instructions and their Hardware Architecture for Reed-Solomon Codecs (Reed-Solomon 부호화/복호화를 위한 DSP 명령어 및 하드웨어 설계)

  • 이재성;선우명훈
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.6A
    • /
    • pp.405-413
    • /
    • 2003
  • This paper presents new DSP (Digital Signal Processor) instructions and their hardware architecture to efficiently implement RS (Reed-Solomon) codecs, which is one of the most widely used FEC (Forward Error Control) algorithms. The proposed DSP architecture can implement various primitive polynomials by program, and thus, hardwired codecs can be replaced. The new instructions and their hardware architecture perform GF (Galois Field) operations using the proposed GF multiplier and adder. Therefore, the proposed DSP architecture can significantly reduce the number of clock cycles compared with existing DSP chips. It can perform RS decoding rate of up to 228.1 Mbps on 130MHz DSP chips.

A New Systolic Array for LSD-first Multiplication in $CF(2^m)$ ($CF(2^m)$상의 LSD 우선 곱셈을 위한 새로운 시스톨릭 어레이)

  • Kim, Chang-Hoon;Nam, In-Gil
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.4C
    • /
    • pp.342-349
    • /
    • 2008
  • This paper presents a new digit-serial systolic multiplier over $CF(2^m)$ for cryptographic applications. When input data come in continuously, the proposed array produces multiplication results at a rate of one every ${\lceil}m/D{\rceil}$ clock cycles, where D is the selected digit size. Since the inner structure of the proposed array is tree-type, critical path increases logarithmically proportional to D. Therefore, the computation delay of the proposed architecture is significantly less than previously proposed digit-serial systolic multipliers whose critical path increases proportional to D. Furthermore, since the new architecture has the features of regularity, modularity, and unidirectional data flow, it is well suited to VLSI implementations.

PD-DESYNC: Practical and Deterministic Desynchronization in Wireless Sensor Networks

  • Hyun, Sang-Hyun;Kim, Geon;Yang, Dongmin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.8
    • /
    • pp.3880-3899
    • /
    • 2019
  • Distributive desynchronization algorithms based on pulse-coupled oscillator (PCO) models have been proposed for achieving collision-free wireless transmissions. These algorithms do not depend on a global clock or infrastructure overheads. Moreover, they gradually converge to fair time-division multiple access (TDMA) scheduling by broadcasting a periodic pulse signal (called a 'firing') and adjusting the next firing time based on firings from other nodes. The time required to achieve constant spacing between phase neighbors is estimated in a closed form or via stochastic modeling. However, because these algorithms cannot guarantee the completion of desynchronization in a short and bounded timeframe, they are not practical. Motivated by the limitations of these methods, we propose a practical solution called PD-DESYNC that provides a short and deterministic convergence time using a flag firing to indicate the beginning of a cycle. We demonstrate that the proposed method guarantees the completion of desynchronization within three cycles, regardless of the number of nodes. Through extensive simulations and experiments, we confirm that PD-DESYNC not only outperforms other algorithms in terms of convergence time but also is a practical solution.

An Efficient Hardware Implementation of 257-bit Point Scalar Multiplication for Binary Edwards Curves Cryptography (이진 에드워즈 곡선 공개키 암호를 위한 257-비트 점 스칼라 곱셈의 효율적인 하드웨어 구현)

  • Kim, Min-Ju;Jeong, Young-su;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.246-248
    • /
    • 2022
  • Binary Edwards curves (BEdC), a new form of elliptic curves proposed by Bernstein, satisfy the complete addition law without exceptions. This paper describes an efficient hardware implementation of point scalar multiplication on BEdC using projective coordinates. Modified Montgomery ladder algorithm was adopted for point scalar multiplication, and binary field arithmetic operations were implemented using 257-bit binary adder, 257-bit binary squarer, and 32-bit binary multiplier. The hardware operation of the BEdC crypto-core was verified using Zynq UltraScale+ MPSoC device. It takes 521,535 clock cycles to compute point scalar multiplication.

  • PDF

Parallel Inverse Transform and Small-sized Inverse Quantization Architectures Design of H.264/AVC Decoder (H.264/AVC 복호기의 병렬 역변환 구조 및 저면적 역양자화 구조 설계)

  • Jung, Hong-Kyun;Cha, Ki-Jong;Park, Seung-Yong;Kim, Jin-Young;Ryoo, Kwang-Ki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.10a
    • /
    • pp.444-447
    • /
    • 2011
  • In this paper, parallel IT(inverse transform) architecture and IQ(inverse quantization) architecture with common operation unit for the H.264/AVC decoder are proposed. By using common operation unit, the area cost and computational complexity of IQ are reduced. In order to take four execution cycles to perform IT, the proposed IT architecture has parallel architecture with one horizontal DCT unit and four vertical DCT units. Furthermore, the execution cycles of the proposed architecture is reduced to five cycles by applying two state pipeline architecture. The proposed architecture is implemented to a single chip by using Magnachip 0.18um CMOS technology. The gate count of the proposed architecture is 14.3k at clock frequency of 13MHz and the area of proposed IQ is reduced 39.6% compared with the previous one. The experimental result shows that execution cycle the proposed architecture is about 49.09% higher than that of the previous one.

  • PDF

Low-Complexity Deeply Embedded CPU and SoC Implementation (낮은 복잡도의 Deeply Embedded 중앙처리장치 및 시스템온칩 구현)

  • Park, Chester Sungchung;Park, Sungkyung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.3
    • /
    • pp.699-707
    • /
    • 2016
  • This paper proposes a low-complexity central processing unit (CPU) that is suitable for deeply embedded systems, including Internet of things (IoT) applications. The core features a 16-bit instruction set architecture (ISA) that leads to high code density, as well as a multicycle architecture with a counter-based control unit and adder sharing that lead to a small hardware area. A co-processor, instruction cache, AMBA bus, internal SRAM, external memory, on-chip debugger (OCD), and peripheral I/Os are placed around the core to make a system-on-a-chip (SoC) platform. This platform is based on a modified Harvard architecture to facilitate memory access by reducing the number of access clock cycles. The SoC platform and CPU were simulated and verified at the C and the assembly levels, and FPGA prototyping with integrated logic analysis was carried out. The CPU was synthesized at the ASIC front-end gate netlist level using a $0.18{\mu}m$ digital CMOS technology with 1.8V supply, resulting in a gate count of merely 7700 at a 50MHz clock speed. The SoC platform was embedded in an FPGA on a miniature board and applied to deeply embedded IoT applications.

Optimized DSP Implementation of Audio Decoders for Digital Multimedia Broadcasting (디지털 방송용 오디오 디코더의 DSP 최적화 구현)

  • Park, Nam-In;Cho, Choong-Sang;Kim, Hong-Kook
    • Journal of Broadcast Engineering
    • /
    • v.13 no.4
    • /
    • pp.452-462
    • /
    • 2008
  • In this paper, we address issues associated with the real-time implementation of the MPEG-1/2 Layer-II (or MUSICAM) and MPEG-4 ER-BSAC decoders for Digital Multimedia Broadcasting (DMB) on TMS320C64x+ that is a fixed-point DSP processor with a clock speed of 330 MHz. To achieve the real-time requirement, they should be optimized in different steps as follows. First of all, a C-code level optimization is performed by sharing the memory, adjusting data types, and unrolling loops. Next, an algorithm level optimization is carried out such as the reconfiguration of bitstream reading, the modification of synthesis filtering, and the rearrangement of the window coefficients for synthesis filtering. In addition, the C-code of a synthesis filtering module of the MPEG-1/2 Layer-II decoder is rewritten by using the linear assembly programming technique. This is because the synthesis filtering module requires the most processing time among all processing modules of the decoder. In order to show how the real-time implementation works, we obtain the percentage of the processing time for decoding and calculate a RMS value between the decoded audio signals by the reference MPEG decoder and its DSP version implemented in this paper. As a result, it is shown that the percentages of the processing time for the MPEG-1/2 Layer-II and MPEG-4 ER-BSAC decoders occupy less than 3% and 11% of the DSP clock cycles, respectively, and the RMS values of the MPEG-1/2 Layer-II and MPEG-4 ER-BSAC decoders implemented in this paper all satisfy the criterion of -77.01 dB which is defined by the MPEG standards.

Scalable RSA public-key cryptography processor based on CIOS Montgomery modular multiplication Algorithm (CIOS 몽고메리 모듈러 곱셈 알고리즘 기반 Scalable RSA 공개키 암호 프로세서)

  • Cho, Wook-Lae;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.1
    • /
    • pp.100-108
    • /
    • 2018
  • This paper describes a design of scalable RSA public-key cryptography processor supporting four key lengths of 512/1,024/2,048/3,072 bits. The modular multiplier that is a core arithmetic block for RSA crypto-system was designed with 32-bit datapath, which is based on the CIOS (Coarsely Integrated Operand Scanning) Montgomery modular multiplication algorithm. The modular exponentiation was implemented by using L-R binary exponentiation algorithm. The scalable RSA crypto-processor was verified by FPGA implementation using Virtex-5 device, and it takes 456,051/3,496347/26,011,947/88,112,770 clock cycles for RSA computation for the key lengths of 512/1,024/2,048/3,072 bits. The RSA crypto-processor synthesized with a $0.18{\mu}m$ CMOS cell library occupies 10,672 gate equivalent (GE) and a memory bank of $6{\times}3,072$ bits. The estimated maximum clock frequency is 147 MHz, and the RSA decryption takes 3.1/23.8/177/599.4 msec for key lengths of 512/1,024/2,048/3,072 bits.