• Title/Summary/Keyword: Systolic array

Search Result 144, Processing Time 0.026 seconds

Design of Modular Exponentiation Processor for RSA Cryptography (RSA 암호시스템을 위한 모듈러 지수 연산 프로세서 설계)

  • 허영준;박혜경;이건직;이원호;유기영
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.3-11
    • /
    • 2000
  • In this paper, we design modular multiplication systolic array and exponentiation processor having n bits message black. This processor uses Montgomery algorithm and LR binary square and multiply algorithm. This processor consists of 3 divisions, which are control unit that controls computation sequence, 5 shift registers that save input and output values, and modular exponentiation unit. To verify the designed exponetion processor, we model and simulate it using VHDL and MAX+PLUS II. Consider a message block length of n=512, the time needed for encrypting or decrypting such a block is 59.5ms. This modular exponentiation unit is used to RSA cryptosystem.

Multithreaded and Overlapped Systolic Array for Depthwise Separable Convolution (깊이별 분리 합성곱을 위한 다중 스레드 오버랩 시스톨릭 어레이)

  • Jongho Yoon;Seunggyu Lee;Seokhyeong Kang
    • Transactions on Semiconductor Engineering
    • /
    • v.2 no.1
    • /
    • pp.1-8
    • /
    • 2024
  • When processing depthwise separable convolution, low utilization of processing elements (PEs) is one of the challenges of systolic array (SA). In this study, we propose a new SA architecture to maximize throughput in depthwise convolution. Moreover, the proposed SA performs subsequent pointwise convolution on the idle PEs during depthwise convolution computation to increase the utilization. After the computation, we utilize unused PEs to boost the remaining pointwise convolution. Consequently, the proposed 128x128 SA achieves a 4.05x and 1.75x speed improvement and reduces the energy consumption by 66.7 % and 25.4 %, respectively, compared to the basic SA and RiSA in MobileNetV3.

Design of an Efficient VLSI Architecture of SADCT Based on Systolic Array (시스톨릭 어레이에 기반한 SADCT의 효율적 VLSl 구조설계)

  • Gang, Tae-Jun;Jeong, Ui-Yun;Gwon, Sun-Gyu;Ha, Yeong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.38 no.3
    • /
    • pp.282-291
    • /
    • 2001
  • In this paper, an efficient VLSI architecture of Shape Adaptive Discrete Cosine Transform(SADCT) based on systolic array is proposed. Since transform size in SADCT is varied according to the shape of object in each block, it are dropped that both usability of processing elements(PE´s) and throughput rate in time-recursive SADCT structure. To overcome these disadvantages, it is proposed that the architecture based on a systolic way structure which doesn´t need memory. In the proposed architecture, throughput rate is improved by consecutive processing of one-dimensional SADCT without memory and PE´s in the first column are connected to that in the last one for improvement of usability of PE. And input data are put into each column of PE in parallel according to the maximum data number in each rearranged block. The proposed architecture is described by VHDL. Also, its function is evaluated by MentorTM. Even though the hardware complexity is somewhat increased, the throughput rate is improved about twofold.

  • PDF

Optimized and Portable FPGA-Based Systolic Cell Architecture for Smith-Waterman-Based DNA Sequence Alignment

  • Shah, Hurmat Ali;Hasan, Laiq;Koo, Insoo
    • Journal of information and communication convergence engineering
    • /
    • v.14 no.1
    • /
    • pp.26-34
    • /
    • 2016
  • The alignment of DNA sequences is one of the important processes in the field of bioinformatics. The Smith-Waterman algorithm (SWA) performs optimally for aligning sequences but is computationally expensive. Field programmable gate array (FPGA) performs the best on parameters such as cost, speed-up, and ease of re-configurability to implement SWA. The performance of FPGA-based SWA is dependent on efficient cell-basic implementation-unit design. In this paper, we present an optimized systolic cell design while avoiding oversimplification, very large-scale integration (VLSI)-level design, and direct mapping of iterative equations such as previous cell designs. The proposed design makes efficient use of hardware resources and provides portability as the proposed design is not based on gate-level details. Our cell design implementing a linear gap penalty resulted in a performance improvement of 32× over a GPP platform and surpassed the hardware utilization of another implementation by a factor of 4.23.

An Efficient Bit-serial Systolic Multiplier over GF($2^m$) (GF($2^m$)상의 효율적인 비트-시리얼 시스톨릭 곱셈기)

  • Lee Won-Ho;Yoo Kee-Young
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.1_2
    • /
    • pp.62-68
    • /
    • 2006
  • The important arithmetic operations over finite fields include multiplication and exponentiation. An exponentiation operation can be implemented using a series of squaring and multiplication operations over GF($2^m$) using the binary method. Hence, it is important to develop a fast algorithm and efficient hardware for multiplication. This paper presents an efficient bit-serial systolic array for MSB-first multiplication in GF($2^m$) based on the polynomial representation. As compared to the related multipliers, the proposed systolic multiplier gains advantages in terms of input-pin and area-time complexity. Furthermore, it has regularity, modularity, and unidirectional data flow, and thus is well suited to VLSI implementation.

Resource and Delay Efficient Polynomial Multiplier over Finite Fields GF (2m) (유한체상의 자원과 시간에 효율적인 다항식 곱셈기)

  • Lee, Keonjik
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.16 no.2
    • /
    • pp.1-9
    • /
    • 2020
  • Many cryptographic and error control coding algorithms rely on finite field GF(2m) arithmetic. Hardware implementation of these algorithms needs an efficient realization of finite field arithmetic operations. Finite field multiplication is complicated among the basic operations, and it is employed in field exponentiation and division operations. Various algorithms and architectures are proposed in the literature for hardware implementation of finite field multiplication to achieve a reduction in area and delay. In this paper, a low area and delay efficient semi-systolic multiplier over finite fields GF(2m) using the modified Montgomery modular multiplication (MMM) is presented. The least significant bit (LSB)-first multiplication and two-level parallel computing scheme are considered to improve the cell delay, latency, and area-time (AT) complexity. The proposed method has the features of regularity, modularity, and unidirectional data flow and offers a considerable improvement in AT complexity compared with related multipliers. The proposed multiplier can be used as a kernel circuit for exponentiation/division and multiplication.

A VLSI architecture for fast motion estimation algorithm (고속 움직임 추정 알고리즘에 적합한 VLSI 구조 연구)

  • 이재헌;라종범
    • Proceedings of the IEEK Conference
    • /
    • 1998.06a
    • /
    • pp.717-720
    • /
    • 1998
  • In this paper, we propose a VLSI architecture for implementing a crecently proposed fast block matching algorithm, which is called the HSBMA3S. The proposed architecture consists of a systolic array based basic unit and two shift register arrays. And it covers a search range of -32 ~+31. By using a basic unit repeatedly, we can redcue the number of gates. To implement the basic unit, we can select one among various conventional systolic arrays by trading-off between speed and hardware cost. In this paper, the architecture for the basic unit is selected so that the hardware cost can be minimized. The proposed architecture is fast enough for low bit-rate applications (frame size of 352x288, 30 frames/sec) and can be implemented by less than 20,000 gates. Moreover, by simply modifying the basic unit, the architecture can be used for the higher bit-rate application of the frame size of 720*480 and 30 frames/sec.

  • PDF

A Fast and Scalable Priority Queue Hardware Architecture for Packet Schedulers (패킷 스케줄러를 위한 빠르고 확장성 있는 우선순위 큐의 하드웨어 구조)

  • Kim, Sang-Gyun;Moon, Byung-In
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.10
    • /
    • pp.55-60
    • /
    • 2007
  • This paper proposes a fast and scalable priority queue architecture for use in high-speed networks which supports quality of service (QoS) guarantees. This architecture is cost-effective since a single queue can generate outputs to multiple out-links. Also, compared with the previous multiple systolic array priority queues, the proposed queue provides fast output generation which is important to high-speed packet schedulers, using a special control block. In addition this architecture provides the feature of high scalability.

(Design of Systolic Away for High-Speed Fractal Image Compression by Data Reusing) (데이터 재사용에 의한 고속 프랙탈 영상압축을 위한 시스토릭 어레이의 설계)

  • U, Jong-Ho;Lee, Hui-Jin;Lee, Su-Jin;Seong, Gil-Yeong
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.39 no.3
    • /
    • pp.220-227
    • /
    • 2002
  • An one-dimensional VLSI array for high speed processing of Fractal image compression was designed. Using again the overlapped input data of adjacent domain blocks in the existing one-dimensional VLSI array, we can save the number of total input for the operations, and so we can save the total computation time. In the design procedure, we considered the data dependences between the input data, reordered the input data to the array, and designed the processing elements. Registers and multiplexors are added for the storing and routing of the input data in some processing elements. Consequently as adding a little hardware, this design shows (N-4B)/4(N-B) times of speed-up compared with the existing array, where N is image size and B is block size.

Efficient Semi-systolic AB2 Multiplier over Finite Fields

  • Kim, Keewon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.1
    • /
    • pp.37-43
    • /
    • 2020
  • In this paper, we propose an efficient AB2 multiplication algorithm using SPB(shifted polynomial basis) over finite fields. Using the feature of the SPB, we split the equation for AB2 multiplication into two parts. The two partitioned equations are executable at the same time, and we derive an algorithm that processes them in parallel. Then we propose an efficient semi-systolic AB2 multiplier based on the proposed algorithm. The proposed multiplier has less area-time (AT) complexity than related multipliers. In detail, the proposed AB2 multiplier saves about 94%, 87%, 86% and 83% of the AT complexity of the multipliers of Wei, Wang-Guo, Kim-Lee, Choi-Lee, respectively. Therefore, the proposed multiplier is suitable for VLSI implementation and can be easily adopted as the basic building block for various applications.