• Title/Summary/Keyword: 곱셈적 구조

Search Result 229, Processing Time 0.026 seconds

Design of a high-speed 4-2 compressor for fast multiplication (고속 곱셈연산을 위한 고속 4-2 compressor 설계)

  • Lee, Sung-Tae;Kim, Jeong-Beom
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.11a
    • /
    • pp.401-402
    • /
    • 2009
  • 4-2 compressor는 곱셈기의 부분 곱 합 트리(partial product summation tree)의 기본적인 구성요소이다. 본 논문은 고속 연산이 가능한 4-2 compressor 구조를 제안한다. 제안한 회로는 최적화된 XORXNOR와 MUX로 구성하였다. 이 회로는 기존의 회로와 비교하였을 때 회로 구성에 필요한 트랜지스터수가 12개 감소하였으며, 지연시간이 32.2% 감소하였다. 제안한 회로는 Samsung 0.18um CMOS 공정을 이용하여 HSPICE로 시뮬레이션 하였다.

Design of a low-power 4-2 compressor for fast multiplication (고속 곱셈연산을 위한 저 전력 4-2 compressor 설계)

  • Lee, Sung-Tae;Kim, Jeong-Beom
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.11a
    • /
    • pp.405-406
    • /
    • 2009
  • 4-2 compressor는 곱셈기의 부분 곱 합 트리(partial product summation tree)의 기본적인 구성요소이다. 본 논문은 저 전력 특성을 갖는 4-2 compressor 구조를 제안한다. 제안한 회로는 한 개의 전가산기와 MUX로 구성하였다. 이 회로는 기존의 회로와 비교하였을 때 회로 구성에 필요한 트랜지스터수가 14개 감소하였으며, 6.3%의 전력소모가 감소하였다. 제안한 회로는 Samsung 0.18um CMOS 공정을 이용하여 HSPICE로 시뮬레이션 하였다.

Design of ECC Scalar Multiplier based on a new Finite Field Division Algorithm (새로운 유한체 나눗셈기를 이용한 타원곡선암호(ECC) 스칼라 곱셈기의 설계)

  • 김의석;정용진
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.5C
    • /
    • pp.726-736
    • /
    • 2004
  • In this paper, we proposed a new scalar multiplier structure needed for an elliptic curve cryptosystem(ECC) over the standard basis in GF(2$^{163}$ ). It consists of a bit-serial multiplier and a divider with control logics, and the divider consumes most of the processing time. To speed up the division processing, we developed a new division algorithm based on the extended Euclid algorithm. Dynamic data dependency of the Euclid algorithm has been transformed to static and fixed data flow by a localization technique, to make it independent of the input and field polynomial. Compared to other existing scalar multipliers, the new scalar multiplier requires smaller gate counts with improved processor performance. It has been synthesized using Samsung 0.18 um CMOS technology, and the maximum operating frequency is estimated 250 MHz. The resulting performance is 148 kbps, that is, it takes 1.1 msec to process a 163-bit data frame. We assure that this performance is enough to be used for digital signature, encryption/decryption, and key exchanges in real time environments.

An Efficient Multiplexer-based AB2 Multiplier Using Redundant Basis over Finite Fields

  • Kim, Keewon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.1
    • /
    • pp.13-19
    • /
    • 2020
  • In this paper, we propose a multiplexer based scheme that performs modular AB2 multiplication using redundant basis over finite field. Then we propose an efficient multiplexer based semi-systolic AB2 multiplier using proposed scheme. We derive a method that allows the multiplexers to perform the operations in the cell of the modular AB2 multiplier. The cell of the multiplier is implemented using multiplexers to reduce cell latency. As compared to the existing related structures, the proposed AB2 multiplier saves about 80.9%, 61.8%, 61.8%, and 9.5% AT complexity of the multipliers of Liu et al., Lee et al., Ting et al., and Kim-Kim, respectively. Therefore, the proposed multiplier is well suited for VLSI implementation and can be easily applied to various applications.

Design of Multipliers Optimized for CNN Inference Accelerators (CNN 추론 연산 가속기를 위한 곱셈기 최적화 설계)

  • Lee, Jae-Woo;Lee, Jaesung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.10
    • /
    • pp.1403-1408
    • /
    • 2021
  • Recently, FPGA-based AI processors are being studied actively. Deep convolutional neural networks (CNN) are basic computational structures performed by AI processors and require a very large amount of multiplication. Considering that the multiplication coefficients used in CNN inference operation are all constants and that an FPGA is easy to design a multiplier tailored to a specific coefficient, this paper proposes a methodology to optimize the multiplier. The method utilizes 2's complement and distributive law to minimize the number of bits with a value of 1 in a multiplication coefficient, and thereby reduces the number of required stacked adders. As a result of applying this method to the actual example of implementing CNN in FPGA, the logic usage is reduced by up to 30.2% and the propagation delay is also reduced by up to 22%. Even when implemented with an ASIC chip, the hardware area is reduced by up to 35% and the delay is reduced by up to 19.2%.

An ASIC implementation of Synchronized Phase Measurement Unit based on Sliding-DFT (순환 DFT에 기초한 광역 동기 위상 측정 장치의 ASIC 구현)

  • Kim, Chong-Yun;Kim, Suk-Hoon;Chang, Tae-Gyu
    • Proceedings of the KIEE Conference
    • /
    • 2001.07a
    • /
    • pp.302-304
    • /
    • 2001
  • 본 논문에서는 다 채널 위상 측정 장치를 전용하드웨어로 구현하기 위한 설계 구조에 대하여 제시하였으며, 연산량이 많은 곱셈기를 시분할에 의해 공유하는 구조를 제시하였다. 또한 페이저 측정을 위한 Sliding-DFT 알고리즘을 순환 구현할 경우의 근사 구현 오차에 관한 정량적인 연구를 수행하였다. 이러한 오차 영향의 해석을 기반으로 하여 곱셈기 공유 구조를 적용한 위상 측정 장치를 설계하고, 설계한 하드웨어의 내부동작을 보여주는 시뮬레이션을 통해 설계의 정확성을 확인하였다.

  • PDF

Design of a Low Power Reconfigurable DSP with Fine-Grained Clock Gating (정교한 클럭 게이팅을 이용한 저전력 재구성 가능한 DSP 설계)

  • Jung, Chan-Min;Lee, Young-Geun;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.2
    • /
    • pp.82-92
    • /
    • 2008
  • Recently, many digital signal processing(DSP) applications such as H.264, CDMA and MP3 are predominant tasks for modern high-performance portable devices. These applications are generally computation-intensive, and therefore, require quite complicated accelerator units to improve performance. Designing such specialized, yet fixed DSP accelerators takes lots of effort. Therefore, DSPs with multiple accelerators often have a very poor time-to-market and an unacceptable area overhead. To avoid such long time-to-market and high-area overhead, dynamically reconfigurable DSP architectures have attracted a lot of attention lately. Dynamically reconfigurable DSPs typically employ a multi-functional DSP accelerator which executes similar, yet different multiple kinds of computations for DSP applications. With this type of dynamically reconfigurable DSP accelerators, the time to market reduces significantly. However, integrating multiple functionalities into a single IP often results in excessive control and area overhead. Therefore, delay and power consumption often turn out to be quite excessive. In this thesis, to reduce power consumption of dynamically reconfigurable IPs, we propose a novel fine-grained clock gating scheme, and to reduce size of dynamically reconfigurable IPs, we propose a compact multiplier-less multiplication unit where shifters and adders carry out constant multiplications.

An Efficient Test Method for a Full-Custom Design of a High-Speed Binary Multiplier (풀커스텀 (full-custom) 고속 곱셈기 회로의 효율적인 테스트 방안)

  • Moon, San-Gook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2007.10a
    • /
    • pp.830-833
    • /
    • 2007
  • In this paper, we implemented a $17{\times}17b$ binary digital multiplier using radix-4 Booth;s algorithmand proposed an efficient testing methodology for the full-custom design. A two-stage pipeline architecture was applied to achieve higher throughput and 4:2 adders were used for regular layout structure in the Wallace tree partition. Several chips were fabricated using LG Semicon 0.6-um 3-Metal N-well CMOS technology. We did fault simulations efficiently using the proposed test method resulting in the reduction of the number of faulty nodes by 88%. The chip contains 9115 transistors and the core area occupies $1135^*1545$ mm2. The functional tests using ATS-2 tester showed that it can operate with 24 MHz clock at 5.0 V at room temperature.

  • PDF

Development of Integer DCT for VLSI Implementation (VLSI 구현을 위한 정수화 DCT 개발)

  • 곽훈성;이종하
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.12
    • /
    • pp.1928-1934
    • /
    • 1993
  • This paper presents a fast algorithm of integer discrete cosine transform(IDCT) allowing VLSI implementation by integer arithmetic. The proposed fast algorithm has been developed using Chen`s matrix decomposition in DCT, and requires less number of arithmetic operations compared to the IDCT. In the presented algorithm, the number of addition number is the same as the one of Chen`s algorithm if DCT, and the number of multiplication if the same as that in DCT at N=8 but drastically decreasing when N is above 8. In addition, the drawbacks of DCT such as performance degradation at the finite length arithmetic could be overcome by the IDCT.

  • PDF

Low-Area Symbol Timing Offset Synchronization Structure for WLAN Modem (WLAN용 저면적 심볼 타이밍 옵셋 동기화기 구조)

  • Ha, Jun-Hyung;Jang, Young-Beom
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.3
    • /
    • pp.1387-1394
    • /
    • 2011
  • In this paper, a low-area symbol timing offset synchronization structure for WLAN Modem is proposed. Using CSD(Canonic Signed Digit) coefficients and CSS(Common Sub-expression Sharing) technique for the filter implementation, efficient structure for multiplication block can be obtained. Function simulation for proposed structure is done by using the preamble with timing offset. Through Verilog-HDL coding and synthesis, it is shown that the proposed symbol timing offset synchronization structure can be implemented with low-area semiconductor.