• Title/Summary/Keyword: Bit-parallel multiplier

Search Result 65, Processing Time 0.017 seconds

A linear array SliM-II image processor chip (선형 어레이 SliM-II 이미지 프로세서 칩)

  • 장현만;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.2
    • /
    • pp.29-35
    • /
    • 1998
  • This paper describes architectures and design of a SIMD type parallel image processing chip called SliM-II. The chiphas a linear array of 64 processing elements (PEs), operates at 30 MHz in the worst case simulation and gives at least 1.92 GIPS. In contrast to existing array processors, such as IMAP, MGAP-2, VIP, etc., each PE has a multiplier that is quite effective for convolution, template matching, etc. The instruction set can execute an ALU operation, data I/O, and inter-PE communication simulataneously in a single instruction cycle. In addition, during the ALU/multiplier operation, SliM-II provides parallel move between the register file and on-chip memory as in DSP chips, SliM-II can greatly reduce the inter-PE communication overhead, due to the idea a sliding, which is a technique of overlapping inter-PE communication with computation. Moreover, the bandwidth of data I/O and inter-PE communication increases due to bit-parallel data paths. We used the COMPASS$^{TM}$ 3.3 V 0.6.$\mu$m standrd cell library (v8r4.10). The total number of transistors is about 1.5 muillions, the core size is 13.2 * 13.0 mm$^{2}$ and the package type is 208 pin PQ2 (Power Quad 2). The performance evaluation shows that, compared to a existing array processors, a proposed architeture gives a significant improvement for algorithms requiring multiplications.s.

  • PDF

Design of a Low-Power Multiplier Using MOS Current Mode Logic Circuit (MOS 전류모드 논리회로를 이용한 저 전력 곱셈기 설계)

  • Lee, Yoon-Sang;Kim, Jeong-Beom
    • Journal of IKEEE
    • /
    • v.11 no.2
    • /
    • pp.83-88
    • /
    • 2007
  • This paper proposes an 8${\times}$8 bit parallel multiplier using MOS current-mode logic (MCML) circuit for low power consumption. The 8${\times}$8 multiplier is designed with proposed MCML full adders and conventional full adders. The designed multiplier is achieved to reduce the power consumption by 9.4% and the power-delay-product by 11.7% compared with the conventional circuit. This circuit is designed with Samsung 0.35${\mu}m$ standard CMOS process. The validity and effectiveness are verified through the HSPICE simulation.

  • PDF

A Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.7
    • /
    • pp.53-61
    • /
    • 2004
  • In this paper, we proposed a new architecture of MAC(Multiplier-Accumulator) to operate high-speed multiplication-accumulation. We used the MBA(Modified radix-4 Booth Algorithm) which is based on the 1's complement number system, and CSA(Carry Save Adder) for addition of the partial products. During the addition of the partial product, the signed numbers with the 1's complement type after Booth encoding are converted in the 2's complement signed number in the CSA tree. Since 2-bit CLA(Carry Look-ahead Adder) was used in adding the lower bits of the partial product, the input bit width of the final adder and whole delay of the critical path were reduced. The proposed MAC was applied into the DWT(Discrete Wavelet Transform) filtering operation for JPEG2000, and it showed the possibility for the practical application. Finally we identified the improved performance according to the comparison with the previous architecture in the aspect of hardware resource and delay.

An Optimized Hybrid Radix MAC Design (최적화된 4진18진 혼합 MAC 설계)

  • 정진우;김승철;이용주;이용석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.173-176
    • /
    • 2002
  • This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs the radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of the radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.

  • PDF

An Optimized Hybrid Radix MAC Design (최적화된 4진/8진 혼합 MAC 설계)

  • 정진우;김승철;이용주;이용석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06a
    • /
    • pp.125-128
    • /
    • 2002
  • This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs tile radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of tile radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.

  • PDF

A high-speed complex multiplier based on redundant binary arithmetic (Redundant binary 연산을 이용한 고속 복소수 승산기)

  • 신경욱
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.2
    • /
    • pp.29-37
    • /
    • 1997
  • A new algorithm and parallel architecture for high-speed complex number multiplication is presented, and a prototype chip based on the proposed approach is designed. By employing redundant binary (RB) arithmetic, an N-bit complex number multiplication is simplified to two RB multiplications (i.e., an addition of N RB partial products), which are responsible for real and imaginary parts, respectively. Also, and efficient RB encoding scheme proposed in this paper enables to generate RB partial products without additional hardware and delay overheads compared with binary partial product generation. The proposed approach leads to a highly parallel architecture with regularity and modularity. As a results, it results in much simpler realization and higher performance than the classical method based on real multipliers and adders. As a test vehicle, a prototype 8-b complex number multiplier core has been fabricated using $0.8\mu\textrm{m}$ CMOS technology. It contains 11,500 transistors on the area of about $1.05 \times 1.34 textrm{mm}^2$. The functional and speed test results show that it can safely operate with 200 MHz clock at $V_{DD}=2.5 V$, and consumes about 90mW.

  • PDF

Design of a Low-Power Parallel Multiplier Using Low-Swing Technique (저 전압 스윙 기술을 이용한 저 전력 병렬 곱셈기 설계)

  • Kim, Jeong-Beom
    • The KIPS Transactions:PartA
    • /
    • v.14A no.3 s.107
    • /
    • pp.147-150
    • /
    • 2007
  • This paper describes a new low-swing inverter for low power consumption. To reduce a power consumption, an output voltage swing is in the range from 0 to VDD-2VTH. This can be done by the inverter structure that allow a full swing or a swing on its input terminal without leakage current. Using this low-swing voltage technology, we proposed a low-power 16$\times$16 bit parallel multiplier. The proposed circuits are designed with Samsung 0.35$\mu$m standard CMOS process at a 3.3V supply voltage. The validity and effectiveness are verified through the HSPICE simulation.. Compared to the previous works, this circuit can reduce the power consumption rate of 17.3% and the power-delay product of 16.5%.

A New Complex-Number Multiplication Algorithm using Radix-4 Booth Recoding and RB Arithmetic, and a 10-bit CMAC Core Design (Radix-4 Booth Recoding과 RB 연산을 이용한 새로운 복소수 승산 알고리듬 및 10-bit CMAC코어 설계)

  • 김호하;신경욱
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.9
    • /
    • pp.11-20
    • /
    • 1998
  • High-speed complex-number arithmetic units are essential to baseband signal processing of modern digital communication systems such as channel equalization, timing recovery, modulation and demodulation. In this paper, a new complex-number multiplication algorithm is proposed, which is based on redundant binary (RB) arithmetic combined with radix-4 Booth recoding scheme. The proposed algorithm reduces the number of partial product by one-half as compared with the conventional direct method using real-number multipliers and adders. It also leads to a highly parallel architecture and simplified circuit, resulting in high-speed operation and low power dissipation. To demonstrate the proposed algorithm, a prototype complex-number multiplier-accumulator (CMAC) core with 10-bit operands has been designed using 0.8-$\mu\textrm{m}$ N-Well CMOS technology. The designed CMAC core contains about 18,000 transistors on the area of about 1.60 ${\times}$ 1.93 $\textrm{mm}^2$. The functional and speed test results show that it can operate with 120-MHz clock at V$\sub$DD/=3.3-V, and its power consumption is given to about 63-mW.

  • PDF

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.81-93
    • /
    • 2000
  • In a methodological approach to improve the processing performance of modulo exponentiation which is the primary arithmetic in RSA crypto algorithm, we present a new RSA hardware architecture based on high-radix modulo multiplication and CRT(Chinese Remainder Theorem). By implementing the modulo multiplier using radix-16 arithmetic, we reduced the number of PE(Processing Element)s by quarter comparing to the binary arithmetic scheme. This leads to having the number of clock cycles and the delay of pipelining flip-flops be reduced by quarter respectively. Because the receiver knows p and q, factors of N, it is possible to apply the CRT to the decryption process. To use CRT, we made two s/2-bit multipliers operating in parallel at decryption, which accomplished 4 times faster performance than when not using the CRT. In encryption phase, the two s/2-bit multipliers can be connected to make a s-bit linear multiplier for the s-bit arithmetic operation. We limited the encryption exponent size up to 17-bit to maintain high speed, We implemented a linear array modulo multiplier by projecting horizontally the DG of Montgomery algorithm. The H/W proposed here performs encryption with 15Mbps bit-rate and decryption with 1.22Mbps, when estimated with reference to Samsung 0.5um CMOS Standard Cell Library, which is the fastest among the publications at present.

Design and Analysis of a 2-digit-serial systolic multiplier for GF($2^m$) (GF($2^m$)상에서 2-디지트 시리얼 시스톨릭 곱셈기 설계 및 분석)

  • 김기원;이건직;유기영
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.10a
    • /
    • pp.605-607
    • /
    • 2000
  • 본 논문에서는 유한 필드 GF(2m)상에서 모듈러 곱셈 A(x)B(x) mod p(x)를 수행하는 2-디지트 시리얼 (2-digit-serial) 시스톨릭 어레이 구조인 곱셈기를 제안하였다. LSB-first 곱셈 알고리즘을 분석한 후 2-디지트 시리얼 형태의 자료의존 그래프(data dependency graph, 이하 DG)를 생성하여 시스톨릭 어레이를 설계하였다. 제안한 구조는 정규적이고 서로 반대 방향으로 진행하는 에지들이 없다. 그래서 VLSI 구현에 적합하다. 제안한 2-디지트 시리얼 곱셈기는 비트-패러럴(bit-parallel) 곱셈기 보다는 적은 하드웨어를 사용하며 비트-시리얼(bit-serial) 곱셈기 보다는 빠르다. 본 논문에서 제안한 2-디지트 시리얼 시스톨릭 곱셈기는 기존의 같은 종류의 곱셈기 보다 처리기의 최대 지연 시간이 적다. 그러므로 전체 시스톨릭 곱셈기의 처리시간을 향상시킬 수 있다.

  • PDF