New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm

VLSI Architecture of a Parallel MAC Based on the Radix-2 MBA

  • Published : 2008.04.25

Abstract

In this paper, we propose a new multiplier-accumulator (MAC) architecture for high-speed multiply-accumulate arithmetic. Performance is improved by merging multiplication with accumulation and by devising a hybrid carry-save adder (CSA). Because the accumulator, which has the largest delay in the MAC, is removed and its function is folded into the CSA, overall performance improves. The proposed CSA tree uses a 1's-complement-based radix-2 modified Booth algorithm (MBA) and a modified array for sign extension that increases the bit density of the operands. The CSA propagates the carries of the least significant bits of the partial products and generates those least significant bits in advance, which reduces the number of input bits to the final adder. In addition, the proposed MAC accumulates intermediate results as sum and carry bits rather than as the output of the final adder, which improves performance by making the pipeline more efficient. After design, the architecture was synthesized with 250 nm, 180 nm, 130 nm, and 90 nm standard CMOS libraries. Hardware resources, delay, and pipelining were analyzed from both theoretical and experimental estimates, with delay modeled by Sakurai's alpha-power law. The proposed MAC outperforms the standard design in many respects, and its performance is twice that of previous work at a similar clock frequency.
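The abstract names the 1's-complement-based modified Booth algorithm without giving details. The Python sketch below shows the conventional modified Booth recoding, which scans overlapping 3-bit groups of the multiplier and yields digits in {-2, -1, 0, +1, +2}; we assume this is the recoding the paper's "radix-2 MBA" refers to. Negative partial products are formed as a 1's complement plus a separate +1 ("neg") bit, which is our reading of "1's-complement-based". `booth_digits` and `booth_multiply` are hypothetical names, and the rows are summed with an ordinary adder rather than a CSA tree.

```python
def booth_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier (n even) into n/2 digits in {-2..2}.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1], with the implicit bit y[-1] = 0.
    """
    y &= (1 << n) - 1  # n-bit two's-complement pattern (handles negative y)

    def bit(k: int) -> int:
        return (y >> k) & 1 if k >= 0 else 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """Signed n x n multiply from Booth rows negated by 1's complement plus a neg bit."""
    width = 2 * n
    mask = (1 << width) - 1
    total = 0
    for i, d in enumerate(booth_digits(y, n)):
        row = (abs(d) * x) & mask  # |d| * x as a width-bit two's-complement pattern
        neg = int(d < 0)
        if neg:
            row = ~row & mask      # 1's complement; the missing +1 is supplied by `neg`
        # Row i has weight 4**i; summed here with a plain adder (assumption, not a CSA tree).
        total = (total + ((row + neg) << (2 * i))) & mask
    return total - (1 << width) if total >> (width - 1) else total
```

For example, `booth_digits(7, 4)` gives `[-1, 2]`, i.e. 7 = -1 + 2·4. Deferring the +1 to a separate bit means negation needs only inverters; the extra bit is typically placed in a vacant low-order position of a neighboring row instead of requiring its own adder.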
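The abstract does not spell out the modified sign-extension array. A common way to avoid sign-extending every Booth row, which we assume is in the same spirit, rests on one identity: an (n+1)-bit row r with two's-complement value s and sign bit at weight 2^n satisfies s = (r XOR 2^n) - 2^n. Each row can therefore be stored only n+1 bits wide with its sign bit inverted, and all the -2^n corrections collapse into a single precomputed constant. The sketch below is a generic illustration of that trick, not the paper's array; `multiply_no_sign_extension` is a hypothetical name.

```python
def booth_digits(y: int, n: int) -> list[int]:
    """Modified Booth digits d_i = -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] = 0."""
    y &= (1 << n) - 1

    def bit(k: int) -> int:
        return (y >> k) & 1 if k >= 0 else 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def multiply_no_sign_extension(x: int, y: int, n: int) -> int:
    """Signed n x n multiply where every row is only n+1 bits wide (no sign extension)."""
    width = 2 * n
    mask = (1 << width) - 1
    sign = 1 << n                   # sign-bit weight of an (n+1)-bit partial-product row
    row_mask = (sign << 1) - 1
    # One precomputed constant replaces all sign extensions: -sum(2**n * 4**i).
    total = -sum(sign << (2 * i) for i in range(n // 2)) & mask
    for i, d in enumerate(booth_digits(y, n)):
        row = (abs(d) * x) & row_mask  # |d| * x fits in n+1 signed bits
        neg = int(d < 0)
        if neg:
            row = ~row & row_mask      # 1's complement inside the row
        row ^= sign                    # invert the sign bit: the row is now a plain unsigned value
        total = (total + ((row + neg) << (2 * i))) & mask
    return total - (1 << width) if total >> (width - 1) else total
```

Because the constant depends only on n, in hardware it reduces to a fixed pattern of 1 bits in the array rather than an extra operand.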

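This paper proposes a new MAC architecture that performs multiply-accumulate operations at high speed. Multiplication and accumulative addition are merged, and a hybrid CSA structure is devised, which shortens the critical path and improves throughput. In other words, the accumulator, the component with the largest delay, is eliminated and its function is absorbed into the CSA, improving overall performance. The proposed CSA tree uses a 1's-complement-based MBA and has a modified array for the sign bits to increase operand density. To reduce the bit width of the final adder, a 2-bit carry-lookahead adder (CLA) inside the CSA tree propagates the carries of the lower-order bits and generates their outputs in advance. Furthermore, to raise throughput by optimizing pipeline efficiency, the intermediate results are accumulated in sum-and-carry form rather than as the output of the final adder. After the hardware was designed, it was synthesized with 250 nm, 180 nm, 130 nm, and 90 nm CMOS libraries. The hardware resources, delay, and pipelining of the proposed MAC were analyzed on the basis of theoretical and experimental results, with delay modeled by a modified form of Sakurai's alpha-power law. The results show that the proposed MAC is markedly superior to the standard design in many respects and, compared with recent work, delivers twice the performance at nearly the same clock speed.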
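The central architectural idea, keeping the accumulator in redundant sum/carry form so that no carry-propagate addition sits in the accumulation loop, can be modeled at the bit level with 3:2 carry-save adders. In this Python sketch, `booth_terms`, `csa`, and `carry_save_mac` are hypothetical names; the compressors are chained linearly instead of forming the paper's hybrid tree, and `acc_bits` (the accumulator width, including guard bits) is an assumed parameter. The final carry-propagate addition is taken outside the loop; in hardware it would sit in a separate pipeline stage off the feedback path.

```python
def booth_terms(x: int, y: int, n: int, mask: int) -> list[int]:
    """Hypothetical helper: Booth partial products d_i * x * 4**i as width-bit patterns."""
    y &= (1 << n) - 1

    def bit(k: int) -> int:
        return (y >> k) & 1 if k >= 0 else 0

    digits = [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]
    return [((d * x) << (2 * i)) & mask for i, d in enumerate(digits)]


def csa(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """3:2 carry-save adder on width-bit words: a + b + c == s + carry (mod 2**width)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & mask, carry & mask


def carry_save_mac(pairs: list[tuple[int, int]], n: int, acc_bits: int) -> int:
    """Accumulate sum(x*y) with the running result kept as a (sum, carry) pair."""
    mask = (1 << acc_bits) - 1
    s_vec = c_vec = 0                  # redundant accumulator: value is s_vec + c_vec
    for x, y in pairs:
        for term in booth_terms(x, y, n, mask):
            s_vec, c_vec = csa(s_vec, c_vec, term, mask)  # no carry propagation here
    total = (s_vec + c_vec) & mask     # the only carry-propagate (final) addition
    return total - (1 << acc_bits) if total >> (acc_bits - 1) else total
```

For example, `carry_save_mac([(3, 7), (-8, 5), (127, -128), (-1, -1)], n=8, acc_bits=18)` returns -16274, the same as summing the four products directly. Each `csa` call has a constant depth of one full adder regardless of word width, which is why feeding back the (sum, carry) pair instead of a resolved sum shortens the loop.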
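The 2-bit CLA mentioned above can be illustrated with the standard generate/propagate equations. The sketch assumes the CLA resolves the two lowest bits of the (sum, carry) pair early, so that the final adder only has to span the remaining width - 2 bits; exactly where the paper places the CLA inside its tree is not stated in the abstract. `cla2` and `final_add_with_early_lsbs` are hypothetical names.

```python
def cla2(a: int, b: int, cin: int) -> tuple[int, int]:
    """2-bit carry-lookahead adder: returns (2-bit sum, carry-out)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    g0, p0 = a0 & b0, a0 ^ b0             # generate / propagate, bit 0
    g1, p1 = a1 & b1, a1 ^ b1             # generate / propagate, bit 1
    c1 = g0 | (p0 & cin)
    # Carry-out computed directly from g/p terms instead of rippling through c1.
    c2 = g1 | (p1 & g0) | (p1 & p0 & cin)
    return (p0 ^ cin) | ((p1 ^ c1) << 1), c2


def final_add_with_early_lsbs(s_vec: int, c_vec: int, width: int) -> int:
    """Resolve the 2 LSBs with cla2 ahead of time; the final adder spans only width-2 bits."""
    low, carry = cla2(s_vec & 3, c_vec & 3, 0)
    high_mask = (1 << (width - 2)) - 1
    high = ((s_vec >> 2) + (c_vec >> 2) + carry) & high_mask  # the narrower final adder
    return (high << 2) | low
```

The split is exact because s + c = 4·(s_high + c_high) + (s_low + c_low), and the CLA turns the low part into two sum bits plus one carry into the high part.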
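For reference, Sakurai and Newton's alpha-power law replaces the square-law drain-current model with a current proportional to (V_GS - V_th)^α, which leads to the widely used first-order gate-delay estimate below. The modified form used in the paper is not given in the abstract, so this is only the baseline relation:

$$
t_d \;\propto\; \frac{C_L\,V_{DD}}{\left(V_{DD}-V_{th}\right)^{\alpha}}, \qquad 1 \le \alpha \le 2
$$

Here C_L is the load capacitance, V_DD the supply voltage, V_th the threshold voltage, and α the velocity-saturation index: α = 2 recovers the long-channel square law, while strongly velocity-saturated short-channel devices approach α = 1. A model of this form allows gate delays to be scaled between technology nodes such as the 250 nm to 90 nm libraries once V_DD, V_th, and α are known for each.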


References

  1. J. J. F. Cavanagh, Digital Computer Arithmetic. New York: McGraw-Hill, 1984
  2. ISO/IEC 13818-1, 2, 3, Information Technology: Coding of Moving Pictures and Associated Audio, MPEG-2 Draft International Standard, 1994
  3. Martin Boliek, et al., JPEG 2000 Part I Final Draft International Standard, ISO/IEC JTC1/SC29 WG1, 24 Aug. 2000
  4. O. L. MacSorley, "High Speed Arithmetic in Binary Computers", Proc. IRE, vol. 49, Jan. 1961
  5. S. Waser and M. J. Flynn, Introduction to Arithmetic for Digital Systems Designers. New York: Holt, Rinehart and Winston, 1982
  6. A. R. Omondi, Computer Arithmetic Systems. Englewood Cliffs, NJ: Prentice-Hall, 1994
  7. Israel Koren, Computer Arithmetic Algorithms. John Wiley Inc., pp. 71-123, 1993
  8. Yoshita Harata, et al., "A High-Speed Multiplier Using a Redundant Binary Adder Tree," IEEE J. Solid-State Circuits, vol. SC-22, no. 1, pp. 28-33, Feb. 1987
  9. A. D. Booth, "A Signed Binary Multiplication Technique," Quart. J. Math., vol. IV, pt. 2, 1952
  10. C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Trans. Electron. Comput., vol. EC-13, pp. 14-17, Feb. 1964 https://doi.org/10.1109/PGEC.1964.263830
  11. A. R. Cooper, "Parallel architecture modified Booth multiplier," IEE Proc.-G, vol. 135, pp. 125-128, 1988 https://doi.org/10.1049/ip-g-1.1988.0019
  12. N. R. Shanbag and P. Juneja, "Parallel implementation of a 4x4-bit multiplier using modified Booth's algorithm," IEEE J. Solid-State Circuits, vol. 23, pp. 1010-1013, 1988 https://doi.org/10.1109/4.353
  13. G. Goto, T. Sato, M. Nakajima, and T. Sukemura, "A 54x54 regular structured tree multiplier," IEEE J. Solid-State Circuits, vol. 27, pp. 1229-1236, Sept. 1992 https://doi.org/10.1109/4.149426
  14. J. Fadavi-Ardekani, "M x N Booth encoded multiplier generator using optimized Wallace trees," IEEE Trans. VLSI Syst., vol. 1, pp. 120-125, 1993 https://doi.org/10.1109/92.238424
  15. N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki, and Y. Nakagome, "A 4.4 ns CMOS 54x54 multiplier using pass-transistor multiplexer," IEEE J. Solid-State Circuits, vol. 30, pp. 251-257, Mar. 1995 https://doi.org/10.1109/4.364439
  16. A. Tawfik, F. Elguibaly, and P. Agathoklis, "New realization and implementation of fixed-point IIR digital filters," J. Circuits, Syst., Comput., vol. 7, no. 3, pp. 191-209, 1997 https://doi.org/10.1142/S0218126697000140
  17. A. Tawfik, F. Elguibaly, M. N. Fahmi, E. Abdel-Raheem, and P. Agathoklis, "High-speed area-efficient inner-product processor," Can. J. Elec. Comput. Eng., vol. 19, pp. 187-191, 1994 https://doi.org/10.1109/CJECE.1994.6591122
  18. F. Elguibaly and A. Rayhan, "Overflow handling in inner-product processors," in Proc. IEEE Pacific Rim Conf. Communication, Computers, and Signal Processing, Victoria, B.C., Canada, Aug. 20-22, 1997, pp. 117-120
  19. F. Elguibaly, "A Fast Parallel Multiplier-Accumulator Using the Modified Booth Algorithm," IEEE Trans. Circuits Syst. II, vol. 47, pp. 902-908, Sept. 2000
  20. T. Sakurai and A. R. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," IEEE J. Solid-State Circuits, vol. 25, pp. 584-594, Feb. 1990 https://doi.org/10.1109/4.52187
  21. R. M. Rao and A. S. Bopardikar, Wavelet Transforms, Introduction to Theory and Applications, Addison-Wesley Inc., Reading, MA, 1998