• Title/Summary/Keyword: Arithmetic units

Search Result 59, Processing Time 0.029 seconds

Design of a Floating Point Multiplier for IEEE 754 Single-Precision Operations (IEEE 754 단정도 부동 소수점 연산용 곱셈기 설계)

  • Lee, Ju-Hun;Chung, Tae-Sang
    • Proceedings of the KIEE Conference
    • /
    • 1999.11c
    • /
    • pp.778-780
    • /
    • 1999
  • Arithmetic unit speed depends strongly on the algorithms employed to realize the basic arithmetic operations.(add, subtract multiply, and divide) and on the logic design. Recent advances in VLSI have increased the feasibility of hardware implementation of floating point arithmetic units and microprocessors require a powerful floating-point processing unit as a standard option. This paper describes the design of floating-point multiplier for IEEE 754-1985 Single-Precision operation. Booth encoding algorithm method to reduce partial products and a Wallace tree of 4-2 CSA is adopted in fraction multiplication part to generate the $32{\times}32$ single-precision product. New scheme of rounding and sticky-bit generation is adopted to reduce area and timing. Also there is a true sign generator in this design. This multiplier have been implemented in a ALTERA FLEX EPF10K70RC240-4.

  • PDF

Low Complexity Systolic Montgomery Multiplication over Finite Fields GF(2m) (유한체상의 낮은 복잡도를 갖는 시스톨릭 몽고메리 곱셈)

  • Lee, Keonjik
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.18 no.1
    • /
    • pp.1-9
    • /
    • 2022
  • Galois field arithmetic is important in error correcting codes and public-key cryptography schemes. Hardware realization of these schemes requires an efficient implementation of Galois field arithmetic operations. Multiplication is the main finite field operation and designing efficient multiplier can clearly affect the performance of compute-intensive applications. Diverse algorithms and hardware architectures are presented in the literature for hardware realization of Galois field multiplication to acquire a reduction in time and area. This paper presents a low complexity semi-systolic multiplier to facilitate parallel processing by partitioning Montgomery modular multiplication (MMM) into two independent and identical units and two-level systolic computation scheme. Analytical results indicate that the proposed multiplier achieves lower area-time (AT) complexity compared to related multipliers. Moreover, the proposed method has regularity, concurrency, and modularity, and thus is well suited for VLSI implementation. It can be applied as a core circuit for multiplication and division/exponentiation.

Design of Floating-point Processing Unit for Multi-chip Superscalar Microprocessor (다중 칩 수퍼스칼라 마이크로프로세서용 부동소수점 연산기의 설계)

  • 이영상;강준우
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1153-1156
    • /
    • 1998
  • We describe a design of a simple but efficient floatingpoint processing architecture expoiting concurrent execution of scalar instructions for high performance in general-purpose microprocessors. This architecture employs 3 stage pipeline asyncronously working with integer processing unit to regulate instruction flows between two arithmetic units.

  • PDF

A VLSI Architecture of Systolic Array for FET Computation (고속 퓨리어 변환 연산용 VLSI 시스토릭 어레이 아키텍춰)

  • 신경욱;최병윤;이문기
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.9
    • /
    • pp.1115-1124
    • /
    • 1988
  • A two-dimensional systolic array for fast Fourier transform, which has a regular and recursive VLSI architecture is presented. The array is constructed with identical processing elements (PE) in mesh type, and due to its modularity, it can be expanded to an arbitrary size. A processing element consists of two data routing units, a butterfly arithmetic unit and a simple control unit. The array computes FFT through three procedures` I/O pipelining, data shuffling and butterfly arithmetic. By utilizing parallelism, pipelining and local communication geometry during data movement, the two-dimensional systolic array eliminates global and irregular commutation problems, which have been a limiting factor in VLSI implementation of FFT processor. The systolic array executes a half butterfly arithmetic based on a distributed arithmetic that can carry out multiplication with only adders. Also, the systolic array provides 100% PE activity, i.e., none of the PEs are idle at any time. A chip for half butterfly arithmetic, which consists of two BLC adders and registers, has been fabricated using a 3-um single metal P-well CMOS technology. With the half butterfly arithmetic execution time of about 500 ns which has been obtained b critical path delay simulation, totla FFT execution time for 1024 points is estimated about 16.6 us at clock frequency of 20MHz. A one-PE chip expnsible to anly size of array is being fabricated using a 2-um, double metal, P-well CMOS process. The chip was layouted using standard cell library and macrocell of BLC adder with the aid of auto-routing software. It consists of around 6000 transistors and 68 I/O pads on 3.4x2.8mm\ulcornerarea. A built-i self-testing circuit, BILBO (Built-In Logic Block Observation), was employed at the expense of 3% hardware overhead.

  • PDF

Kinematic Wave Rainfall-Runoff Model Using CUDA FORTRAN (CUDA FORTRAN을 이용한 운동파 강우유출모형)

  • Kim, Boram;Kim, Dae-Hong
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2018.05a
    • /
    • pp.271-271
    • /
    • 2018
  • 그래픽 처리 장치(GPU: Graphic Processing Units)는 그래픽 처리에 특화된 수많은 산술논리연산자 (ALU: Arithmetic Logic Unit)와 이에 관련된 인스트럭션Instruction)으로 인해 중앙 처리 장치(CPU: Central Processing Units) 보다 훨씬 빠른 계산 처리를 수행할 수 있다. 최근에는 FORTRAN에 의해 구현된 많은 수치모형들이 현실적인 모델링 방법의 발달로 인해 더 많은 계산량과 계산시간을 필요로 한다. 이 연구에서는 GPU 상의 범용 계산GPGPU : General-Purpose computing on Graphics Processing Units) 기반 운동파 강우유출모형(Kinematic Wave Rainfall-Runoff Model)이 CUDA(Compute Unified Device Architecture) FORTRAN을 사용하여 구현되었다. CUDA FORTRAN 운동파 강우유출모형의 계산 결과는 검증된 CPU 기반 운동파 강우유출모형의 계산 결과와 비교하여 검증되었으며, 잘 일치함을 보여 주었다. CUDA FORTRAN 운동파 강우유출모형은 CPU 기반 모형에 비해 약 20 배 더 빠른 계산 시간을 보였다. 또한 계산 영역이 커짐에 따라 CPU 버전에 비해 CUDA FORTRAN 버전의 계산 효율이 향상되었다.

  • PDF

Fast Sequential Optimal Normal Bases Multipliers over Finite Fields (유한체위에서의 고속 최적정규기저 직렬 연산기)

  • Kim, Yong-Tae
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.8
    • /
    • pp.1207-1212
    • /
    • 2013
  • Arithmetic operations over finite fields are widely used in coding theory and cryptography. In both of these applications, there is a need to design low complexity finite field arithmetic units. The complexity of such a unit largely depends on how the field elements are represented. Among them, representation of elements using a optimal normal basis is quite attractive. Using an algorithm minimizing the number of 1's of multiplication matrix, in this paper, we propose a multiplier which is time and area efficient over finite fields with optimal normal basis.

The design of quantization and inverse quantization unit (Q_IQ unit) module with video encoder (비디오 인코더용 양자화 및 역양자화기(Q_IQ unit) 모듈의 설계)

  • 김은원;조원경
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.11
    • /
    • pp.20-28
    • /
    • 1997
  • In this paper, quantization and inverse quantizatio unit, a sa component of MPEG-2 moving picture compression system, ar edesigned. In the processing of quantization, this design adopted newly designed arithmetic units in which quantization matrices and scale code was expressed with SD(signed-digit) code. In the arithmetic unit of inverse quantization, quantization scale code, which has 5-bits length, is splited into two pieces; 2-bits for control code, 3-bits for quantization data, and the method to devise quantization step size is proposed. The design was coded with VHDL and synthesis results in that it consumed about 6,110 gates, and operating speed is 52MHz.

  • PDF

Mathematical Thinking and Developing Mathematical Structure

  • Cheng, Chun Chor Litwin
    • Research in Mathematical Education
    • /
    • v.14 no.1
    • /
    • pp.33-50
    • /
    • 2010
  • The mathematical thinking which transforms important mathematical content and developed into mathematical structure is a vital process in building up mathematical ability as mathematical knowledge based on structure. Such process based on students' recognition of mathematical concept. Developing mathematical thinking into mathematical structure happens when different cognitive units are connected and compressed to form schema of solution, which could happen through some guided problems. The effort of arithmetic approach in problem solving did not necessarily provide students the structure schema of solution. The using of equation to solve the problem is based on the schema of building equation, and is not necessary recognizing the structure of the solution, as the recognition of structure may be lost in the process of simplification of algebraic expressions, leaving only the final numeric answer of the problem.

Real-time Implementation of an Identifier for Nonstationary Time-varying Signals and Systems

  • Kim, Jong-Weon;Kim, Sung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.3E
    • /
    • pp.13-18
    • /
    • 1996
  • A real-time identifier for the nonstationary time-varying signals and systems was implemented using a low cost DSP (digital signal processing) chip. The identifier is comprised of I/O units, a central processing unit, a control unit and its supporting software. In order t estimate the system accurately and to reduce quantization error during arithmetic operation, the firmware was programmed with 64-bit extended precision arithmetic. The performance of the identifier was verified by comparing with the simulation results. The implemented real-time identifier has negligible quantization errors and its real-time processing capability crresponds to 0.6kHz for the nonstationary AR (autoregressive) model with n=4 and m=1.

  • PDF

A Design of a 8-Thread Graphics Processor Unit with Variable-Length Instructions

  • Lee, Kwang-Yeob;Kwak, Jae-Chang
    • Journal of information and communication convergence engineering
    • /
    • v.6 no.3
    • /
    • pp.285-288
    • /
    • 2008
  • Most of multimedia processors for 2D/3D graphics acceleration use a lot of integer/floating point arithmetic units. We present a new architecture with an efficient ALU, built in a smaller chip size. It reduces instruction cycles significantly based on a foundation of multi-thread operation, variable length instruction words, dual phase operation, and phase instruction's coordination. We can decrease the number of instruction cycles up to 50%, and can achieve twice better performance.