• Title/Summary/Keyword: IEEE754-2008

Search Result 8, Processing Time 0.017 seconds

Floating Point Converter Design Supporting Double/Single Precision of IEEE754 (IEEE754 단정도 배정도를 지원하는 부동 소수점 변환기 설계)

  • Park, Sang-Su;Kim, Hyun-Pil;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.10
    • /
    • pp.72-81
    • /
    • 2011
  • In this paper, we proposed and designed a novel floating point converter which supports single and double precisions of IEEE754 standard. The proposed convertor supports conversions between floating point number single/double precision and signed fixed point number(32bits/64bits) as well as conversions between signed integer(32bits/64bits) and floating point number single/double precision and conversions between floating point number single and double precisions. We defined a new internal format to convert various input types into one type so that overflow checking could be conducted easily according to range of output types. The internal format is similar to the extended format of floating point double precision defined in IEEE754 2008 standard. This standard specifies that minimum exponent bit-width of the extended format of floating point double precision is 15bits, but 11bits are enough to implement the proposed converting unit. Also, we optimized rounding stage of the convertor unit so that we could make it possible to operate rounding and represent correct negative numbers using an incrementer instead an adder. We designed single cycle data path and 5 cycles data path. After describing the HDL model for two data paths of the convertor, we synthesized them with TSMC 180nm technology library using Synopsys design compiler. Cell area of synthesis result occupies 12,886 gates(2 input NAND gate), and maximum operating frequency is 411MHz.

Floating Point Unit Design for the IEEE754-2008 (IEEE754-2008을 위한 고속 부동소수점 연산기 설계)

  • Hwang, Jin-Ha;Kim, Hyun-Pil;Park, Sang-Su;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.10
    • /
    • pp.82-90
    • /
    • 2011
  • Because of the development of Smart phone devices, the demands of high performance FPU(Floating-point Unit) becomes increasing. Therefore, we propose the high-speed single-/double-precision FPU design that includes an elementary add/sub unit and improved multiplier and compare and convert units. The most commonly used add/sub unit is optimized by the parallel rounding unit. The matrix operation is used in complex calculation something like a graphic calculation. We designed the Multiply-Add Fused(MAF) instead of multiplier to calculate the matrix more quickly. The branch instruction that is decided by the compare operation is very frequently used in various programs. We bypassed the result of the compare operation before all the pipeline processes ended to decrease the total execution time. And we included additional convert operations that are added in IEEE754-2008 standard. To verify our RTL designs, we chose four hundred thousand test vectors by weighted random method and simulated each unit. The FPU that was synthesized by Samsung's 45-nm low-power process satisfied the 600-MHz operation frequency. And we confirm a reduction in area by comparing the improved FPU with the existing FPU.

An implementation of a unified ALU in multi-core GPGPU based on SIMT architecture (SIMT 구조 기반 멀티코어 GPGPU의 통합 ALU 설계)

  • Kyung, Gyu-taek;Kwak, Jae-Chang;Lee, Kwang-yeob
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2013.10a
    • /
    • pp.540-543
    • /
    • 2013
  • This paper describes an implementation of a unified ALU on multi-core GPGPU based on SIMT architecture. Our unified ALU can operate conditional branch instructions, data movement instructions, integer arithmetic instructions and floating-point arithmetic instructions. Since multi-core GPGPU contains a lot of ALU for parallel processing of various types, the main point of this paper is to design the minimum size ALU by unifying similar processing of each operations on circit level. All instrunctions were tested by making a test program. And we compare this results with results of CPU operations to verify our ALU. Our unified ALU's gate size is approximately 20,000 and the maximum operation frequency is 430MHz.

  • PDF

Design of Dual-Path Decimal Floating-Point Adder (이중 경로 십진 부동소수점 가산기 설계)

  • Lee, Chang-Ho;Kim, Ji-Won;Hwang, In-Guk;Choi, Sang-Bang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.9
    • /
    • pp.183-195
    • /
    • 2012
  • We propose a variable-latency Decimal Floating Point(DFP) adder which adopts the dual data path scheme. It is to speed addition and subtraction of operand that has identical exponents. The proposed DFP adder makes use of L. K. Wang's operand alignment algorithm, but operates through high speed data-path in guaranteed accuracy range. Synthesis results show that the area of the proposed DFP adder is increased by 8.26% compared to the L. K. Wang's DFP adder, though critical path delay is reduced by 10.54%. It also operates at 13.65% reduced path than critical path in case of an operation which has two DFP operands with identical exponents. We prove that the proposed DFP adder shows higher efficiency than L. K. Wang's DFP adder when the ratio of identical exponents is larger than 2%.

Design of Parallel Decimal Multiplier using Limited Range of Signed-Digit Number Encoding (제한된 범위의 Signed-Digit Number 인코딩을 이용한 병렬 십진 곱셈기 설계)

  • Hwang, In-Guk;Kim, Kanghee;Yoon, WanOh;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.3
    • /
    • pp.50-58
    • /
    • 2013
  • In this paper, parallel decimal fixed-point multiplier which uses the limited range of Singed-Digit number encoding and the reduction step is proposed. The partial products are generated without carry propagation delay by encoding a multiplicand and a multiplier to the limited range of SD number. With the limited range of SD number, the proposed multiplier can improve the partial product reduction step by increasing the number of possible operands for multi-operand SD addition. In order to estimate the proposed parallel decimal multiplier, synthesis is implemented using Design Compiler with SMIC 180nm CMOS technology library. Synthesis results show that the delay of proposed parallel decimal multiplier is reduced by 4.3% and the area by 5.3%, compared to the existing SD parallel decimal multiplier. Despite of the slightly increased delay and area of partial product generation step, the total delay and area are reduced since the partial product reduction step takes the most proportion.

Design of Serial Decimal Multiplier using Simultaneous Multiple-digit Operations (동시연산 다중 digit을 이용한 직렬 십진 곱셈기의 설계)

  • Yu, ChangHun;Kim, JinHyuk;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.4
    • /
    • pp.115-124
    • /
    • 2015
  • In this paper, the method which improves the performance of a serial decimal multiplier, and the method which operates multiple-digit simultaneously are proposed. The proposed serial decimal multiplier reduces the delay by removing encoding module that generates 2X, 4X multiples, and by generating partial product using shift operation. Also, this multiplier reduces the number of operations using multiple-digit operation. In order to estimate the performance of the proposed multiplier, we synthesized the proposed multiplier with design compiler with SMIC 110nm CMOS library. Synthesis results show that the area of the proposed serial decimal multiplier is increased by 4%, but the delay is reduced by 5% compared to existing serial decimal multiplier. In addition, the trade off between area and latency with respect to the number of concurrent operations in the proposed multiple-digit multiplier is confirmed.

Design of Partial Product Accumulator using Multi-Operand Decimal CSA and Improved Decimal CLA (다중 피연산자 십진 CSA와 개선된 십진 CLA를 이용한 부분곱 누산기 설계)

  • Lee, Yang;Park, TaeShin;Kim, Kanghee;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.11
    • /
    • pp.56-65
    • /
    • 2016
  • In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of the parallel decimal multiplier, a tree architecture that composed by multi-operand decimal CSAs and improved CLA is proposed. The proposed tree using multi-operand CSAs reduces the partial product quickly. Since the input range of the recoder of CSA is limited, CSA can get the simplest logic. In addition, using the multi-operand decimal CSAs to add decimal numbers that have limited range in specific locations of the specific architecture can reduce the partial products efficiently. Also, final BCD result can be received faster by improving the logic of the decimal CLA. In order to evaluate the performance of the proposed partial product accumulation, synthesis is implemented by using Design Complier with 180 nm COMS technology library. Synthesis results show the delay of the proposed partial product accumulation is reduced by 15.6% and area is reduced by 16.2% comparing with which uses general method. Also, the total delay and area are still reduced despite the delay and area of the CLA are increased.

Implementation of Hardware Data Prefetcher Adaptable for Various State-of-the-Art Workload (다양한 최신 워크로드에 적용 가능한 하드웨어 데이터 프리페처 구현)

  • Kim, KangHee;Park, TaeShin;Song, KyungHwan;Yoon, DongSung;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.12
    • /
    • pp.20-35
    • /
    • 2016
  • In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of the parallel decimal multiplier, a tree architecture that composed by multi-operand decimal CSAs and improved CLA is proposed. The proposed tree using multi-operand CSAs reduces the partial product quickly. Since the input range of the recoder of CSA is limited, CSA can get the simplest logic. In addition, using the multi-operand decimal CSAs to add decimal numbers that have limited range in specific locations of the specific architecture can reduce the partial products efficiently. Also, final BCD result can be received faster by improving the logic of the decimal CLA. In order to evaluate the performance of the proposed partial product accumulation, synthesis is implemented by using Design Complier with 180 nm COMS technology library. Synthesis results show the delay of the proposed partial product accumulation is reduced by 15.6% and area is reduced by 16.2% comparing with which uses general method. Also, the total delay and area are still reduced despite the delay and area of the CLA are increased.