• Title/Summary/Keyword: floating-point arithmetic

Search Result 66, Processing Time 0.025 seconds

DCT Implementation on FPGA for HDTV Encoder (FPGA를 이용한 HDTV인코더를 위한 DCT회로의 구현)

  • 김우철;정규철;고광철;정재명
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.235-238
    • /
    • 2002
  • This paper presents a way of a novel FPGA implementation of DCT. It shows how to limit the required bits on each DCT processing step, instead of implementing high-cost 64-bit floating-point arithmetic of IEEE Std 754-1985 on FPGA. ID-DCT implementation has been done which operates at 30 frame per second with 1920${\times}$1080 resolution.

  • PDF

Meshfree/GFEM in hardware-efficiency prospective

  • Tian, Rong
    • Interaction and multiscale mechanics
    • /
    • v.6 no.2
    • /
    • pp.197-210
    • /
    • 2013
  • A fundamental trend of processor architecture evolving towards exaflops is fast increasing floating point performance (so-called "free" flops) accompanied by much slowly increasing memory and network bandwidth. In order to fully enjoy the "free" flops, a numerical algorithm of PDEs should request more flops per byte or increase arithmetic intensity. A meshfree/GFEM approximation can be the class of the algorithm. It is shown in a GFEM without extra dof that the kind of approximation takes advantages of the high performance of manycore GPUs by a high accuracy of approximation; the "expensive" method is found to be reversely hardware-efficient on the emerging architecture of manycore.

A Design of Low-power/Small-area Arithmetic Units for Mobile 3D Graphic Accelerator (휴대형 3D 그래픽 가속기를 위한 저전력/저면적 산술 연산기 회로 설계)

  • Kim Chay-Hyeun;Shin Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.5
    • /
    • pp.857-864
    • /
    • 2006
  • This paper describes a design of low-power/small-area arithmetic circuits which are vector processing unit powering nit, divider unit and square-root unit for mobile 3D graphic accelerator. To achieve area-efficient and low-power implementation that is an essential consideration for mobile environment, the fixed-point f[mat of 16.16 is adopted instead of conventional floating-point format. The vector processing unit is designed using redundant binary(RB) arithmetic. As a result, it can operate 30% faster and obtained gate count reduction of 10%, compared to the conventional methods which consist of four multipliers and three adders. The powering nit, divider unit and square-root nit are based on logarithm number system. The binary-to-logarithm converter is designed using combinational logic based on six-region approximation method. So, the powering mit, divider unit and square-root unit reduce gate count when compared with lookup table implementation.

Motion Control Algorithm Expanding Arithmetic Operation for Low-Cost Microprocessor (저가형 마이크로프로세서를 위한 연산처리 확장 모션제어 알고리즘)

  • Moon, Sang-Chan;Kim, Jae-Jun;Nam, Kyu-Min;Kim, Byoung-Soo;Lee, Soon-Geul
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.18 no.12
    • /
    • pp.1079-1085
    • /
    • 2012
  • For precise motion control, S-curve velocity profile is generally used but it has disadvantage of relatively long calculation time for floating-point arithmetics. In this paper, we present a new generating method for velocity profile to reduce delay time of profile generation so that it overcomes such disadvantage and enhances the efficiency of precise motion control. In this approach, the velocity profile is designed based on the gamma correction expression that is generally used in image processing to obtain a smoother movement without any critical jerk. The proposed velocity profile is designed to support both T-curve and S-curve velocity profile. It can generate precise profile by adding an offset to the velocity profile with decimals under floating point that are not counted during gamma correction arithmetic operation. As a result, the operation time is saved and the efficiency is improved. The proposed method is compared with the existing method that generates velocity profile using ring buffer on a 8-bit low-cost MCU. The result shows that the proposed method has no delay in generating driving profile with good accuracy of each cycle velocity. The significance of the proposed method lies in reduction of the operation time without degrading the motion accuracy. Generated driving signal also shows to verify effectiveness of the proposed method.

An implementation of a unified ALU in multi-core GPGPU based on SIMT architecture (SIMT 구조 기반 멀티코어 GPGPU의 통합 ALU 설계)

  • Kyung, Gyu-taek;Kwak, Jae-Chang;Lee, Kwang-yeob
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2013.10a
    • /
    • pp.540-543
    • /
    • 2013
  • This paper describes an implementation of a unified ALU on multi-core GPGPU based on SIMT architecture. Our unified ALU can operate conditional branch instructions, data movement instructions, integer arithmetic instructions and floating-point arithmetic instructions. Since multi-core GPGPU contains a lot of ALU for parallel processing of various types, the main point of this paper is to design the minimum size ALU by unifying similar processing of each operations on circit level. All instrunctions were tested by making a test program. And we compare this results with results of CPU operations to verify our ALU. Our unified ALU's gate size is approximately 20,000 and the maximum operation frequency is 430MHz.

  • PDF

A Design of Radix-2 SRT Floating-Point Divider Unit using ]Redundant Binary Number System (Redundant Binary 수치계를 이용한 radix-2 SRT부동 소수점 제산기 유닛 설계)

  • 이종남;신경욱
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.5 no.3
    • /
    • pp.517-524
    • /
    • 2001
  • This paper describes a design of radix-2 SRT divider unit, which supports IEEE-754 floating-point standard, using redundant binary number system (RBNS). With the RBNS, the partial quotient decision logic can operate about 20-% faster, as well as can be implemented with a simple hardware when compared to the conventional methods based on two's complement arithmetic. By using a new redundant binary adder proposed in this paper, the mantissa divider is efficiently implemented, thus resulting in about 20% smaller area than other works. The divider unit supports double precision format, five exceptions and four rounding modes. It was verified with Verilog HDL and Verilog-XL.

  • PDF

Design of Floating-Point Multiplier for Mobile Graphics Application (모바일 그래픽스 응용을 위한 부동소수점 승산기의 설계)

  • Choi, Byeong-Yoon;Salcic, Zoran
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.3
    • /
    • pp.547-554
    • /
    • 2008
  • In this paper, two-stage pipelined floating-point multiplier (FP-MUL) is designed. The FP-MUL processor supports single precision multiplication for 3D graphic APIs, such as OpenGL and Direct3D and has area-efficient and low-latency architecture via saturated arithmetic, area-efficient sticky-bit generator, and flagged prefix adder. The FP-MUL has about 4-ns delay time under $0.13{\mu}m$ CMOS standard cell library and consists of about 7,500 gates. Because its maximum performance is about 250 MFLOPS, it can be applicable to mobile 3D graphics application.

A Study On the Design of a Floating Point Unit for MPEG-2 AAC Decoder (MPEG-2 AAC 복호기를 위한 부동소수점유닛 설계에 관한 연구)

  • 구대성;김필중;김종빈
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.39 no.4
    • /
    • pp.355-355
    • /
    • 2002
  • In this paper, we designed a FPU(floating point unit) that it is very important and requires of high density when digital audio is designed. Almost audio system must support the multi-channel and required for high quality. A floating point arithmetic function in MPEG-2 AAC that implemented by hardware is able to realtime decoding when DSP realization. The reason is that MPEG-2 AAC is compatible to the Audio field of MPEG-4 and afterwards. We designed a FPU by hardware to increase the speed of a floating point unit with much calculation part in the MPEG-2 AAC Decoder. A FPU is composed of a multiplier and an adder. A multiplier used the Radix-4 Booth algorithm and an adder adopted 1's complement method for speed up. A form of a floating point unit has 8bit of exponent part and 24bit of mantissa. It's compatible with the IEEE single precision format and adopted a pipeline architecture to increase the speed of a processor. All of sub blocks are based on ISO/IEC 13818-7 standard. The algorithm is tested by C language and the design does by use of VHDL(VHSIC Hardware Description Language). The maximum operation speed is 23.2MHz and the stable operation speed is 19MHz.

Implementation of a 3D Graphics Hardwired T&L Accelerator based on a SoC Platform for a Mobile System (SoC 플랫폼 기반 모바일용 3차원 그래픽 Hardwired T&L Accelerator 구현)

  • Lee, Kwang-Yeob;Koo, Yong-Seo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.9
    • /
    • pp.59-70
    • /
    • 2007
  • In this paper, we proposed an effective T&L(Transform & Lighting) Processor architecture for a real time 3D graphics acceleration SoC(System on a Chip) in a mobile system. We designed Floating point arithmetic IPs for a T&L processor. And we verified IPs using a SoC Platform. Designed T&L Processor consists of 24 bit floating point data format and 16 bit fixed point data format, and supports the pipeline keeping the balance between Transform process and Lighting process using a parallel computation of 3D graphics. The delay of pipeline processing only Transform operation is almost same as the delay processing both Transform operation and Lighting operation. Designed T&L Processor is implemented and verified using a SoC Platform. The T&L Processor operates at 80MHz frequency in Xilinx-Virtex4 FPGA. The processing speed is measured at the rate of 20M Vertexes/sec.

Fast Laser Triangular Measurement System using ARM and FPGA (ARM 및 FPGA를 이용한 고속 레이저 삼각측량 시스템)

  • Lee, Sang-Moon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.8 no.1
    • /
    • pp.25-29
    • /
    • 2013
  • Recently ARM processor's processing power has been increasing rapidly as it has been applied to consumer electronics products. Because of its computing power and low power consumption, it is used to various embedded systems.( including vision processing systems.) Embedded linux that provides well-made platform and GUI is also a powerful tool for ARM based embedded systems. So short period to develop is one of major advantages to the ARM based embedded system. However, for real-time date processing applications such as an image processing system, ARM needs additional equipments such as FPGA that is suitable to parallel processing applications. In this paper, we developed an embedded system using ARM processor and FPGA. FPGA takes time consuming image preprocessing and numerical algorithms needs floating point arithmetic and user interface are implemented using the ARM processor. Overall processing speed of the system is 60 frames/sec of VGA images.