• Title/Summary/Keyword: floating-point

Search Result 496, Processing Time 0.022 seconds

Implementation of a 3D Graphics Hardwired T&L Accelerator based on a SoC Platform for a Mobile System (SoC 플랫폼 기반 모바일용 3차원 그래픽 Hardwired T&L Accelerator 구현)

  • Lee, Kwang-Yeob;Koo, Yong-Seo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.9
    • /
    • pp.59-70
    • /
    • 2007
  • In this paper, we proposed an effective T&L(Transform & Lighting) Processor architecture for a real time 3D graphics acceleration SoC(System on a Chip) in a mobile system. We designed Floating point arithmetic IPs for a T&L processor. And we verified IPs using a SoC Platform. Designed T&L Processor consists of 24 bit floating point data format and 16 bit fixed point data format, and supports the pipeline keeping the balance between Transform process and Lighting process using a parallel computation of 3D graphics. The delay of pipeline processing only Transform operation is almost same as the delay processing both Transform operation and Lighting operation. Designed T&L Processor is implemented and verified using a SoC Platform. The T&L Processor operates at 80MHz frequency in Xilinx-Virtex4 FPGA. The processing speed is measured at the rate of 20M Vertexes/sec.

Software Design Methodology of OFDM DVB-T Receiver using DSP-based Platform (DSP 기반 플랫폼을 이용한 OFDM DVB-T 반송파 복원부의 소프트웨어 설계 방법)

  • 신정헌;유형석;윤주현;박찬섭;정해주;조준동
    • Proceedings of the IEEK Conference
    • /
    • 2003.11c
    • /
    • pp.55-59
    • /
    • 2003
  • In this paper, we estimate the performance requirements of general-purpose DSP for Carrier Recovery of OFDM DVB-T receiver. Firstly, we transported the designed fixed-point OFDM DVB-T model to a floating-point software model written in C. Then, we measured the number of instruction cycles required for operation of Carrier Recovery in time. We use SignalMaster$\^$TM/ DSP platform of LYRtech Inc. as a environment of estimation, and Simulink$\^$TM/ as a graphical interface, Code Composer StudioTM of TI as profiler and compiler, and SPW$\^$TM/ for presenting functional reliability and comparing the performance distortion with fixed-point model. As a result, we show the required number of DSPs in our DSP-based system, and introduce the need of Multi-DSP-based system.

  • PDF

The Design of a Structure of Network Co-processor for SDR(Software Defined Radio) (SDR(Software Defined Radio)에 적합한 네트워크 코프로세서 구조의 설계)

  • Kim, Hyun-Pil;Jeong, Ha-Young;Ham, Dong-Hyeon;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.2A
    • /
    • pp.188-194
    • /
    • 2007
  • In order to become ubiquitous world, the compatibility of wireless machines has become the significant characteristic of a communication terminal. Thus, SDR is the most necessary technology and standard. However, among the environment which has different communication protocol, it's difficult to make a terminal with only hardware using ASIC or SoC. This paper suggests the processor that can accelerate several communication protocol. It can be connected with main-processor, and it is specialized PHY layer of network The C-program that is modeled with the wireless protocol IEEE802.11a and IEEE802.11b which are based on widely used modulation way; OFDM and CDM is compiled with ARM cross compiler and done simulation and profiling with Simplescalar-Arm version. The result of profiling, most operations were Viterbi operations and complex floating point operations. According to this result we suggested a co-processor which can accelerate Viterbi operations and complex floating point operations and added instructions. These instructions are simulated with Simplescalar-Arm version. The result of this simulation, comparing with computing only one ARM core, the operations of Viterbi improved as fast as 4.5 times. And the operations of complex floating point improved as fast as twice. The operations of IEEE802.11a are 3 times faster, and the operations of IEEE802.11b are 1.5 times faster.

CUDA-based Parallel Bi-Conjugate Gradient Matrix Solver for BioFET Simulation (BioFET 시뮬레이션을 위한 CUDA 기반 병렬 Bi-CG 행렬 해법)

  • Park, Tae-Jung;Woo, Jun-Myung;Kim, Chang-Hun
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.1
    • /
    • pp.90-100
    • /
    • 2011
  • We present a parallel bi-conjugate gradient (Bi-CG) matrix solver for large scale Bio-FET simulations based on recent graphics processing units (GPUs) which can realize a large-scale parallel processing with very low cost. The proposed method is focused on solving the Poisson equation in a parallel way, which requires massive computational resources in not only semiconductor simulation, but also other various fields including computational fluid dynamics and heat transfer simulations. As a result, our solver is around 30 times faster than those with traditional methods based on single core CPU systems in solving the Possion equation in a 3D FDM (Finite Difference Method) scheme. The proposed method is implemented and tested based on NVIDIA's CUDA (Compute Unified Device Architecture) environment which enables general purpose parallel processing in GPUs. Unlike other similar GPU-based approaches which apply usually 32-bit single-precision floating point arithmetics, we use 64-bit double-precision operations for better convergence. Applications on the CUDA platform are rather easy to implement but very hard to get optimized performances. In this regard, we also discuss the optimization strategy of the proposed method.

Floating Point Unit Design for the IEEE754-2008 (IEEE754-2008을 위한 고속 부동소수점 연산기 설계)

  • Hwang, Jin-Ha;Kim, Hyun-Pil;Park, Sang-Su;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.10
    • /
    • pp.82-90
    • /
    • 2011
  • Because of the development of Smart phone devices, the demands of high performance FPU(Floating-point Unit) becomes increasing. Therefore, we propose the high-speed single-/double-precision FPU design that includes an elementary add/sub unit and improved multiplier and compare and convert units. The most commonly used add/sub unit is optimized by the parallel rounding unit. The matrix operation is used in complex calculation something like a graphic calculation. We designed the Multiply-Add Fused(MAF) instead of multiplier to calculate the matrix more quickly. The branch instruction that is decided by the compare operation is very frequently used in various programs. We bypassed the result of the compare operation before all the pipeline processes ended to decrease the total execution time. And we included additional convert operations that are added in IEEE754-2008 standard. To verify our RTL designs, we chose four hundred thousand test vectors by weighted random method and simulated each unit. The FPU that was synthesized by Samsung's 45-nm low-power process satisfied the 600-MHz operation frequency. And we confirm a reduction in area by comparing the improved FPU with the existing FPU.

Conceptual Design of Deep-sea Multi-Point Mooring by using Two-Point Mooring (2점지지계류를 활용한 심해 부유체의 다점지지계류 개념설계)

  • Park, In-Kyu;Kim, Kyong-Moo
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.45 no.4
    • /
    • pp.462-467
    • /
    • 2008
  • In this paper, we investigated the design method of mooring system in ultra deep sea and carried out the conceptual design for offshore West Africa oil field in ultra deep sea of 3000 meters. Recently, it was feasible to design and install the offshore floating structures in deep sea of up to 2000 meters. Due to the simplicity, two-point mooring design is fully utilized. Force-excursion curves are throughly examined to find out the feasibility of various combinations of mooring lines. Free length and pretension effects are discussed. It is found that composite materials including synthetic fiber rope may be good solution for ultra deep sea mooring design.

HARP의 부동소숫점 연산기 구조설계

  • Jo, Jeong-Yeon
    • ETRI Journal
    • /
    • v.10 no.3
    • /
    • pp.36-48
    • /
    • 1988
  • 본 논문에서는 부동소숫점연산 프로세서들의 최근 동향을 설명하면서 부동소숫점 연산기의 중요성을 강조하고, 한국전자통신연구소 프로세서구조연구실에서 개발하고 있는 HARP(High-performance Architecture for RISC type Processor)의 개발전략에 따른 부동소숫점 연산기(Floating-Point Unit : FPU)의 구조를 정의한다. 또한 HARP FPU의 설계구현을 마이크로 구조측면에서 설명한다. HARP의 CPU와 동일 칩상에 구현될 HARP FPU는 고유의 구조를 가지며 모든 부동소숫점 연산은 IEEE-754 표준을 따른다. HARP FPU는 고속의 부동소숫점 연산 유니트이며, HARP의 IPU(Integer Processing Unit)와는 독립적으로 동작되도록 설계되어서 HARP CPU의 전체적인 파이프라인 기능에 가능한 한 페날티를 주지 않도록 동작된다.

  • PDF

A Study on Hardware Implementation of a VSB Equalization System (VSB 등화시스템의 하드웨어 구현방법에 관한 연구)

  • 채승수;박래홍
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.10
    • /
    • pp.1314-1325
    • /
    • 1995
  • In this paper, we describe hardware implementation of VSB (Vestigial SideBand) mo-dulation equalization systems for HDTV (High Definition TeleVision). By modifying an adaptive equalization algorithm, we propose a hardware architecture with a low hardware cost and the performance close to floating-point operations. We also employ the pipeline concept to reduce the hardware cost. The effectiveness of the proposed hardware architecture is de- monstrated through computer simulation and the optimization result of VHDL circuit descriptions.

  • PDF

A study on the hardware implementation of the digicipher equalization system (DigiCipher 등하시스템의 하드웨어 구현방법에 관한 연구)

  • 채승수;반성범;이기헌;박래홍;김영상;이병욱
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.6
    • /
    • pp.176-185
    • /
    • 1996
  • In this paper, we present the modified CMA (constant modulus algorithm) and LMS (least mean square) algorithms for digiCipher system with reduced hardware cost, in which the pipelined architecture is employed. They yield the performance comparable to that using floating-point operations. We show the effecstiveness of the proposed architecture through the implementation results using VHDL.

  • PDF

A Design of a 8-Thread Graphics Processor Unit with Variable-Length Instructions

  • Lee, Kwang-Yeob;Kwak, Jae-Chang
    • Journal of information and communication convergence engineering
    • /
    • v.6 no.3
    • /
    • pp.285-288
    • /
    • 2008
  • Most of multimedia processors for 2D/3D graphics acceleration use a lot of integer/floating point arithmetic units. We present a new architecture with an efficient ALU, built in a smaller chip size. It reduces instruction cycles significantly based on a foundation of multi-thread operation, variable length instruction words, dual phase operation, and phase instruction's coordination. We can decrease the number of instruction cycles up to 50%, and can achieve twice better performance.