• Title/Summary/Keyword: 부동 소수점

Search Result 188, Processing Time 0.026 seconds

A Variable Latency Newton-Raphson's Floating Point Number Reciprocal Square Root Computation (가변 시간 뉴톤-랍손 부동소수점 역수 제곱근 계산기)

  • Kim Sung-Gi;Cho Gyeong-Yeon
    • The KIPS Transactions:PartA
    • /
    • v.12A no.5 s.95
    • /
    • pp.413-420
    • /
    • 2005
  • The Newton-Raphson iterative algorithm for finding a floating point reciprocal square mot calculates it by performing a fixed number of multiplications. In this paper, a variable latency Newton-Raphson's reciprocal square root algorithm is proposed that performs multiplications a variable number of times until the error becomes smaller than a given value. To find the rediprocal square root of a floating point number F, the algorithm repeats the following operations: '$X_{i+1}=\frac{{X_i}(3-e_r-{FX_i}^2)}{2}$, $i\in{0,1,2,{\ldots}n-1}$' with the initial value is '$X_0=\frac{1}{\sqrt{F}}{\pm}e_0$'. The bits to the right of p fractional bits in intermediate multiplication results are truncated and this truncation error is less than '$e_r=2^{-p}$'. The value of p is 28 for the single precision floating point, and 58 for the double precision floating point. Let '$X_i=\frac{1}{\sqrt{F}}{\pm}e_i$, there is '$X_{i+1}=\frac{1}{\sqrt{F}}-e_{i+1}$, where '$e_{i+1}{<}\frac{3{\sqrt{F}}{{e_i}^2}}{2}{\mp}\frac{{Fe_i}^3}{2}+2e_r$'. If '$|\frac{\sqrt{3-e_r-{FX_i}^2}}{2}-1|<2^{\frac{\sqrt{-p}{2}}}$' is true, '$e_{i+1}<8e_r$' is less than the smallest number which is representable by floating point number. So, $X_{i+1}$ is approximate to '$\frac{1}{\sqrt{F}}$. Since the number of multiplications performed by the proposed algorithm is dependent on the input values, the average number of multiplications Per an operation is derived from many reciprocal square root tables ($X_0=\frac{1}{\sqrt{F}}{\pm}e_0$) with varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm only performs the multiplications until the error gets smaller than a given value, it can be used to improve the performance of a reciprocal square root unit. Also, it can be used to construct optimized approximate reciprocal square root tables. The results of this paper can be applied to many areas that utilize floating point numbers, such as digital signal processing, computer graphics, multimedia, scientific computing, etc.

The Design of Geometry Processor for 3D Graphics (3차원 그래픽을 위한 Geometry 프로세서의 설계)

  • Jeong, Cheol-Ho;Park, Woo-Chan;Kim, Shin-Dug;Han, Tack-Don
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.1
    • /
    • pp.252-265
    • /
    • 2000
  • In this thesis, the analysis of data processing method and the amount of computation in the whole geometry processing is conducted step by step. Floating-point ALU design is based on the characteristics of geometry processing operation. The performance of the devised ALU fitting with the geometry processing operation is analyzed by simulation after the description of the proposed ALU and geometry processor. The ALU designed in the paper can perform three types of floating-point operation simultaneously-addition/subtraction, multiplication, division. As a result, the 23.5% of improvement is achieved by that floating-point ALU for the whole geometry processing and in the floating-point division and square root operation, there is another 23% of performance gain with adding area-performance efficient SRT divisor.

  • PDF

A Real-Time JPEG2000 Codec Implementation on ARM9 Processor (ARM9 프로세서용 실시간 JPEG2000 코덱의 구현)

  • Kim, Young-Tae;Cho, Shi-Won;Lee, Dong-Wook
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.8 no.3
    • /
    • pp.149-155
    • /
    • 2007
  • In this paper, we propose an real-time implementation of JPEG2000 codec on the ARM9 processor. The implemented codec is designed to separate control codes from data management codes in order to use effectively the system resources such as processor and memory. Especially, in embedded situations like cellular phones it is very important to provide good services using limited processor and internal memory. Since ARM9 series processors do not provide floating-point, large amount of computational time is required to perform the operation which needs highly repetitive floating-point computations like DWT(discrete wavelet transform). The proposed codec was programed using fixed-point to overcome this weakness. Also code optimization considering cache memory was applied to further improve the computational speed.

  • PDF

Optimization of Gaussian Mixture Computation of ASR on DSP 67x (DSP 67x 기반 음성인식 시스템의 가우시안 확률 계산 최적화 구현)

  • Choi Taeil;Kim Taeyun;Ko Hanseok
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.53-56
    • /
    • 2004
  • 본 논문은 HMM 기반 임베디드 음성인식 시스템 구현에 관한 몇 가지 주제들을 설명한다. 임베디드 환경은 한정된 자원을 가지고 있고 그러한 가운데 타당한 인식률과 향상된 인식 속도를 얻기 위해서 몇가지 방법들을 이 논문에서 설명한다. 구현 환경은 DSP6711 기반에서 이루어졌다. 가우시안 mixture 계산 루틴을 부동소수점 연산에서 고정소수점 연산 및 software pipelining을 적용하였다. 고정소수점 변환 전과 후 비슷한 인식률을 얻었고 고정소수점 변환과 software pipelining 적용 후 연산 속도의 향상을 얻었다.

  • PDF

An Implementation of Low Cost 5-stage Powering Unit Using Newton Method (Newton Method을 이용한 저비용 5-stage 멱승기의 구현)

  • Song, Se-Hyun;Kim, Ki-Chul
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2007.10b
    • /
    • pp.194-197
    • /
    • 2007
  • 본 논문에서는 모바일용 3차원 그래픽 라이팅 엔진을 위한 부동소수점 멱승기클 제안한다. 3D 그래픽의 라이팅 과정은 연산량이 많고, 복잡하기 때문에 각 연산 유닛들이 저비용으로 빠르게 연산을 수행해야 한다. 본 논문에서 제안한 멱승기는 처리율을 높이기 위해 파이프라인 구조를 사용하였으며, $10^{-4}$의 정확도를 만족한다. 전체 구조는 5 stage로 구성되며, 크게 로그연산기와 지수연산기로 이루어져 있다. 일반적으로 로그연산기는 정확도를 높이기 위하여 큰 롬 테이블을 사용하는데, 이는 많은 면적을 차지하게 된다. 이러한 롬 테이블 면적 문제를 해결하기 위하여 Newton method을 사용하여 롬 테이블의 사이즈를 줄였다. 또한 오일러 상수를 밑으로 하는 지수연산기도 입력 비트의 크기를 줄이고, 테이블의 개수를 늘림으로써 롬 테이블의 크기를 줄였다. 지수연산의 밑은 부동소수점 포맷으로 [0, 1]의 범위를 가지며, 승은 정수 포맷으로 [0, 128]의 범위를 갖는다. Magnachip $0.18{\mu}m$ 공정에서 100Mhz의 동작주파수를 만족하였으며, 약 16k gates을 차지한다.

  • PDF

Real-Time Implementation of the EHSX Speech Coder Using a Floating Point DSP (부동 소수점 DSP를 이용한 4kbps EHSX 음성 부호화기의 실시간 구현)

  • 이인성;박동원;김정호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.5
    • /
    • pp.420-427
    • /
    • 2004
  • This paper presents real time implementation of 4kbps EHSX (Enhanced Harmonic Stochastic Excitation) speech coder that combines the harmonic vector excitation coding with time-separated transition coding. The harmonic vector excitation coding uses the harmonic excitation coding for voiced frames and used the vector excitation coding with the structure of analysis-by-synthesis for unvoiced frames, respectively. For transition frames mixed with voiced and unvoiced signal, we use the time-separated transition coding. In this paper. we present the optimization methods of implementation speech coder on the EMS320C6701/sup (R)/ DSP. To reduce the complex for real-time implementation. we perform the optimization method in algorithm by replacing the complex sinusoidal synthesis method with IFFT. and we apply fully pipelines hand assembly coding after converting it from floating source to fixed source. To generate a more efficient code. we also make use or the available EMS320C6701/sup (R)/ resources such as Fastest67x library and memory organization.

Fixed-point Processing Optimization of MPEG Psychoacoustic Model-II Algorithm for ASIC Implementation (MPEG 심리음향 모델-ll 알고리듬의 ASIC 구현을 위한 고정 소수점 연산 최적화)

  • Lee Keun-Sup;Park Young-Cheol;Youn Dae Hee
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.11C
    • /
    • pp.1491-1497
    • /
    • 2004
  • The psychoacoustic model in MPEG audio layer-III (MP3) encoder is optimized for the fixed-point processing. The optimization process consists of determining the data word length of arithmetic unit and the algorithm for transcendental functions that are often used in the psychoacoustic model. In order to determine the data word length, we defined a statistical model expressing the relation between the fixed-point operation errors of the psychoacoustic model and the probability of alteration of the allocated bits doe to these errors. Based on the simulations using this model, we chose a 24-bit data path and constructed a 24-bit fixed-point MP3 encoder. Sound quality tests using the constructed fixed-point encoder showed a mean degradation of -0.2 on ITU-R 5-point audio impairment scale.

FPGA-based Artificial Neural Network Accelerator Optimization Using Approximate Computing (Approximate computing 기법을 이용한 FPGA 기반 인공 신경망 가속기 최적화)

  • Park, Sangwoo;Kim, Hanyee;Suh, Taeweon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.05a
    • /
    • pp.479-481
    • /
    • 2019
  • 본 연구에서는 이미지를 분류하는 인공 신경망 가속기를 최적화했고, 이를 구현하여 기존 인공 신경망 가속기와 성능을 비교 분석했다. FPGA(Field Programmable Fate Array) 보드를 이용하여 가속기를 구현했으며, 해당 보드의 내부 메모리인 BRAM 을 FIFO(First In First Out)구조로 설계하여 메모리 시스템을 구현했다. Approximate computing 기법을 효율적으로 적용하기 위해 FWL(Fractional Word Length)최적점을 분석했고, 이를 기반으로 인공 신경망 가속기의 부동 소수점 연산을 고정 소수점 연산으로 변환했다. 구현된 인공 신경망 가속기는 기존의 인공 신경망에 비해, 약 7.4%더 효율적인 전력소모량을 보였다.

A Design of Floating-Point Geometry Processor for Embedded 3D Graphics Acceleration (내장형 3D 그래픽 가속을 위한 부동소수점 Geometry 프로세서 설계)

  • Nam Ki hun;Ha Jin Seok;Kwak Jae Chang;Lee Kwang Youb
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.2 s.344
    • /
    • pp.24-33
    • /
    • 2006
  • The effective geometry processing IP architecture for mobile SoC that has real time 3D graphics acceleration performance in mobile information system is proposed. Base on the proposed IP architecture, we design the floating point arithmetic unit needed in geometry process and the floating point geometry processor supporting the 3D graphic international standard OpenGL-ES. The geometry processor is implemented by 160k gate area in a Xilinx-Vertex FPGA and we measure the performance of geometry processor using the actual 3D graphic data at 80MHz frequency environment The experiment result shows 1.5M polygons/sec processing performance. The power consumption is measured to 83.6mW at Hynix 0.25um CMOS@50MHz.

An exact floating point square root calculator using multiplier (곱셈기를 이용한 정확한 부동소수점 제곱근 계산기)

  • Cho, Gyeong-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.13 no.8
    • /
    • pp.1593-1600
    • /
    • 2009
  • There are two major algorithms to find a square root of floating point number, one is the Newton_Raphson algorithm and GoldSchmidt algorithm which calculate it approximately by iterating multiplications and the other is SRT algorithm which calculates it exactly by iterating subtractions. This paper proposes an exact floating point square root algorithm using only multiplication. At first an approximate inverse square root is calculated by Newton_Raphson algorithm, and then an exact square root algorithm by reducing an error in it and a compensation algorithm of it are proposed. The proposed algorithm is verified to calculate all of numbers in a single precision floating point number and 1 billion random numbers in a double precision floating point number. The proposed algorithm requires only the multipliers without another hardware, so it can be widely used in an embedded system and mobile production which requires an efact square root of floating point number.