Search | Korea Science

A design of Floating Point Arithmetic Unit for Geometry Operation of Mobile 3D Graphic Processor (모바일 3D 그래픽 프로세서의 지오메트리 연산을 위한 부동 소수점 연산기 구현)

Lee, Jee-Myong;Lee, Chan-Ho
- Proceedings of the IEEK Conference
- /
- 2005.11a
- /
- pp.711-714
- /
- 2005
We propose floating point arithmetic units for geometry operation of mobile 3D graphic processor. The proposed arithmetic units conform to the single precision format of IEEE standard 754-1985 that is a standard of floating point arithmetic. The rounding algorithm applies the nearest toward zero form. The proposed adder/subtraction unit and multiplier have one clock cycle latency, and the inversion unit has three clock cycle latency. We estimate the required numbers of arithmetic operation for Viewing transformation. The first stage of geometry operation is composed with translation, rotation and scaling operation. The translation operation requires three addition and the rotation operation needs three addition and six multiplication. The scaling operation requires three multiplication. The viewing transformation is performed in 15 clock cycles. If the adder and the multiplier have their own in/out ports, the viewing transformation can be done in 9 clock cycles. The error margin of proposed arithmetic units is smaller than $10^{-5}$ that is the request in the OpenGL standard. The proposed arithmetic units carry out operations in 100MHz clock frequency.
PDF

An Improved Newton-Raphson's Reciprocal and Inverse Square Root Algorithm (개선된 뉴톤-랍손 역수 및 역제곱근 알고리즘)

Cho, Gyeong-Yeon
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.11 no.1
- /
- pp.46-55
- /
- 2007
The Newton-Raphson's algorithm for finding a floating point reciprocal and inverse square root calculates the result by performing a fixed number of multiplications. In this paper, an improved Newton-Raphson's algorithm is proposed, that performs multiplications a variable number. Since the number of multiplications performed by the proposed algorithm is dependent on the input values, the average number of multiplications per an operation is derived from many reciprocal and inverse square tables with varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm only performs the multiplications until the error gets smaller than a given value, it can be used to improve the performance of a reciprocal and inverse square root unit. Also, it can be used to construct optimized approximate tables. The results of this paper can be applied to many areas that utilize floating point numbers, such as digital signal processing, computer graphics, multimedia, scientific computing, etc.
https://doi.org/10.6109/JKIICE.2007.11.1.46 인용 PDF KSCI

Logic circuit design for high-speed computing of dynamic response in real-time hybrid simulation using FPGA-based system

Igarashi, Akira
- Smart Structures and Systems
- /
- v.14 no.6
- /
- pp.1131-1150
- /
- 2014
One of the issues in extending the range of applicable problems of real-time hybrid simulation is the computation speed of the simulator when large-scale computational models with a large number of DOF are used. In this study, functionality of real-time dynamic simulation of MDOF systems is achieved by creating a logic circuit that performs the step-by-step numerical time integration of the equations of motion of the system. The designed logic circuit can be implemented to an FPGA-based system; FPGA (Field Programmable Gate Array) allows large-scale parallel computing by implementing a number of arithmetic operators within the device. The operator splitting method is used as the numerical time integration scheme. The logic circuit consists of blocks of circuits that perform numerical arithmetic operations that appear in the integration scheme, including addition and multiplication of floating-point numbers, registers to store the intermediate data, and data busses connecting these elements to transmit various information including the floating-point numerical data among them. Case study on several types of linear and nonlinear MDOF system models shows that use of resource sharing in logic synthesis is crucial for effective application of FPGA to real-time dynamic simulation of structural response with time step interval of 1 ms.
https://doi.org/10.12989/sss.2014.14.6.1131 인용 KSCI

A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices (Homogeneous Transformation Matrix의 곱셈을 위한 병렬구조 프로세서의 설계)

Kwon Do-All;Chung Tae-Sang
- The Transactions of the Korean Institute of Electrical Engineers D
- /
- v.54 no.12
- /
- pp.723-731
- /
- 2005
The $4{\times}4$ homogeneous transformation matrix is a compact representation of orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished through the successive multiplications of homogeneous matrices, each of which represents the orientation and position of each corresponding link. Thus, for real time control applications in robotics or animation in computer graphics, the fast multiplication of homogeneous matrices is quite demanding. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For the accuracy of computation for real application, the operands of the processors are floating point numbers based on the IEEE Standard 754. For the parallelism and reduction of hardware redundancy, the processor takes column vectors of homogeneous matrices as multiplication unit. To further improve the throughput, the processor structure and its control is based on a pipe-lined structure. Since the designed processor can be used as a special purpose coprocessor in robotics and computer graphics, additionally to special matrix/matrix or matrix/vector multiplication, several other useful instructions for various transformation algorithms are included for wide application of the new design. The suggested instruction set will serve as standard in future processor design for Robotics and Computer Graphics. The design is verified using FPGA implementation. Also a comparative performance improvement of the proposed design is studied compared to a uni-processor approach for possibilities of its real time application.
PDF KSCI

A Design and Fabrication of the High-Speed Division/square-Root using a Redundant Floating Point Binary Number (고속 여분 부동 소수점 이진수의 제산/스퀘어-루트 설계 및 제작)

김종섭;이종화;조상복
- Proceedings of the IEEK Conference
- /
- 2001.06b
- /
- pp.365-368
- /
- 2001
This paper described a design and implementation of the division/square-root for a redundant floating point binary number using high-speed quotient selector. This division/square-root used the method of a redundant binary addition with 25MHz clock speed. The addition of two numbers can be performed in a constant time independent of the word length since carry propagation can be eliminated. We have developed a 16-bit VLSI circuit for division and square-root operations used extensively in each iterative step. It peformed the division and square-root by a redundant binary addition to the shifted binary number every 16 cycles. Also the circuit uses the nonrestoring method to obtain a quotient. The quotient selection logic used a leading three digits of partial remainders in order to be implemented in a simple circuit. As a result, the performance of the proposed scheme is further enhanced in the speed of operation process by applying new quotient selection addition logic which can be parallelly process the quotient decision field. It showed the speed-up of 13% faster than previously presented schemes used the same algorithms.
PDF

A Variable Latency Goldschmidt's Floating Point Number Square Root Computation (가변 시간 골드스미트 부동소수점 제곱근 계산기)

Kim, Sung-Gi;Song, Hong-Bok;Cho, Gyeong-Yeon
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.9 no.1
- /
- pp.188-198
- /
- 2005
The Goldschmidt iterative algorithm for finding a floating point square root calculated it by performing a fixed number of multiplications. In this paper, a variable latency Goldschmidt's square root algorithm is proposed, that performs multiplications a variable number of times until the error becomes smaller than a given value. To find the square root of a floating point number F, the algorithm repeats the following operations: $R_i=\frac{3-e_r-X_i}{2},\;X_{i+1}=X_i{\times}R^2_i,\;Y_{i+1}=Y_i{\times}R_i,\;i{\in}\{{0,1,2,{\ldots},n-1} }}'$with the initial value is $'\;X_0=Y_0=T^2{\times}F,\;T=\frac{1}{\sqrt {F}}+e_t\;'$. The bits to the right of p fractional bits in intermediate multiplication results are truncated, and this truncation error is less than $'e_r=2^{-p}'$. The value of p is 28 for the single precision floating point, and 58 for the doubel precision floating point. Let $'X_i=1{\pm}e_i'$, there is $'\;X_{i+1}=1-e_{i+1},\;where\;'\;e_{i+1}<\frac{3e^2_i}{4}{\mp}\frac{e^3_i}{4}+4e_{r}'$. If '|X_i-1|<2^{\frac{-p+2}{2}}\;'$ is true, $'\;e_{i+1}<8e_r\;'$ is less than the smallest number which is representable by floating point number. So, $\sqrt{F}$ is approximate to $'\;\frac{Y_{i+1}}{T}\;'$. Since the number of multiplications performed by the proposed algorithm is dependent on the input values, the average number of multiplications per an operation is derived from many reciprocal square root tables ($T=\frac{1}{\sqrt{F}}+e_i$) with varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm only performs the multiplications until the error gets smaller than a given value, it can be used to improve the performance of a square root unit. Also, it can be used to construct optimized approximate reciprocal square root tables. The results of this paper can be applied to many areas that utilize floating point numbers, such as digital signal processing, computer graphics, multimedia, scientific computing, etc.
PDF KSCI

A Variable Latency Goldschmidt's Floating Point Number Divider (가변 시간 골드스미트 부동소수점 나눗셈기)

Kim Sung-Gi;Song Hong-Bok;Cho Gyeong-Yeon
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.9 no.2
- /
- pp.380-389
- /
- 2005
The Goldschmidt iterative algorithm for a floating point divide calculates it by performing a fixed number of multiplications. In this paper, a variable latency Goldschmidt's divide algorithm is proposed, that performs multiplications a variable number of times until the error becomes smaller than a given value. To calculate a floating point divide '$\frac{N}{F}$', multifly '$T=\frac{1}{F}+e_t$' to the denominator and the nominator, then it becomes ’$\frac{TN}{TF}=\frac{N_0}{F_0}$'. And the algorithm repeats the following operations: ’$R_i=(2-e_r-F_i),\;N_{i+1}=N_i{\ast}R_i,\;F_{i+1}=F_i{\ast}R_i$, i$\in${0,1,...n-1}'. The bits to the right of p fractional bits in intermediate multiplication results are truncated, and this truncation error is less than ‘$e_r=2^{-p}$'. The value of p is 29 for the single precision floating point, and 59 for the double precision floating point. Let ’$F_i=1+e_i$', there is $F_{i+1}=1-e_{i+1},\;e_{i+1}',\;where\;e_{i+1}, If '$[F_i-1]<2^{\frac{-p+3}{2}}$ is true, ’$e_{i+1}<16e_r$' is less than the smallest number which is representable by floating point number. So, ‘$N_{i+1}$ is approximate to ‘$\frac{N}{F}$'. Since the number of multiplications performed by the proposed algorithm is dependent on the input values, the average number of multiplications per an operation is derived from many reciprocal tables ($T=\frac{1}{F}+e_t$) with varying sizes. 1'he superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm only performs the multiplications until the error gets smaller than a given value, it can be used to improve the performance of a divider. Also, it can be used to construct optimized approximate reciprocal tables. The results of this paper can be applied to many areas that utilize floating point numbers, such as digital signal processing, computer graphics, multimedia, scientific computing, etc
PDF KSCI

A Variable Latency Newton-Raphson's Floating Point Number Reciprocal Square Root Computation (가변 시간 뉴톤-랍손 부동소수점 역수 제곱근 계산기)

Kim Sung-Gi;Cho Gyeong-Yeon
- The KIPS Transactions:PartA
- /
- v.12A no.5 s.95
- /
- pp.413-420
- /
- 2005
The Newton-Raphson iterative algorithm for finding a floating point reciprocal square mot calculates it by performing a fixed number of multiplications. In this paper, a variable latency Newton-Raphson's reciprocal square root algorithm is proposed that performs multiplications a variable number of times until the error becomes smaller than a given value. To find the rediprocal square root of a floating point number F, the algorithm repeats the following operations: '$X_{i+1}=\frac{{X_i}(3-e_r-{FX_i}^2)}{2}$, $i\in{0,1,2,{\ldots}n-1}$' with the initial value is '$X_0=\frac{1}{\sqrt{F}}{\pm}e_0$'. The bits to the right of p fractional bits in intermediate multiplication results are truncated and this truncation error is less than '$e_r=2^{-p}$'. The value of p is 28 for the single precision floating point, and 58 for the double precision floating point. Let '$X_i=\frac{1}{\sqrt{F}}{\pm}e_i$, there is '$X_{i+1}=\frac{1}{\sqrt{F}}-e_{i+1}$, where '$e_{i+1}{<}\frac{3{\sqrt{F}}{{e_i}^2}}{2}{\mp}\frac{{Fe_i}^3}{2}+2e_r$'. If '$|\frac{\sqrt{3-e_r-{FX_i}^2}}{2}-1|<2^{\frac{\sqrt{-p}{2}}}$' is true, '$e_{i+1}<8e_r$' is less than the smallest number which is representable by floating point number. So, $X_{i+1}$ is approximate to '$\frac{1}{\sqrt{F}}$. Since the number of multiplications performed by the proposed algorithm is dependent on the input values, the average number of multiplications Per an operation is derived from many reciprocal square root tables ($X_0=\frac{1}{\sqrt{F}}{\pm}e_0$) with varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm only performs the multiplications until the error gets smaller than a given value, it can be used to improve the performance of a reciprocal square root unit. Also, it can be used to construct optimized approximate reciprocal square root tables. The results of this paper can be applied to many areas that utilize floating point numbers, such as digital signal processing, computer graphics, multimedia, scientific computing, etc.
https://doi.org/10.3745/KIPSTA.2005.12A.5.413 인용 PDF KSCI

A Variable Latency Newton-Raphson's Floating Point Number Reciprocal Computation (가변 시간 뉴톤-랍손 부동소수점 역수 계산기)

Kim Sung-Gi;Cho Gyeong-Yeon
- The KIPS Transactions:PartA
- /
- v.12A no.2 s.92
- /
- pp.95-102
- /
- 2005
The Newton-Raphson iterative algorithm for finding a floating point reciprocal which is widely used for a floating point division, calculates the reciprocal by performing a fixed number of multiplications. In this paper, a variable latency Newton-Raphson's reciprocal algorithm is proposed that performs multiplications a variable number of times until the error becomes smaller than a given value. To find the reciprocal of a floating point number F, the algorithm repeats the following operations: '$'X_{i+1}=X=X_i*(2-e_r-F*X_i),\;i\in\{0,\;1,\;2,...n-1\}'$ with the initial value $'X_0=\frac{1}{F}{\pm}e_0'$. The bits to the right of p fractional bits in intermediate multiplication results are truncated, and this truncation error is less than $'e_r=2^{-p}'$. The value of p is 27 for the single precision floating point, and 57 for the double precision floating point. Let $'X_i=\frac{1}{F}+e_i{'}$, these is $'X_{i+1}=\frac{1}{F}-e_{i+1},\;where\;{'}e_{i+1}, is less than the smallest number which is representable by floating point number. So, $X_{i+1}$ is approximate to $'\frac{1}{F}{'}$. Since the number of multiplications performed by the proposed algorithm is dependent on the input values, the average number of multiplications per an operation is derived from many reciprocal tables $(X_0=\frac{1}{F}{\pm}e_0)$ with varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm only performs the multiplications until the error gets smaller than a given value, it can be used to improve the performance of a reciprocal unit. Also, it can be used to construct optimized approximate reciprocal tables. The results of this paper can be applied to many areas that utilize floating point numbers, such as digital signal processing, computer graphics, multimedia scientific computing, etc.
https://doi.org/10.3745/KIPSTA.2005.12A.2.095 인용 PDF KSCI

The design of 3D graphics rendering processor for portable device (휴대용기기에 적합한 3차원 그래픽 렌더링 처리기의 파이프라인 설계)

우현재;정종철;이문기
- Proceedings of the IEEK Conference
- /
- 2003.07b
- /
- pp.1213-1216
- /
- 2003
This paper proposes an 3D graphics rendering processor for portable device. One the most important factor is chip size for portable device, but the conventional 3D graphics rendering processor is not a suitable because the processor needs a lot of multiplication and division units. So the proposed architecture substitutes single precision floating point by 32 bit fixed point, and uses recursive units for the same operation such as color values(z, r, g, b, a) and texture values (s, t, u, v). In this approach, we reduce numbers of multiplications and divisions by 66.1% and 75% respectively at the sacrifice of performance degradation by 2.12%.
PDF

Search Result 30, Processing Time 0.023 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)