Search | Korea Science

Design of Square Root and Inverse Square Root Arithmetic Units for Mobile 3D Graphic Processing (모바일 3차원 그래픽 연산을 위한 제곱근 및 역제곱근 연산기 구조 및 설계)

Lee, Chan-Ho
- Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.3 / pp.20-25 / 2009
We propose a hardware architecture for floating-point square root and inverse square root arithmetic units using lookup tables. These units are used in lighting engines and shader processors for 3D graphics processing. The architecture is based on Taylor series expansion and consists of lookup tables and correction units, which reduces the size of the lookup tables. It can be applied to the IEEE-754 32-bit floating-point format and to a reduced 24-bit floating-point format. Square root and inverse square root arithmetic units for 32-bit and 24-bit floating-point numbers are designed with the proposed architecture. They operate in a single cycle and satisfy the $10^{-5}$ precision required by OpenGL ES 1.x. They are designed in Verilog-HDL, and the RTL code is verified using an FPGA.
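The lookup-table-plus-correction idea can be sketched in software. This is a hedged illustration, not the paper's design: it assumes the mantissa is normalized to [1, 4) so the exponent can be halved exactly, a hypothetical table index of K = 8 leading mantissa bits, and a first-order Taylor correction f(a + d) ≈ f(a) + f'(a)·d. Python doubles stand in for the 32-bit and 24-bit hardware formats.

```python
import math

K = 8                 # hypothetical: leading mantissa bits used as the table index
H = 2.0 ** -K         # width of one table segment
N = 3 * (1 << K)      # number of segments covering the normalized range [1, 4)

def _tables(f, df):
    """Precompute f(a) and f'(a) at the left end a of every segment."""
    starts = [1.0 + i * H for i in range(N)]
    return [f(a) for a in starts], [df(a) for a in starts]

SQRT_T0, SQRT_T1 = _tables(math.sqrt, lambda a: 0.5 / math.sqrt(a))
RSQRT_T0, RSQRT_T1 = _tables(lambda a: 1.0 / math.sqrt(a),
                             lambda a: -0.5 * a ** -1.5)

def _normalize(x):
    """Write x = m * 2**e with e even and m in [1, 4)."""
    m, e = math.frexp(x)          # m in [0.5, 1)
    m, e = 2.0 * m, e - 1         # m in [1, 2)
    if e % 2:                     # odd exponent: move one factor of 2 into m
        m, e = 2.0 * m, e - 1     # m in [2, 4)
    return m, e

def _lut_eval(x, t0, t1, exp_sign):
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("sketch handles positive finite inputs only")
    m, e = _normalize(x)
    i = int((m - 1.0) / H)        # table index from the leading mantissa bits
    d = m - (1.0 + i * H)         # remainder handled by the correction unit
    y = t0[i] + t1[i] * d         # first-order Taylor correction
    return math.ldexp(y, exp_sign * (e // 2))   # exponent is simply halved

def lut_sqrt(x):
    return _lut_eval(x, SQRT_T0, SQRT_T1, +1)

def lut_rsqrt(x):
    return _lut_eval(x, RSQRT_T0, RSQRT_T1, -1)
```

With K = 8 the Taylor remainder keeps the relative error of the inverse square root below about 6e-6, inside the 1e-5 budget; the table size and correction order used in the paper may differ.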

A Study on Cycle Based Simulator of a 32 bit floating point DSP (32비트 부동소수점 DSP의 Cycle Based Simulator에 관한 연구)

우종식;양해용;안철홍;박주성
- Journal of the Korean Institute of Telematics and Electronics C / v.35C no.11 / pp.31-38 / 1998
This paper deals with the design of a Cycle Based Simulator (CBS) for a 32-bit floating-point DSP (Digital Signal Processor). The CBS was developed for a TMS320C30-compatible DSP and will be used to confirm the chip's architecture, the functions of its sub-blocks, and its control signals before detailed logic design in VHDL begins. The CBS outputs serve as important references at the gate-level design step because they provide the control signals, the outputs of important blocks, and the values on internal buses and in registers at each pipeline step, none of which are available from the commercial DSP simulator. In addition to its core functions, the CBS has various interfaces for efficient execution and convenient display of results. It was verified by comparing its results with those of the commercial simulator for many application algorithms, and its simulation speed is several tens of times that of VHDL logic simulation. Although this CBS targets a specific DSP, the concept may be applicable to other VLSI designs.
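To make "cycle-based" concrete, here is a toy model. It is an illustration under assumptions, not the authors' CBS and not a model of the real TMS320C30: it assumes a generic four-stage pipeline (fetch, decode, read, execute) and a hypothetical two-instruction toy ISA (`ADD`, `MUL` on named registers) with no hazards, stalls, or memory. Each call to `step` advances every stage by exactly one clock and records the pipeline and register state for that cycle, the kind of per-cycle internal trace the abstract says the commercial simulator does not expose.

```python
import operator
from dataclasses import dataclass, field

STAGES = ("fetch", "decode", "read", "execute")    # assumed 4-stage pipeline
OPS = {"ADD": operator.add, "MUL": operator.mul}   # hypothetical toy instruction set

@dataclass
class CycleSim:
    program: list                                   # list of (op, dst, src1, src2)
    regs: dict = field(default_factory=lambda: {f"R{i}": 0.0 for i in range(8)})
    pipe: list = field(default_factory=lambda: [None] * len(STAGES))
    pc: int = 0
    cycle: int = 0
    trace: list = field(default_factory=list)       # (cycle, stage contents, registers)

    def step(self):
        """Advance the whole pipeline by exactly one clock cycle."""
        fetched = self.program[self.pc] if self.pc < len(self.program) else None
        if fetched is not None:
            self.pc += 1
        self.pipe = [fetched] + self.pipe[:-1]      # all stages move forward together
        if self.pipe[-1] is not None:               # the execute stage writes its result
            op, dst, a, b = self.pipe[-1]
            self.regs[dst] = OPS[op](self.regs[a], self.regs[b])
        self.cycle += 1
        self.trace.append((self.cycle, dict(zip(STAGES, self.pipe)), dict(self.regs)))

    def run(self):
        """Clock until every instruction has reached and finished the execute stage."""
        while self.pc < len(self.program) or any(self.pipe[:-1]):
            self.step()
```

Running `CycleSim(program=[("MUL", "R2", "R0", "R1"), ("ADD", "R3", "R2", "R0")])` with R0 = 1.5 and R1 = 2.0 takes five cycles: four to fill the pipeline and one more to execute the second instruction, and `trace` shows where each instruction sits in every cycle.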

Design of Transformation Engine for Mobile 3D Graphics (모바일 3차원 그래픽을 위한 기하변환 엔진 설계)

Kim, Dae-Kyoung;Lee, Jee-Myong;Lee, Chan-Ho
- Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.10 / pp.49-54 / 2007
As digital content based on 3D graphics grows, so does the demand for low-power 3D graphics hardware for mobile devices. We propose a simplified transformation engine for a mobile 3D graphics processor. The area of the transformation engine is reduced by merging the mapping transformation unit into the projective transformation unit and by replacing the clipping unit with a selection unit. The engine consists of a viewing transformation unit, a projective transformation unit, a divide-by-w unit, and a selection unit. It can process the IEEE-754 32-bit floating-point format or a reduced 24-bit floating-point format. Its pipelined architecture processes one vertex every 4 cycles after the initial latency. The RTL code is verified using an FPGA.
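Merging the mapping (viewport) transformation into the projective transformation works because the viewport map is affine: applying it to clip coordinates before the divide by w yields the same screen coordinates as applying it after. The sketch below assumes OpenGL-style perspective and viewport matrices as plain Python lists, and a hypothetical selection rule (reject a triangle only when all three vertices lie beyond the same screen or depth boundary, with no clipping). It also assumes w > 0, i.e. vertices in front of the eye; it is not the paper's datapath.

```python
def mat_mul(a, b):
    """4x4 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def mat_vec(m, v):
    """4x4 matrix times a homogeneous 4-vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def perspective(f, aspect, near, far):
    """OpenGL-style projection matrix; f = 1 / tan(fovy / 2)."""
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
            [0, 0, -1, 0]]

def viewport(width, height):
    """Map NDC [-1, 1] to [0, width] x [0, height] and depth to [0, 1]."""
    return [[width / 2, 0, 0, width / 2],
            [0, height / 2, 0, height / 2],
            [0, 0, 0.5, 0.5],
            [0, 0, 0, 1]]

def transform(vertex, view, merged):
    """Viewing transform, merged projective+mapping transform, then divide by w."""
    c = mat_vec(merged, mat_vec(view, vertex))
    return [c[0] / c[3], c[1] / c[3], c[2] / c[3]]

def outcode(p, width, height):
    """One bit for each boundary the screen-space point lies beyond."""
    x, y, z = p
    outside = (x < 0, x > width, y < 0, y > height, z < 0, z > 1)
    return sum(1 << i for i, out in enumerate(outside) if out)

def select(triangle, width, height):
    """Keep the triangle unless all three vertices are beyond a common boundary."""
    a, b, c = (outcode(p, width, height) for p in triangle)
    return (a & b & c) == 0
```

The merged matrix `mat_mul(viewport(...), perspective(...))` only needs to be formed once per frame, so each vertex costs one matrix-vector product plus the divide instead of a projection followed by a separate mapping step.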

Design of a Truncated Floating-Point Multiplier for Graphic Accelerator of Mobile Devices (모바일 그래픽 가속기용 부동소수점 절사 승산기 설계)

Cho, Young-Sung;Lee, Yong-Hwan
- Journal of the Korea Institute of Information and Communication Engineering / v.11 no.3 / pp.563-569 / 2007
As mobile communication and semiconductor technologies continue to improve, mobile content that demands high-quality graphics, such as multimedia services and 2D/3D graphics, has recently become available. Mobile chips must occupy a small die area and consume little power. In this paper, we design a truncated floating-point multiplier suitable for 2D/3D vector graphics on mobile devices. The truncated multiplier is based on the radix-4 Booth encoding algorithm, and a truncation algorithm is used to achieve small area and low power. The average percent error of the multiplier is only 0.00003%, which is negligible for mobile applications. Synthesis with a 0.35 µm CMOS cell library shows that the truncated multiplier needs only 33.8 percent of the gates of a conventional radix-4 Booth multiplier.
https://doi.org/10.6109/jkiice.2007.11.3.563
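A radix-4 Booth multiplier recodes the multiplier into digits in {-2, -1, 0, 1, 2}, so only about n/2 partial products are needed, each an easy multiple of the multiplicand. The sketch below shows that recoding and one simple form of truncation in which the `cut` least-significant columns of every partial product are discarded. The fixed-width scheme, the `cut` parameter, and the absence of an error-compensation term are assumptions for illustration; the paper's truncation algorithm may differ.

```python
def booth_radix4_digits(y, n):
    """Recode an unsigned n-bit multiplier into radix-4 Booth digits in {-2, ..., 2}."""
    digits = []
    prev = 0                                  # implicit bit y[-1] = 0
    for i in range(0, n + 2, 2):              # one extra digit because y is unsigned
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)    # digit = -2*y[i+1] + y[i] + y[i-1]
        prev = b1
    return digits

def truncated_booth_mul(x, y, n, cut):
    """Multiply unsigned n-bit x and y, dropping the `cut` lowest columns of each
    partial product (hypothetical truncation scheme, no compensation constant)."""
    total = 0
    for k, d in enumerate(booth_radix4_digits(y, n)):
        pp = (d * x) << (2 * k)               # partial product chosen by the Booth digit
        total += (pp >> cut) << cut           # discard low columns (floor in two's complement)
    return total
```

Because each discarded slice is the floor of a two's-complement value, the truncated result never exceeds the exact product and falls short by at most (number of digits) × (2^cut - 1); truncated multipliers commonly add a compensation constant to center that error.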

Sphere Decoding Algorithm for MIMO System (MIMO 시스템을 위한 Sphere Decoding 알고리즘)

An, Jin-Young;Park, Hee-Jun;Kim, Sang-Choon
- Proceedings of the KIEE Conference / 2008.10b / pp.115-116 / 2008
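This paper analyzes and evaluates the Sphere Decoding (SD) algorithm for multiple-input multiple-output (MIMO) systems, which achieves the same performance as a Maximum Likelihood (ML) receiver at lower complexity. The independent signal sent over the channel from each transmit antenna is modulated with QPSK, and the channel is assumed to be a rich-scattering Rayleigh flat-fading channel. At the receiver, the received signal is detected as the independent signals from each transmit antenna using the Fincke & Pohst SD algorithm, and its performance is compared with that of the ML receiver. In addition, the Viterbo & Boutros SD algorithm, an improved form that further reduces complexity, is used, and the BER performance of the detected signals and the number of floating-point operations (FLOPS) are compared and analyzed.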
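The search can be sketched with a real-valued model y = Hs + n, in which each QPSK symbol contributes two real ±1 components. After a QR decomposition H = QR, minimizing ‖y - Hs‖² is equivalent, up to a constant that does not depend on s, to minimizing ‖Qᵀy - Rs‖², which can be explored level by level from the last row of R upward while pruning any branch whose partial distance already exceeds the current radius. This sketch starts with an infinite radius and shrinks it whenever a closer point is found, a common complexity-reducing refinement; it is not claimed to reproduce the exact enumeration of the Fincke & Pohst or Viterbo & Boutros algorithms, and the helper names are hypothetical.

```python
import math

def qr_mgs(H):
    """Thin QR of an n x m real matrix (n >= m, full column rank), modified Gram-Schmidt."""
    n, m = len(H), len(H[0])
    Q = [row[:] for row in H]                 # columns are orthonormalized in place
    R = [[0.0] * m for _ in range(m)]
    for j in range(m):
        R[j][j] = math.sqrt(sum(Q[i][j] ** 2 for i in range(n)))
        for i in range(n):
            Q[i][j] /= R[j][j]
        for k in range(j + 1, m):
            R[j][k] = sum(Q[i][j] * Q[i][k] for i in range(n))
            for i in range(n):
                Q[i][k] -= R[j][k] * Q[i][j]
    return Q, R

def sphere_decode(H, y, alphabet=(-1.0, 1.0)):
    """Depth-first sphere decoder returning the ML symbol vector."""
    Q, R = qr_mgs(H)
    m = len(R)
    z = [sum(Q[i][j] * y[i] for i in range(len(y))) for j in range(m)]   # z = Q^T y
    best, best_s = math.inf, None
    s = [0.0] * m

    def search(level, dist):
        nonlocal best, best_s
        if level < 0:                         # complete candidate inside the sphere
            best, best_s = dist, s.copy()     # shrink the radius to this distance
            return
        # Remove interference from the symbols already fixed at higher levels.
        c = z[level] - sum(R[level][k] * s[k] for k in range(level + 1, m))
        for a in alphabet:
            d = dist + (c - R[level][level] * a) ** 2
            if d < best:                      # prune branches outside the sphere
                s[level] = a
                search(level - 1, d)

    search(m - 1, 0.0)
    return best_s
```

Because the radius only shrinks when a complete candidate is found, the result matches an exhaustive ML search while usually visiting far fewer than 2^m leaves.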

Development of MPC555-based Controller for Generator Control of HEV (하이브리드 전기자동차 발전기 제어용 MPC555 보드 개발)

Kwak, Mu-Shin;Son, Yo-Chan;Sul, Seung-Ki
- Proceedings of the KIEE Conference / 2001.07b / pp.1185-1187 / 2001
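This paper introduces a control board built around Motorola's MPC555 microprocessor. Using this board, experiments were performed to control the generator of a hybrid electric vehicle. The MPC555 integrates the various I/O devices needed for power-system control, enabling a one-chip solution for integrated control. The MPC555 has a relatively large internal flash memory (448 kbytes) and supports floating-point operations. It also has 32 A/D channels and a variety of built-in communication channels, including one SPI (Serial Peripheral Interface) module, two SCI (Serial Communication Interface) modules, and two CAN (Controller Area Network) modules. Its 32 TPU (Time Processing Unit) channels allow a variety of timing functions to be implemented. Using the developed control board, a scaled-down simulation of the induction generator system of a hybrid electric vehicle was performed.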

Design of a Floating Point Multiplier for IEEE 754 Single-Precision Operations (IEEE 754 단정도 부동 소수점 연산용 곱셈기 설계)

Lee, Ju-Hun;Chung, Tae-Sang
- Proceedings of the KIEE Conference / 1999.11c / pp.778-780 / 1999
Arithmetic unit speed depends strongly on the algorithms employed to realize the basic arithmetic operations (add, subtract, multiply, and divide) and on the logic design. Recent advances in VLSI have made hardware implementation of floating-point arithmetic units increasingly feasible, and microprocessors now require a powerful floating-point unit as a standard option. This paper describes the design of a floating-point multiplier for IEEE 754-1985 single-precision operations. Booth encoding is used to reduce the number of partial products, and a Wallace tree of 4-2 CSAs (carry-save adders) is adopted in the fraction multiplication part to generate the $32 \times 32$ single-precision product. A new rounding and sticky-bit generation scheme is adopted to reduce area and delay. The design also includes a true sign generator. The multiplier has been implemented in an ALTERA FLEX EPF10K70RC240-4.
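The significand path of an IEEE 754 single-precision multiplier, with round-to-nearest-even driven by guard and sticky bits, can be modeled at bit level. This is a sketch, not the paper's circuit: Python's exact integer product stands in for the Booth/Wallace-tree array, the sticky bit is computed directly from the discarded product bits rather than by the paper's new scheme, and zeros, subnormals, infinities, NaNs, overflow, and underflow are out of scope.

```python
import struct

def f32_bits(x):
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Single-precision value of a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def fmul32(a_bits, b_bits):
    """Single-precision multiply for normal operands with a normal result (RNE)."""
    sign = (a_bits ^ b_bits) & 0x80000000          # result sign is the XOR of the signs
    ea, eb = (a_bits >> 23) & 0xFF, (b_bits >> 23) & 0xFF
    ma = (a_bits & 0x7FFFFF) | 0x800000             # restore the hidden bit: 24-bit significand
    mb = (b_bits & 0x7FFFFF) | 0x800000
    prod = ma * mb                                  # 47 or 48 significant bits
    exp = ea + eb - 127                             # remove one copy of the bias
    if prod & (1 << 47):                            # product in [2, 4): normalize one place
        exp += 1
        shift = 24
    else:
        shift = 23
    mant = prod >> shift                            # 24 bits including the hidden bit
    guard = (prod >> (shift - 1)) & 1               # first discarded bit
    sticky = (prod & ((1 << (shift - 1)) - 1)) != 0  # OR of all bits below the guard bit
    if guard and (sticky or (mant & 1)):            # round to nearest, ties to even
        mant += 1
        if mant == 1 << 24:                         # rounding carried out of the significand
            mant >>= 1
            exp += 1
    if not 0 < exp < 255:
        raise ValueError("overflow/underflow is not handled in this sketch")
    return sign | (exp << 23) | (mant & 0x7FFFFF)
```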

A design of floating-point multiplier for superscalar microprocessor (수퍼스칼라 마이크로프로세서용 부동 소수점 승산기의 설계)

최병윤;이문기
- The Journal of Korean Institute of Communications and Information Sciences / v.21 no.5 / pp.1332-1344 / 1996
This paper presents a pipelined floating-point multiplier (FMUL) for superscalar microprocessors that combines a radix-16 recoding scheme based on the signed-digit (SD) number system with a new rounding and normalization scheme. The new rounding and normalization scheme enables the FMUL to compute the sticky bit in parallel with the multiply operation and eliminates the timing delay due to post-normalization. Exploiting the SD radix-16 recoding scheme further reduces silicon area and computation time. The FMUL executes single-precision or double-precision floating-point multiply operations through a three-stage pipelined datapath and supports IEEE standard 754. The algorithm and structure of the designed multiplier have been successfully verified through Verilog HDL modeling and simulation.

A design of floating-point arithmetic unit for superscalar microprocessor (수퍼스칼라 마이크로프로세서용 부동 소수점 연산회로의 설계)

최병윤;손승일;이문기
- The Journal of Korean Institute of Communications and Information Sciences / v.21 no.5 / pp.1345-1359 / 1996
This paper presents a floating-point arithmetic unit (FPAU) for superscalar microprocessors that executes fifteen operations, such as addition, subtraction, data format conversion, and comparison, using two pipelined arithmetic paths and a new rounding and normalization scheme. With two pipelined arithmetic paths, each arithmetic operation can be assigned to the path on which it can be executed at high speed. The proposed normalization and rounding scheme enables the FPAU to perform rounding in parallel with normalization and reduces the timing delay of post-normalization. In addition, because the leading-one position of the result is predicted from the input operands, the leading-one detection (LOD) operation that conventional arithmetic units use to normalize results can be eliminated. Because the FPAU executes fifteen single-precision or double-precision floating-point arithmetic operations through a three-stage pipelined datapath and supports IEEE standard 754, its structure is suitable for integration into a superscalar microprocessor.
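Leading-one prediction can be illustrated for an effective subtraction a - b with a > b. Above the first bit where the operands differ the difference is zero, and from that bit downward every position where a has 0 and b has 1 cancels one more leading bit. Counting that run from the operands alone predicts the leading-one position exactly or one place too high, so a one-bit correction after the shift suffices. This simple predictor is an illustration under those assumptions (unsigned operands, a > b, hypothetical function names), not the circuit in the paper.

```python
def predict_leading_one(a, b):
    """Predict the leading-one position of a - b (unsigned, a > b) from the operands only.
    The prediction is exact or one position too high."""
    p = (a ^ b).bit_length() - 1          # first differing bit: a has 1, b has 0 there
    # Each following position with a = 0, b = 1 cancels one more leading bit.
    while p > 0 and ((a >> (p - 1)) & 1, (b >> (p - 1)) & 1) == (0, 1):
        p -= 1
    return p

def normalize_shift_amount(a, b, width):
    """Left shift that normalizes a - b in a `width`-bit datapath: use the predicted
    position, then apply the one-bit correction instead of a full leading-one detector."""
    d = a - b
    p = predict_leading_one(a, b)
    if not (d >> p) & 1:                  # prediction was one place too high
        p -= 1
    return width - 1 - p
```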

Design of a Floating-Point Divider for IEEE 754-1985 Single-Precision Operations (IEEE 754-1985 단정도 부동 소수점 연산용 나눗셈기 설계)

Park, Ann-Soo;Chung, Tea-Sang
- Proceedings of the KIEE Conference / 2001.11c / pp.165-168 / 2001
This paper presents the design of a divide unit supporting IEEE-754 single-precision floating point with a 32-bit word length. Its functions have been verified with the ALTERA MAX+PLUS II tool. For high-speed division, the radix-4 non-restoring algorithm is applied, and carry-lookahead adders (CLAs) are used in the fraction division part to improve area efficiency and speed. Combinational logic is used to avoid the loss of speed that clocking would otherwise introduce. The quotient selection block, the most complicated and critical part of a high-radix divider, was designed using a P-D plot so that quotient digits are selected quickly and accurately. All division steps were also designed at the gate level, which makes the operations and their delays visible.
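Radix-4 SRT (non-restoring) division retires one signed quotient digit in {-2, -1, 0, 1, 2} per step through the recurrence w ← 4w - q·D, and a P-D plot defines which digits may be chosen for each combination of partial remainder and divisor so that |w| stays within (2/3)·D. The sketch below keeps that recurrence and bound but, as a simplification, selects each digit from the exact values (rounding 4w/D and clamping to the digit set) instead of from a truncated P-D table. Operands are unsigned integers standing in for significands, and the divisor is prescaled by 4 so the initial remainder is in range.

```python
def radix4_srt_divide(x, d, n):
    """Return (q, r) with x * 4**n == q * (4*d) + r and 0 <= r < 4*d."""
    if not (d > 0 and 0 <= x < 2 * d):
        raise ValueError("sketch assumes 0 <= x/d < 2, as for normalized significands")
    D = 4 * d                          # prescaled divisor: |w0| = x < D/2 <= (2/3)D
    w, q = x, 0
    for _ in range(n):
        # Quotient-digit selection: round(4w / D), clamped to the digit set {-2..2}.
        # A hardware P-D plot makes this choice from a few leading bits of w and D.
        digit = max(-2, min(2, (8 * w + D) // (2 * D)))
        w = 4 * w - digit * D          # partial-remainder recurrence; |w| <= (2/3)D
        q = 4 * q + digit              # accumulate signed digits (on-the-fly conversion in hardware)
    if w < 0:                          # final correction for a negative remainder
        w += D
        q -= 1
    return q, w
```

The returned pair satisfies x·4ⁿ = q·4d + r, so q = ⌊4ⁿ⁻¹·x/d⌋, and each iteration contributes two quotient bits.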

