• Title/Summary/Keyword: graphics hardware


A design of a floating point unit with 3 stages for a 3D graphics shader engine

  • Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.11 no.4
    • /
    • pp.358-363
    • /
    • 2007
  • This paper presents a floating point unit (FPU) with 3 stages for a 3D graphics shader engine. It is intended to accelerate 3D graphics in portable device environments. To design a balanced architecture for a shader engine, we analyzed shader assembly instructions and estimated the performance of the FPU with the method we propose. The proposed unit handles 4-dimensional data through two separate paths that lead to a general operation module and a special function module. The proposed FPU is composed as a cascaded FPU with 3 stages so that it handles matrix operations efficiently with relatively low hardware overhead. Except for some complex instructions that are executed as macro instructions, all instructions complete in a single instruction cycle at a 100 MHz clock frequency. The special function module performs all of its operations in a single clock cycle using the Newton-Raphson method with a look-up table.

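The abstract above mentions a special function module built on the Newton-Raphson method with a look-up table. The sketch below illustrates that general technique in software for one function, the reciprocal. The choice of function, the 64-entry table size, the midpoint seeding, and the names `RECIP_LUT` and `reciprocal` are assumptions made for illustration and are not taken from the paper.

```python
import math

TABLE_BITS = 6  # assumed table size: 2**6 = 64 seed entries

# Seed table (assumption): reciprocal of the midpoint of each
# sub-interval of the normalized mantissa range [1, 2).
RECIP_LUT = [
    1.0 / (1.0 + (i + 0.5) / (1 << TABLE_BITS))
    for i in range(1 << TABLE_BITS)
]


def reciprocal(x, iterations=1):
    """Approximate 1/x from a table seed refined by Newton-Raphson steps."""
    if x == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    sign = -1.0 if x < 0.0 else 1.0
    # math.frexp returns a mantissa in [0.5, 1); rescale it to [1, 2).
    mantissa, exponent = math.frexp(abs(x))
    m = mantissa * 2.0
    e = exponent - 1
    # The top TABLE_BITS fraction bits of the mantissa select the seed.
    index = int((m - 1.0) * (1 << TABLE_BITS))
    y = RECIP_LUT[index]
    # Newton-Raphson step for f(y) = 1/y - m: y <- y * (2 - m * y).
    # Each step roughly squares the relative error of the estimate.
    for _ in range(iterations):
        y = y * (2.0 - m * y)
    # Undo the normalization: 1/x = sign * (1/m) * 2**(-e).
    return sign * math.ldexp(y, -e)
```

With this table size the seed has a relative error below 1%, so a single iteration already gives about four correct decimal digits. A hardware unit would trade table size against the number of iterations in a similar way.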

Numerical Computing on Graphics Hardware

  • 임인성
    • Proceedings of the Korean Society of Visualization Conference (한국가시화정보학회:학술대회논문집)
    • /
    • 2004.04a
    • /
    • pp.57-63
    • /
    • 2004
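  • The graphics accelerators from ATI, NVIDIA, and others that are now installed in ordinary general-purpose PCs are incomparably faster than those of a few years ago. Alongside this speed-up, one of the most rapid changes is the emergence of the programmable graphics pipeline, which, unlike the conventional fixed-function graphics pipeline, lets programmers freely program the accelerator's functionality. The GPU (Graphics Processing Unit) in such an accelerator can be regarded as a simple form of SIMD processor. Because the pixel shader, one part of the GPU, has very high throughput, there are active attempts to parallelize existing numerical algorithms with it. This talk briefly surveys attempts to solve a variety of numerical computations using graphics accelerators.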


A Study on the Implementation of Low Power DCT Architecture for MPEG-4 AVC (저전력 DCT를 이용한 MPEG-4 AVC 압축에 관한 연구)

  • Kim, Dong-Hoon;Seo, Sang-Jin;Park, Sang-Bong;Jin, Hyun-Joon;Park, Nho-Kyung
    • Proceedings of the KIEE Conference
    • /
    • 2007.10a
    • /
    • pp.371-372
    • /
    • 2007
  • In this paper we compare the performance and implementation of a high-performance two-dimensional forward and inverse Discrete Cosine Transform (2D-DCT/IDCT) algorithm with a low-power algorithm for the $8{\times}8$ 2D DCT and quantization based on partial sums, together with its corresponding hardware architecture for an FPGA in MPEG-4. The architecture used in both the low-power 2D DCT and the 2D IDCT is based on the conventional row-column decomposition method. The use of a fast algorithm and the distributed arithmetic (DA) technique to implement the DCT/IDCT reduces the hardware complexity. The design was made using Mentor Graphics tools for design entry and implementation. Mentor Graphics ModelSim SE6.1f was used for Verilog HDL entry, behavioral simulation, and synthesis. The 2D DCT/IDCT consumes only 50% of the operating power.

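The row-column decomposition named in the abstract above can be sketched as follows: a 1D DCT is applied to every row of the block, and then to every column of the result. This is a minimal floating-point illustration using the direct orthonormal DCT-II formula. It does not reproduce the paper's fast algorithm, distributed arithmetic, or partial-sum quantization, and all function names are hypothetical.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II scaling factor for coefficient index k.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(v):
    """Direct (non-fast) orthonormal 1D DCT-II of a sequence."""
    n = len(v)
    return [
        _scale(k, n) * sum(
            v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        for k in range(n)
    ]


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            _scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def row_column(block, transform_1d):
    """Apply a 1D transform to all rows, then to all columns."""
    rows_done = [transform_1d(row) for row in block]
    cols_done = [transform_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)


def dct_2d(block):
    return row_column(block, dct_1d)


def idct_2d(coeffs):
    return row_column(coeffs, idct_1d)
```

For a constant 8x8 block all energy lands in the DC coefficient, and `idct_2d(dct_2d(block))` recovers the block up to floating-point rounding. The decomposition matters for hardware because one 1D transform datapath plus a transpose buffer can serve both passes.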

Implementation of IQ/IDCT in H.264/AVC Decoder Using GP-GPU (GP-GPU를 이용한 H.264/AVC 디코더의 IQ/IDCT구현)

  • Jeong, Jun-Mo;Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.14 no.2
    • /
    • pp.76-81
    • /
    • 2010
  • The need for dedicated hardware continues to decrease as mobile CPU performance increases. However, there is a limit to a mobile CPU's performance. GP-GPU (General-Purpose computing on Graphics Processing Units) can improve performance without adding other dedicated hardware. This paper presents the implementation of the Inverse Quantization, Inverse DCT, and Color Space Conversion modules of an H.264/AVC decoder using GP-GPU for mobile environments. The proposed architecture improves performance by approximately 40% when all of its features are used.

Application of variable indexed colors for game development of portable (hand-held) devices (가변 인덱스 컬러를 이용한 뉴 미디어 기기용 게임 제작 방법)

  • Jung, Jong-Pil;Kim, Chee-Hoon
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2008.05a
    • /
    • pp.131-134
    • /
    • 2008
  • Most PC and console games today show vivid special effects and brilliant scenery. However, games running on mobile and portable devices cannot show such magnificent scenes because of low hardware specifications such as a slow CPU, an old graphics card, and limited battery capacity. These games tend to favor light, casual content that does not require heavy computation. It is very important to maintain a minimum level of graphics quality in such games. This research therefore presents variable indexed color palettes as a new possibility for overcoming the limited hardware capacity.

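As background for the indexed color palettes mentioned in the abstract above, the sketch below shows the general idea of indexed color: pixels store small palette indices, and changing the palette recolors the image without touching the pixel data. It is a generic illustration, not the paper's variable-palette method, and the function names and the 8-bit index assumption are hypothetical.

```python
def render(indexed_pixels, palette):
    """Expand palette indices into RGB tuples for display."""
    return [[palette[i] for i in row] for row in indexed_pixels]


def rotate_palette(palette, offset):
    """Return a palette cycled by `offset` entries (a classic palette effect)."""
    offset %= len(palette)
    return palette[offset:] + palette[:offset]


def storage_bytes(width, height, palette_size):
    """Compare storage for 8-bit indexed color and 24-bit RGB (assumed formats)."""
    indexed = width * height + palette_size * 3   # 1 byte per pixel + palette
    truecolor = width * height * 3                # 3 bytes per pixel
    return indexed, truecolor


# A 2x2 sprite stored as indices into a two-entry palette.
sprite = [[0, 1],
          [1, 0]]
day_palette = [(0, 0, 0), (255, 255, 255)]
night_palette = rotate_palette(day_palette, 1)
```

Rendering `sprite` with `night_palette` inverts the two colors while the sprite data stays unchanged. Under the assumed formats, `storage_bytes(128, 128, 256)` gives 17,152 bytes for the indexed image against 49,152 bytes for 24-bit RGB.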

A Realtime Hardware Design for Face Detection (얼굴인식을 위한 실시간 하드웨어 설계)

  • Suh, Ki-Bum;Cha, Sun-Tae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.2
    • /
    • pp.397-404
    • /
    • 2013
  • This paper proposes a hardware architecture for a face detection system using the AdaBoost algorithm. The proposed face detection hardware can operate in real time at 30 frames per second. The AdaBoost algorithm is used to learn and generate the characteristics of the face data in Matlab, and faces are then detected using this data. This paper describes the face detection hardware structure, which is composed of an image scaler, integral image extraction, face comparison, a memory interface, a data grouper, and a detected-result display. The proposed circuit is designed to process one point per cycle, so that the proposed design can process full HD ($1920{\times}1080$) images at 70MHz, which is approximately $2316087{\times}30$ cycles. Furthermore, this paper reduces the word length, relying on overflow, in order to reduce memory size. The proposed structure was designed using Verilog HDL and modified in Mentor Graphics Modelsim. It operates at a 45MHz operating frequency and uses 74,757 LUTs on a Xilinx Virtex-5 XC5LX330 FPGA.
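The integral image extraction stage listed in the abstract above relies on a standard data structure, which can be sketched in software as follows. Each entry holds the sum of all pixels above and to the left, so the sum over any rectangle takes four look-ups. The function names and the zero-padded layout are assumptions for illustration and do not describe the paper's circuit.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]              # running sum of the current row
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table look-ups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Haar-like feature: left half minus right half of a window (width must be even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

The table is filled in one pass with one pixel consumed per step, which matches the one-point-per-cycle style of processing the abstract describes. The quoted budget of $2316087{\times}30$ cycles is about 69.5 million cycles per second, which fits within a 70MHz clock.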

A Soft Shadow Technique for a Real-time Mobile Ray Tracing Hardware (실시간 모바일 레이트레이싱 하드웨어를 위한 소프트 쉐도우 생성 기법)

  • Kwon, Hyuck-Joo;Hong, Dukki;Park, Woo-Chan;Lee, Sanghoon
    • Journal of the Korea Computer Graphics Society
    • /
    • v.23 no.3
    • /
    • pp.55-64
    • /
    • 2017
  • In this paper, a novel soft shadow method is suggested to support realistic shadows in mobile ray tracing. In ray tracing, a soft shadow is generally generated by sampling shadow rays. Because this sampling increases the number of rays to be processed, it degrades performance. We designed the proposed soft shadow processing method and hardware architecture to overcome this problem through selective shadow generation and triangle address caching, which minimize the performance degradation caused by sampling. The proposed hardware architecture can be integrated into mobile ray-tracing hardware and was evaluated in terms of its performance on an FPGA. Based on the results, the rendering performance for 4, 8, and 16 samples improved by 40%, 50%, and 56% on average, respectively, compared to the previous method, and real-time soft shadow processing was found to be feasible with the proposed hardware architecture.
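The baseline that the abstract above starts from, generating a soft shadow by sampling shadow rays toward an area light, can be sketched as follows. The sketch covers only that conventional sampling step, not the paper's selective shadow generation or triangle address caching. The sphere occluder, the grid sampling pattern, and all names are assumptions chosen to keep the example small.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def segment_hits_sphere(origin, target, center, radius):
    """True if the segment from origin to target intersects the sphere."""
    d = [t - o for o, t in zip(origin, target)]
    f = [o - c for o, c in zip(origin, center)]
    a = dot(d, d)
    b = 2.0 * dot(f, d)
    c = dot(f, f) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False                      # the infinite line misses the sphere
    root = math.sqrt(disc)
    t1 = (-b - root) / (2.0 * a)
    t2 = (-b + root) / (2.0 * a)
    # Only intersections strictly between the surface point and the light count.
    return 0.0 < t1 < 1.0 or 0.0 < t2 < 1.0


def light_samples(corner, edge_u, edge_v, n):
    """Centres of an n x n grid of cells on a parallelogram area light."""
    pts = []
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) / n, (j + 0.5) / n
            pts.append(tuple(c + u * eu + v * ev
                             for c, eu, ev in zip(corner, edge_u, edge_v)))
    return pts


def light_visibility(point, samples, center, radius):
    """Fraction of shadow rays that reach the light: 0 is umbra, 1 is fully lit."""
    visible = sum(
        1 for s in samples
        if not segment_hits_sphere(point, s, center, radius)
    )
    return visible / len(samples)
```

One shadow ray is traced per light sample, so going from 4 to 16 samples quadruples the shadow-ray work per shaded point. This is the cost that the abstract says the proposed method reduces.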

Design and Implementation of a 3D Graphic Acceleration Device Driver for Embedded Systems (임베디드 시스템을 위한 3차원 그래픽 가속 장치 구동기의 설계 및 구현)

  • Kim, Seong-Woo;Lee, Jung-Hwa;Lee, Jong-Min
    • Journal of Korea Multimedia Society
    • /
    • v.10 no.9
    • /
    • pp.1209-1219
    • /
    • 2007
  • It is difficult to run 3D graphics-based applications on embedded systems with hardware constraints. Such a system therefore needs a systematic infrastructure that can process the various 3D graphics operations through a graphics acceleration module. In this paper, we present a method for implementing a 3D graphics acceleration device driver on the Tiny X platform, which provides an open source graphics windowing environment. The proposed method initializes the driver step by step so that the direct rendering infrastructure can use it properly. Moreover, we evaluated the overall 3D graphics performance of the implemented driver with a simple but effective benchmark program.


Implementation of a 3D Graphics Simulator for GP-GPU (GP-GPU 개발을 위한 3차원 그래픽 시뮬레이터 구현)

  • Yeo, Dong-young;Kim, Woo-young;Jung, Hyung-Ki;Lee, Kwang-Yeob
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.337-340
    • /
    • 2009
  • The performance of the GPU (Graphics Processing Unit), a hardware accelerator for 3D graphics processing, has been improving constantly. It was introduced as an efficient way to run complex graphics applications, but applications rarely utilize 100% of the GPU's resources. The GP-GPU (general-purpose GPU), which performs graphics operations on the GPU while also supporting the common operations normally handled by a processor, is attracting attention because the distribution of resources can be controlled effectively. In this paper, we implemented a simulator that provides a virtual GP-GPU environment and can be used for program design and debugging. With it, a co-design development environment that supports simultaneous design and fast, reliable verification becomes available for building the interface of a three-dimensional graphics display.


Cache simulation for measuring cache performance suitable for sound rendering (사운드 렌더링에 적합한 캐시 성능 측정을 위한 캐시 시뮬레이션)

  • Joo, Yejong;Hong, Dukki;Chung, Woonam;Park, Woo-Chan
    • Journal of the Korea Computer Graphics Society
    • /
    • v.23 no.3
    • /
    • pp.123-133
    • /
    • 2017
  • Cache performance is an important factor in a hardware system. We carry out a cache simulation to analyze the cache performance suitable for sound rendering. In addition, we introduce hardware models based on ray tracing, which is used in the geometric method, and studies on improving cache performance. The cache simulation is performed under various conditions for cache size, number of ways, and block size. These conditions are found to influence the cache hit rate. We compare the cache simulation results with actual hardware performance to analyze the cache performance suitable for sound rendering.
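A cache simulation that varies cache size, number of ways, and block size, as described in the abstract above, can be sketched with a small trace-driven model. The LRU replacement policy, the synthetic address trace, and the names `SetAssociativeCache` and `simulate` are assumptions for illustration. The paper's simulator and its sound-rendering traces are not reproduced here.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Trace-driven set-associative cache model with LRU replacement (assumed policy)."""

    def __init__(self, size_bytes, ways, block_bytes):
        self.ways = ways
        self.block_bytes = block_bytes
        self.num_sets = size_bytes // (ways * block_bytes)
        if self.num_sets < 1:
            raise ValueError("cache too small for this way/block configuration")
        # One ordered dict per set: keys are tags, order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.accesses = 0

    def access(self, address):
        block = address // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        self.accesses += 1
        if tag in lines:
            lines.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)     # evict the least recently used line
        lines[tag] = True
        return False

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0


def simulate(trace, size_bytes, ways, block_bytes):
    """Run an address trace through one cache configuration and return the hit rate."""
    cache = SetAssociativeCache(size_bytes, ways, block_bytes)
    for address in trace:
        cache.access(address)
    return cache.hit_rate()
```

As a usage example, a sequential trace of 4-byte reads such as `list(range(0, 256, 4))` can be run through `simulate` with several block sizes. Larger blocks raise the hit rate for this trace because each miss brings in more of the data that is read next.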