• Title/Summary/Keyword: Graphics processing unit

Search Result 190, Processing Time 0.029 seconds

Fast Computation of DWT and JPEG2000 using GPU (GPU를 이용한 DWT 및 JPEG2000의 고속 연산)

  • Lee, Man-Hee;Park, In-Kyu;Won, Seok-Jin;Cho, Sung-Dae
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.6
    • /
    • pp.9-15
    • /
    • 2007
  • In this paper, we propose an efficient method for Processing DWT (Discrete Wavelet Transform) on GPU (Graphics Processing Unit). Since the DWT and EBCOT (embedded block coding with optimized truncation) are the most complicated submodules in JPEG2000, we design a high-performance processing framework for performing DWT using the fragment shader of GPU based on the render-to-texture (RTT) architecture. Experimental results show that the performance increases significantly, in which DWT running on modern GPU is more than 10 times faster than on modern CPU. Furthermore, by replacing the DWT part of Jasper which is the JPEG2000 reference software, the overall processing is 2$\sim$16 times faster than the original JasPer. The GPU-driven render-to-texture architecture proposed in this paper can be used in the general image and computer vision processing for high-speed processing.

The Design of VGE(Vector Geometry Engine) for 3D Graphics Geometry Processing (3차원 그래픽 지오메트리 연산을 위한 벡터 지오메트리 엔진의 설계.)

  • 김원석;정철호;한탁돈
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.1_2
    • /
    • pp.135-143
    • /
    • 2004
  • 3D Graphics accelerator is usually composed of two parts, geometry engine and rasterizer. In this paper, VGE(Vector Geometry Engine) which exploits vertex-level parallelism is proposed. In VGE, Common Floating-Point Unit by adding four-FADD, four-FMUL unit and 128-vector register accelerates geometry calculation. In comparison with SH4, Performance result show that the VGE can achieve performance gain over 4.7 times. To evaluate VGE performance, we make simulator to rebuild Simple-Scalar, general purpose processor simulator. In simulator model, we use Viewperf-benchmark.

Multiple GPU Scheduling for Improved Acquisition of Real-Time 360 VR Game Video (실시간 360 VR 스테레오 게임 영상 획득 성능 개선을 위한 다중 GPU 스케줄링에 관한 연구)

  • Lee, Junsuk;Paik, Joonki
    • Journal of Broadcast Engineering
    • /
    • v.24 no.6
    • /
    • pp.974-982
    • /
    • 2019
  • Real-time 360 VR (Virtual Reality) stereo image acquisition technique based on game engine was proposed. However, GPU (Graphics Processing Unit) resource is not fully utilized due to bottlenecks. In this paper, we propose an improved GPU scheduling technique to solve the bottleneck of the existing technique and measure the performance of the proposed technique using the sample games of the commercial game engine. As a result, proposed technique showed an improvement of performance up to 70% and usage of GPU resources more evenly compared existing technique.

Efficient Computation of Isosurface Curvatures on GPUs Based on the de Boor Algorithm (드 부어 알고리즘을 이용한 GPU에서의 효율적인 등가면 곡률 계산)

  • Kim, Minho
    • Journal of the Korea Computer Graphics Society
    • /
    • v.23 no.3
    • /
    • pp.47-54
    • /
    • 2017
  • In this paper, we propose an improved curvature-based GPU (Graphics Processing Unit) isosurface ray-casting technique. Our method adopts the fast evaluation method proposed by Sigg et al. [1] to find the isosurface, but replaces the computation of the gradient and Hessian with the de Boor algorithm. In this way, we can reduce the number of additional texture fetches from 84 to 27 thus improving the performance by up to ${\approx}30%$, depending on the platforms.

A Tool for On-the-fly Repairing of Atomicity Violation in GPU Program Execution

  • Lee, Keonpyo;Lee, Seongjin;Jun, Yong-Kee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.9
    • /
    • pp.1-12
    • /
    • 2021
  • In this paper, we propose a tool called ARCAV (Atomatic Recovery of CUDA Atomicity violation) to automatically repair atomicity violations in GPU (Graphics Processing Unit) program. ARCAV monitors information of every barrier and memory to make actual memory writes occur at the end of the barrier region or to make the program execute barrier region again. Existing methods do not repair atomicity violations but only detect the atomicity violations in GPU programs because GPU programs generally do not support lock and sleep instructions which are necessary for repairing the atomicity violations. Proposed ARCAV is designed for GPU execution model. ARCAV detects and repairs four patterns of atomicity violations which represent real-world cases. Moreover, ARCAV is independent of memory hierarchy and thread configuration. Our experiments show that the performance of ARCAV is stable regardless of the number of threads or blocks. The overhead of ARCAV is evaluated using four real-world kernels, and its slowdown is 2.1x, in average, of native execution time.

Design of a Dispatch Unit & Operand Selection Unit for Improving the SIMT Based GP-GPU Instruction Performance (SIMT구조 GP-GPU의 명령어 처리 성능 향상을 위한 Dispatch Unit과 Operand Selection Unit설계)

  • Kwak, Jae Chang
    • Journal of IKEEE
    • /
    • v.19 no.3
    • /
    • pp.455-459
    • /
    • 2015
  • This paper proposes a dispatch unit of GP-GPU with SIMT architecture to support the acceleration of general-purpose operation as well as graphics processing. If all the information of an operand used instructions issued from the warp scheduler is decoded, an unnecessary operand load occurs, resulting in register loads. To resolve this problem, this paper proposes a method that can reduce the operand load and the load on the resister by decoding only the information of the operand using a pre-decoding method. The operand information from the dispatch unit is passed to the operand selection unit with preventing register bank collisions. Thus the overall performance are improved. In the simulation test, the total clock cycles required by processing 10,000 arbitrary instructions issued from the wrap scheduler using ModelSim SE 10.0b are measured. It shows that the application of the dispatch unit equipped with the pre-decoding function proposed in this paper can make an improvement of about 12% in processing performance compared to the conventional method.

A 3D graphic pipelines with an efficient clipping algorithm (효율적인 클리핑 기능을 갖는 3차원 그래픽 파이프라인 구조)

  • Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.8
    • /
    • pp.61-66
    • /
    • 2008
  • Recently, portable devices which require small area and low power consumption employ applications using 3D graphics such as 3D games and 3D graphical user interfaces. We propose an efficient clipping engine algorithm which is suitable in 3D graphics pipeline. The clipping operation is divided into two steps: one is the selection process in the transformation engine and the other is the pixel clipping process in the scan conversion unit. The clipping operation is possible with addition of simple comparator. The clipping for the Y-axis is achieved in the edge walk stage and that for the X and Z-axis is performed in the span processing. The proposed clipping algorithm reduces the operation cycles and the area of of 3D graphics pipelines. We designed a 3D graphics pipeline with the proposed clipping algorithm using Verilog-HDL and verifies the operation using an FPGA.

A Design of Floating-Point Geometry Processor for Embedded 3D Graphics Acceleration (내장형 3D 그래픽 가속을 위한 부동소수점 Geometry 프로세서 설계)

  • Nam Ki hun;Ha Jin Seok;Kwak Jae Chang;Lee Kwang Youb
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.2 s.344
    • /
    • pp.24-33
    • /
    • 2006
  • The effective geometry processing IP architecture for mobile SoC that has real time 3D graphics acceleration performance in mobile information system is proposed. Base on the proposed IP architecture, we design the floating point arithmetic unit needed in geometry process and the floating point geometry processor supporting the 3D graphic international standard OpenGL-ES. The geometry processor is implemented by 160k gate area in a Xilinx-Vertex FPGA and we measure the performance of geometry processor using the actual 3D graphic data at 80MHz frequency environment The experiment result shows 1.5M polygons/sec processing performance. The power consumption is measured to 83.6mW at Hynix 0.25um CMOS@50MHz.

Implementation of Real-time Interactive Ray Tracing on GPU (GPU 기반의 실시간 인터렉티브 광선추적법 구현)

  • Bae, Sung-Min;Hong, Hyun-Ki
    • Journal of Korea Game Society
    • /
    • v.7 no.3
    • /
    • pp.59-66
    • /
    • 2007
  • Ray tracing is one of the classical global illumination methods to generate a photo-realistic rendering image with various lighting effects such as reflection and refraction. However, there are some restrictions on real-time applications because of its computation load. In order to overcome these limitations, many researches of the ray tracing based on GPU (Graphics Processing Unit) have been presented up to now. In this paper, we implement the ray tracing algorithm by J. Purcell and combine it with two methods in order to improve the rendering performance for interactive applications. First, intersection points of the primary ray are determined efficiently using rasterization on graphics hardware. We then construct the acceleration structure of 3D objects to improve the rendering performance. There are few researches on a detail analysis of improved performance by these considerations in ray tracing rendering. We compare the rendering system with environment mapping based on GPU and implement the wireless remote rendering system. This system is useful for interactive applications such as the realtime composition, augmented reality and virtual reality.

  • PDF

Implementation of Ray Tracing Processor for the Parallel Processing (병렬처리를 위한 고속 Ray Tracing 프로세서의 설계)

  • Choe, Gyu-Yeol;Jeong, Deok-Jin
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.5
    • /
    • pp.636-642
    • /
    • 1999
  • The synthesis of the 3D images is the most important part of the virtual reality. The ray tracing is the best method for reality in the 3D graphics. But the ray tracing requires long computation time for the synthesis of the 3D images. So, we implement the ray tracing with software and hardware. Specially we design the hit-test unit with FPGA tool for the ray tracing. Hit-test unit is a very important part of ray tracing to improve the speed. In this paper, we proposed a new hit-test algorithm and apply the parallel architecture for hit-test unit to improve the speed. We optimized the arithmetic unit because the critical path of hit-test unit is in the multiplication part. We used the booth algorithm and the baugh-wooley algorithm to reduce the partial product and adapted the CSA and CLA to improve the efficiency of the partial product addition. Our new Ray tracing processor can produce the image about 512ms/F and can be adapted to real-time application with only 10 parallel processors.

  • PDF