• Title/Summary/Keyword: Compute Shader

Search Result 12, Processing Time 0.022 seconds

Parallel Algorithm of Conjugate Gradient Solver using OpenGL Compute Shader

  • Va, Hongly;Lee, Do-keyong;Hong, Min
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.1
    • /
    • pp.1-9
    • /
    • 2021
  • OpenGL compute shader is a shader stage that operate differently from other shader stage and it can be used for the calculating purpose of any data in parallel. This paper proposes a GPU-based parallel algorithm for computing sparse linear systems through conjugate gradient using an iterative method, which perform calculation on OpenGL compute shader. Basically, this sparse linear solver is used to solve large linear systems such as symmetric positive definite matrix. Four well-known matrix formats (Dense, COO, ELL and CSR) have been used for matrix storage. The performance comparison from our experimental tests using eight sparse matrices shows that GPU-based linear solving system much faster than CPU-based linear solving system with the best average computing time 0.64ms in GPU-based and 15.37ms in CPU-based.

Implementation of Particle System Using GLSL 4.3 (GLSL 4.3을 사용한 파티클 시스템 구현)

  • Choi, Yooung-Hwan;Hong, Min;Choi, Yoo-Joo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2016.04a
    • /
    • pp.189-191
    • /
    • 2016
  • 실시간 물리 기반 3D 시뮬레이션에서 연산속도는 매우 중요한 요소이다. 객체의 움직임이나 변형과 같은 현상들은 복잡한 연산을 통해서 계산되기 때문에 일반적으로 시뮬레이션의 정확도와 연산속도는 반비례 관계에 있다. 현재 출시되고 있는 대부분의 게임에서는 물체의 움직임을 정확하게 표현하기보다 연산량을 줄이기 위해 물체의 움직임이나 변형을 비슷하게 표현하는데 중점을 두고 있다. 본 논문에서는 이러한 문제를 해결하기 위하여 OpenGL 4.3의 Compute shader를 사용하여 다이내믹 시뮬레이션의 연산 작업을 GPU 병렬처리로 처리하였다. Compute shader에서 파티클의 움직임을 계산하고 Shader storage buffer object에 저장하고 파티클들의 작업량을 적절한 Workgroup의 크기로 나누어 할당하여 최적의 처리속도를 제공하도록 구현하였다. Compute shader에서 파티클의 움직임을 표현하기 위해서 수치해법 중의 하나인 Euler method를 사용하였으며 실험 결과 파티클의 수가 4,194,304개일 때 CPU 방법에 비해 약 182배 빠른 연산속도 결과를 보였다. 추후 Compute shader를 활용하여 연산량이 많은 분야에 적용 가능할 수 있을 것으로 기대한다.

Performance Comparison of Particle Simulation Using GPU Between OpenGL and Unity (OpenGL과 Unity간의 GPU를 이용한 Particle Simulation의 성능 비교)

  • Kim, Min Sang;Sung, Nak-Jun;Choi, Yoo-Joo;Hong, Min
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.10
    • /
    • pp.479-486
    • /
    • 2017
  • Recently, GPGPU has been able to increase the degradation of computer performance, and it is now possible to run physically based real-time simulations on PCs that require high computational complexity. Physical calculations applied in physics simulation can be performed by parallel processing, and can be efficiently performed using parallel computation using Compute shader recently supported by OpenGL 4.3 and Unity 4.0. In this paper, we measure and compare the number of performance in real - time physics simulation in OpenGL running on various platforms and Unity, a content creation tool supporting various platforms. Particle simulation experiments show that particle simulation using Unity performs faster than 136.04%. It is expected that it will be able to select better development tools for future multi - platform support.

Simulation of Deformable Objects using GLSL 4.3

  • Sung, Nak-Jun;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.8
    • /
    • pp.4120-4132
    • /
    • 2017
  • In this research, we implement a deformable object simulation system using OpenGL's shader language, GLSL4.3. Deformable object simulation is implemented by using volumetric mass-spring system suitable for real-time simulation among the methods of deformable object simulation. The compute shader in GLSL 4.3 which helps to access the GPU resources, is used to parallelize the operations of existing deformable object simulation systems. The proposed system is implemented using a compute shader for parallel processing and it includes a bounding box-based collision detection solution. In general, the collision detection is one of severe computing bottlenecks in simulation of multiple deformable objects. In order to validate an efficiency of the system, we performed the experiments using the 3D volumetric objects. We compared the performance of multiple deformable object simulations between CPU and GPU to analyze the effectiveness of parallel processing using GLSL. Moreover, we measured the computation time of bounding box-based collision detection to show that collision detection can be processed in real-time. The experiments using 3D volumetric models with 10K faces showed the GPU-based parallel simulation improves performance by 98% over the CPU-based simulation, and the overall steps including collision detection and rendering could be processed in real-time frame rate of 218.11 FPS.

Parallel Algorithm for Matrix-Matrix Multiplication on the GPU (GPU 기반 행렬 곱셈 병렬처리 알고리즘)

  • Park, Sangkun
    • Journal of Institute of Convergence Technology
    • /
    • v.9 no.1
    • /
    • pp.1-6
    • /
    • 2019
  • Matrix multiplication is a fundamental mathematical operation that has numerous applications across most scientific fields. In this paper, we presents a parallel GPU computation algorithm for dense matrix-matrix multiplication using OpenGL compute shader, which can play a very important role as a fundamental building block for many high-performance computing applications. Experimental results on NVIDIA Quad 4000 show that the proposed algorithm runs about 208 times faster than previous CPU algorithm and achieves performance of 75 GFLOPS in single precision for dense matrices with matrix size 4,096. Such performance proves that our algorithm is practical for real applications.

Matrix Addition & Scalar Multiplication on the GPU (GPU 기반 행렬 덧셈 및 스칼라 곱셈 알고리즘)

  • Park, Sangkun
    • Journal of Institute of Convergence Technology
    • /
    • v.8 no.1
    • /
    • pp.15-20
    • /
    • 2018
  • Recently a GPU has acquired programmability to perform general purpose computation fast by running thousands of threads concurrently. This paper presents a parallel GPU computation algorithm for dense matrix-matrix addition and scalar multiplication using OpenGL compute shader. It can play a very important role as a fundamental building block for many high-performance computing applications. Experimental results on NVIDIA Quad 4000 show that the proposed algorithm runs 21 times faster than CPU algorithm and achieves performance of 16 GFLOPS in single precision for dense matrices with size 4,096. Such performance proves that our algorithm is practical for real applications.

The Performance Analysis of GPU-based Cloth simulation according to the Change of Work Group Configuration (워크 그룹 구성 변화에 따른 GPU 기반 천 시뮬레이션의 성능 분석)

  • Choi, Young-Hwan;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • Journal of Internet Computing and Services
    • /
    • v.18 no.3
    • /
    • pp.29-36
    • /
    • 2017
  • In these days, 3D dynamic simulation is closely related to many industries. In the past, physically-based 3D simulation was used mainly in the car crash or construction related fields, but it also plays an important role in movies or games today. Many mathematical computations are needed to represent the 3D object realistically, but it is difficult to process a large amount of calculations for simulation of application based on CPU in real-time. Recently, with the advanced graphic hardware and improved architecture, GPU can be utilized for the general purposes of computation function as well as graphic computation. Many approaches using GPU have been applied for various research fields. In this paper, we analyze the performance variation of two cloth simulation algorithms based on GPU according to the change of execution properties of GPU shaders in oder to optimize the performance of GPU-based cloth simulation. Cloth simulation is implemented by the spring centric algorithm and node centric algorithm with GPU parallel computing using compute shader of GLSL 4.3. We compare the performance of between these algorithms according to the change of the size and dimension of work group. The experiment is repeated to 10 times during 5,000 frames for each test and experimental results are provided by averaging of FPS. The experimental result shows that the node centric algorithm is executed in higher speed than the spring centric algorithm.

GPU-Accelerated Single Image Depth Estimation with Color-Filtered Aperture

  • Hsu, Yueh-Teng;Chen, Chun-Chieh;Tseng, Shu-Ming
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.3
    • /
    • pp.1058-1070
    • /
    • 2014
  • There are two major ways to implement depth estimation, multiple image depth estimation and single image depth estimation, respectively. The former has a high hardware cost because it uses multiple cameras but it has a simple software algorithm. Conversely, the latter has a low hardware cost but the software algorithm is complex. One of the recent trends in this field is to make a system compact, or even portable, and to simplify the optical elements to be attached to the conventional camera. In this paper, we present an implementation of depth estimation with a single image using a graphics processing unit (GPU) in a desktop PC, and achieve real-time application via our evolutional algorithm and parallel processing technique, employing a compute shader. The methods greatly accelerate the compute-intensive implementation of depth estimation with a single view image from 0.003 frames per second (fps) (implemented in MATLAB) to 53 fps, which is almost twice the real-time standard of 30 fps. In the previous literature, to the best of our knowledge, no paper discusses the optimization of depth estimation using a single image, and the frame rate of our final result is better than that of previous studies using multiple images, whose frame rate is about 20fps.

Surface Model and Scattering Analysis for Realistic Game Character

  • Kim, Seongdong;Lee, Myounjae
    • Journal of Korea Game Society
    • /
    • v.21 no.4
    • /
    • pp.109-116
    • /
    • 2021
  • In this paper, we considered that recently 3D game characters have been almost alike realistic expression because of a great mathematical computation and efficient techniques on GPU hardware. We presented the rendering technique and analysis for 3D game characters to simulate and render mathematical approach model from recent researches to perform the game engine for the surface reflection of lighting model. We compare our approach with the existing variant rendering techniques here using Open GL shader language on game engine. The experimental result will be provided the view-dependent visual appearance of variant and effective modeling characters for realistic expression using existing methods on the GPU for effective simulations and rendering process. Since there are many operations that are used redundantly while performing mathematical operations, the necessary functions and requirements have been to compute in advance.

GPU-based dynamic point light particles rendering using 3D textures for real-time rendering (실시간 렌더링 환경에서의 3D 텍스처를 활용한 GPU 기반 동적 포인트 라이트 파티클 구현)

  • Kim, Byeong Jin;Lee, Taek Hee
    • Journal of the Korea Computer Graphics Society
    • /
    • v.26 no.3
    • /
    • pp.123-131
    • /
    • 2020
  • This study proposes a real-time rendering algorithm for lighting when each of more than 100,000 moving particles exists as a light source. Two 3D textures are used to dynamically determine the range of influence of each light, and the first 3D texture has light color and the second 3D texture has light direction information. Each frame goes through two steps. The first step is to update the particle information required for 3D texture initialization and rendering based on the Compute shader. Convert the particle position to the sampling coordinates of the 3D texture, and based on this coordinate, update the colour sum of the particle lights affecting the corresponding voxels for the first 3D texture and the sum of the directional vectors from the corresponding voxels to the particle lights for the second 3D texture. The second stage operates on a general rendering pipeline. Based on the polygon world position to be rendered first, the exact sampling coordinates of the 3D texture updated in the first step are calculated. Since the sample coordinates correspond 1:1 to the size of the 3D texture and the size of the game world, use the world coordinates of the pixel as the sampling coordinates. Lighting process is carried out based on the color of the sampled pixel and the direction vector of the light. The 3D texture corresponds 1:1 to the actual game world and assumes a minimum unit of 1m, but in areas smaller than 1m, problems such as stairs caused by resolution restrictions occur. Interpolation and super sampling are performed during texture sampling to improve these problems. Measurements of the time taken to render a frame showed that 146 ms was spent on the forward lighting pipeline, 46 ms on the defered lighting pipeline when the number of particles was 262144, and 214 ms on the forward lighting pipeline and 104 ms on the deferred lighting pipeline when the number of particle lights was 1,024766.