Search | Korea Science

Fast GPU Implementation for the Solution of Tridiagonal Matrix Systems (삼중대각행렬 시스템 풀이의 빠른 GPU 구현)

Kim, Yong-Hee; Lee, Sung-Kee
- Journal of KIISE: Computer Systems and Theory / v.32 no.11_12 / pp.692-704 / 2005
With advances in computer hardware, GPUs (graphics processing units) now offer tremendous memory bandwidth and computational power, which has led to their use in general-purpose computation. In particular, GPU implementations of compute-intensive physics-based simulations are being actively studied. When the differential equations underlying physics simulations are solved, finite-difference approximation repeatedly gives rise to tridiagonal matrix systems, so solving these systems quickly is an important research topic for physics-based simulation. We propose a fast GPU implementation for the solution of tridiagonal matrix systems. We implement the cyclic reduction (also known as odd-even reduction) algorithm, a popular choice for vector processors. Our implementation solves tridiagonal matrix systems considerably faster than the Thomas method and the conjugate gradient method. The Thomas method is a well-known way to solve tridiagonal matrix systems on the CPU, and the conjugate gradient method has shown good results on the GPU. We tested the proposed method on heat conduction, advection-diffusion, and shallow water simulations, which ran at over 35 frames per second on a 1024x1024 grid.
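
As a rough illustration of the two solvers contrasted in this abstract, here is a minimal sequential Python sketch of the Thomas method and of cyclic (odd-even) reduction. The function names and the list-based coefficient layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side, with `a[0]` and `c[-1]` equal to zero) are assumptions made for this example; it is not the authors' GPU implementation and does no pivoting.

```python
def thomas(a, b, c, d):
    """Sequential Thomas method: forward elimination, then back substitution."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_reduction(a, b, c, d):
    """Recursive cyclic reduction: eliminate even-indexed unknowns,
    solve the half-size system for the odd-indexed ones, then back-substitute."""
    n = len(d)
    if n == 1:
        return [d[0] / b[0]]
    odd = list(range(1, n, 2))
    ra, rb, rc, rd = [], [], [], []
    for i in odd:  # each reduced equation only reads equations i-1, i, i+1
        alpha = -a[i] / b[i - 1]
        if i + 1 < n:
            gamma = -c[i] / b[i + 1]
            a_next, c_next, d_next = a[i + 1], c[i + 1], d[i + 1]
        else:
            gamma, a_next, c_next, d_next = 0.0, 0.0, 0.0, 0.0
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + gamma * a_next)
        rc.append(gamma * c_next)
        rd.append(d[i] + alpha * d[i - 1] + gamma * d_next)
    x_odd = cyclic_reduction(ra, rb, rc, rd)
    x = [0.0] * n
    for k, i in enumerate(odd):
        x[i] = x_odd[k]
    for i in range(0, n, 2):  # back substitution for the eliminated unknowns
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Within one reduction level, every reduced equation depends only on its two neighbors, so the loop over `odd` (and the back-substitution loop) could in principle run in parallel. The Thomas recurrence, by contrast, is inherently sequential.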

Efficient GPU Isosurface Ray-casting of BCC Datasets (효율적인 BCC 볼륨 데이터의 GPU 등가면 광선투사법)

Kim, Minho; Kim, Hyunjun; Sarfaraz, Aaliya
- Journal of the Korea Computer Graphics Society / v.19 no.2 / pp.19-27 / 2013
This paper presents a real-time GPU (graphics processing unit) isosurface ray-caster that is 4 to 7 times faster than our previous method while retaining its superior visual quality. The improvement comes from an efficient empty-space skipping scheme and analytic normal computation. Empty-space skipping uses a min/max octree computed from the BB (Bernstein-Bézier) form of the spline pieces, and the analytic normal formula provides not only good visual quality but also faster evaluation.

An Improved CYK Algorithm based on GPGPU (GPGPU 기반의 개선된 CYK 알고리즘)

Kim, Kyoung-Hwan; Han, Yo-Sub
- Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.409-410 / 2012
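GPGPU research, which uses GPUs for general-purpose computation, is being actively pursued. The parallelization techniques used in previous work do not make good use of idle GPU resources during data transfers. We use streams to run CPU-GPU data transfers and GPU computation concurrently, making the most of otherwise idle GPU resources during transfers and thereby improving performance. The proposed method performs about 1.1 times better than the existing parallelization method.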
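
For context, the baseline CYK algorithm named in the title can be sketched in a few lines of Python. This is only the textbook sequential recognizer for a grammar in Chomsky normal form, not the paper's stream-based GPU scheme; the function name, the dictionary-based grammar encoding, and the toy grammar in the usage note are all assumptions made for illustration.

```python
def cyk_accepts(words, terminal_rules, binary_rules, start="S"):
    """Return True if the CNF grammar derives the word sequence.

    terminal_rules: maps a terminal to the set of nonterminals A with A -> terminal
    binary_rules:   maps a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][l - 1] holds the nonterminals that derive words[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(terminal_rules.get(word, ()))
    for length in range(2, n + 1):
        # Cells of the same span length are independent of each other,
        # which is the loop a parallel implementation would distribute.
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in table[i][split - 1]:
                    for right in table[i + split][length - split - 1]:
                        table[i][length - 1] |= binary_rules.get((left, right), set())
    return start in table[0][n - 1]
```

With the toy grammar `S -> A B | A T`, `T -> S B`, `A -> 'a'`, `B -> 'b'`, the call `cyk_accepts(list("aabb"), {"a": {"A"}, "b": {"B"}}, {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}})` accepts the input, while `"abab"` is rejected.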

Stereo-To-Multiview Conversion System Using FPGA and GPU Device (FPGA와 GPU를 이용한 스테레오/다시점 변환 시스템)

Shin, Hong-Chang; Lee, Jinwhan; Lee, Gwangsoon; Hur, Namho
- Journal of Broadcast Engineering / v.19 no.5 / pp.616-626 / 2014
In this paper, we introduce a real-time stereo-to-multiview conversion system using an FPGA and a GPU. Because the system is built on two different devices, it consists of two major blocks. The first block is a disparity estimation block implemented on the FPGA. In this block, each disparity map of the stereoscopic video is estimated by DP (dynamic programming)-based stereo matching, and the estimated disparity maps are then refined by post-processing. The refined disparity map is transferred to the GPU device through USB 3.0 and PCI Express interfaces, along with the stereoscopic video. These data are used to render an arbitrary number of virtual views in the next block. In the second block, disparity-based view interpolation generates the virtual multi-view video. As a final step, all generated views are rearranged into a single full-resolution image for presentation on the target autostereoscopic 3D display. All steps of the second block run in parallel on the GPU device.
https://doi.org/10.5909/JBE.2014.19.5.616

A CPU and GPU Heterogeneous Computing Techniques for Fast Representation of Thin Features in Liquid Simulations (액체 시뮬레이션의 얇은 특징을 빠르게 표현하기 위한 CPU와 GPU 이기종 컴퓨팅 기술)

Kim, Jong-Hyun
- Journal of the Korea Computer Graphics Society / v.24 no.2 / pp.11-20 / 2018
We propose a new particle-based method that explicitly preserves thin liquid sheets when animating liquids on a CPU-GPU heterogeneous computing framework. Our primary contribution is a particle-based framework that splits particles at thin points and collapses them at dense points to prevent the liquid from breaking up on the GPU. In contrast to existing surface tracking methods, our method does not suffer from numerical diffusion or tangles, and it robustly handles topology changes on the CPU-GPU framework. Thin features are detected by examining the stretch of the distribution of neighboring particles using PCA (principal component analysis), which is also used to reconstruct thin surfaces with anisotropic kernels. The candidate position extraction process, which calculates the positions of fluid particles, was accelerated using CPU-GPU heterogeneous computing techniques. The proposed algorithm is intuitive to implement, easy to parallelize, and capable of quickly producing detailed thin liquid animations.
https://doi.org/10.15701/kcgs.2018.24.2.11
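
The PCA-based thin-feature test described in this abstract can be sketched as follows. This is a simplified 2D Python illustration, not the author's implementation: the function names and the ratio threshold of 0.1 are assumptions, and a 3D version would use the eigenvalues of a 3x3 covariance matrix instead of the closed-form 2x2 case used here.

```python
import math


def principal_stretches(points):
    """Return the (major, minor) eigenvalues of the 2D covariance of the points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    center = 0.5 * (sxx + syy)
    spread = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    return center + spread, max(center - spread, 0.0)


def is_thin(points, ratio_threshold=0.1):
    """Flag a neighborhood as thin when it is far more stretched along one
    principal axis than the other (threshold is a hypothetical choice)."""
    major, minor = principal_stretches(points)
    return major > 0.0 and minor / major < ratio_threshold
```

A neighborhood whose particles lie almost on a line yields a very small minor-to-major ratio and is flagged as thin, whereas a roughly isotropic cluster yields a ratio near 1.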

Multi-Scale Contact Analysis Between Net and Numerous Particles (그물망과 대량입자의 멀티 스케일 접촉해석)

Jun, Chul Woong; Sohn, Jeong Hyun
- Transactions of the Korean Society of Mechanical Engineers A / v.38 no.1 / pp.17-23 / 2014
Graphics processing units (GPUs) are ideal for solving problems involving data-parallel computation. In this study, a GPU is used to efficiently carry out a multi-body dynamic simulation with particle dynamics. The Hilber-Hughes-Taylor (HHT) implicit integration algorithm is used to solve the integral equations. To detect collisions among particles, a spatial subdivision algorithm and the discrete element method (DEM) are employed. The developed program is verified by comparing its results with those of ADAMS. The numerical efficiency of the serial CPU program and the parallel GPU program is compared as a function of the number of particles; the more particles there are, the more computing time the GPU saves. In the present example, with 1,300 particles, the parallel analysis program runs about 5 times faster than the serial analysis program.
https://doi.org/10.3795/KSME-A.2014.38.1.017
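
The spatial subdivision idea mentioned in this abstract can be illustrated with a uniform grid: particles are binned into cells one diameter wide, so contacts only need to be tested against the 27 surrounding cells. The Python sketch below assumes equal-radius spheres and uses a hypothetical function name; it is not the authors' GPU code and omits the DEM contact-force computation entirely.

```python
import math
from collections import defaultdict
from itertools import product


def find_contacts(centers, radius):
    """Return the set of index pairs (i, j), i < j, whose spheres overlap."""
    cell = 2.0 * radius  # two touching spheres are at most one cell apart per axis
    grid = defaultdict(list)
    for index, center in enumerate(centers):
        key = tuple(int(math.floor(coord / cell)) for coord in center)
        grid[key].append(index)

    contacts = set()
    limit_sq = (2.0 * radius) ** 2
    for key, members in grid.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbor_key = tuple(k + o for k, o in zip(key, offset))
            # dict.get avoids creating empty cells while iterating
            for i in members:
                for j in grid.get(neighbor_key, ()):
                    if i < j:
                        dist_sq = sum((p - q) ** 2
                                      for p, q in zip(centers[i], centers[j]))
                        if dist_sq < limit_sq:
                            contacts.add((i, j))
    return contacts
```

Because each particle only inspects nearby cells, the work per particle stays roughly constant as the particle count grows, and the per-cell loops are independent of one another.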

An MPI-CUDA Implementation for Parallel Scalability on Multi-GPU Clusters (멀티-GPU 기반 MPI-CUDA 병렬 성능 확장성)

Yi, Hong-Suk; Lee, Seung-Min
- Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.13-15 / 2012
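With their very high performance and low development cost, modern GPUs have emerged as an essential resource for large-scale computational science. In this paper, we developed a large-scale Monte Carlo algorithm that applies GPU computing on a multi-GPU cluster system. By applying MPI and CUDA together, we obtained parallel scalability up to 8 GPUs. Our scalability analysis showed that, on a multi-GPU cluster, data communication between GPUs is a very important factor in determining the overall performance improvement of the program.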

Implementation of GPU based MPEG-2 Decoder (GPU 기반의 MPEG-2 디코더의 구현)

Kim, Kyung-Su; Kim, Hong-Sik; Kim, Cheong-Ghil; Park, Woo-Chan
- Journal of Digital Contents Society / v.9 no.3 / pp.371-377 / 2008
Recently, GPU performance has been increasing much faster than CPU performance, and GPUs are used in a wide range of applications. In this paper, an MPEG-2 decoder is implemented in Cg, a GPU programming language. The proposed method performs block rendering with texture data according to the video standard, achieving very high parallelism by using the GPU pipeline, which has a stream-processing structure. To reduce the data bandwidth between system memory and the GPU, the graphics card's local memory is used. In our experiments, the proposed scheme is more than 2 times faster than a CPU-based scheme.

A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems (방출단층촬영 시스템을 위한 GPU 기반 반복적 기댓값 최대화 재구성 알고리즘 연구)

Ha, Woo-Seok; Kim, Soo-Mee; Park, Min-Jae; Lee, Dong-Soo; Lee, Jae-Sung
- Nuclear Medicine and Molecular Imaging / v.43 no.5 / pp.459-467 / 2009
Purpose: Maximum likelihood-expectation maximization (ML-EM) is a statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on a GPU (graphics processing unit) for the ML-EM algorithm. Materials and Methods: Using a GeForce 9800 GTX+ graphics card and CUDA (compute unified device architecture), NVIDIA's technology, the projection and backprojection steps of the ML-EM algorithm were parallelized. The computation times for projection, for the errors between measured and estimated data, and for backprojection within an iteration were measured. The total time included the latency of data transmission between RAM and GPU memory. Results: The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 sec, respectively; in this case, computing was about 15 times faster on the GPU. When the number of iterations was increased to 1024, the CPU- and GPU-based computations took 18 min and 8 sec in total, respectively. This improvement of about 135 times was caused by delays in the CPU-based computation after a certain number of iterations. In contrast, the GPU-based computation showed very little variation in time per iteration owing to the use of shared memory. Conclusion: GPU-based parallel computation significantly improved the computing speed and stability of ML-EM. The developed GPU-based ML-EM algorithm could easily be modified for other imaging geometries.
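
To make the projection and backprojection steps concrete, here is a minimal CPU-only Python sketch of the standard ML-EM update, x_j <- x_j / (sum_i A_ij) * sum_i A_ij * y_i / (A x)_i. The function name, the dense list-of-lists system matrix, and the tiny example in the usage note are assumptions for illustration; this is not the authors' CUDA implementation.

```python
def mlem(system, measured, iterations=32):
    """Reconstruct pixel values x from measured bin counts y, where
    system[i][j] is the contribution of pixel j to detector bin i."""
    n_bins = len(system)
    n_pixels = len(system[0])
    # Sensitivity: total contribution of each pixel over all bins
    sensitivity = [sum(system[i][j] for i in range(n_bins))
                   for j in range(n_pixels)]
    x = [1.0] * n_pixels  # uniform positive initial estimate
    for _ in range(iterations):
        # Forward projection of the current estimate
        projected = [sum(system[i][j] * x[j] for j in range(n_pixels))
                     for i in range(n_bins)]
        # Ratio between measured and estimated data
        ratio = [measured[i] / projected[i] if projected[i] > 0 else 0.0
                 for i in range(n_bins)]
        # Backprojection of the ratio, then multiplicative update
        back = [sum(system[i][j] * ratio[i] for i in range(n_bins))
                for j in range(n_pixels)]
        x = [x[j] * back[j] / sensitivity[j] if sensitivity[j] > 0 else 0.0
             for j in range(n_pixels)]
    return x
```

For example, with `system = [[1, 0], [0, 1], [1, 1]]` and noise-free data `measured = [2, 3, 5]`, the estimate approaches `[2, 3]`. The forward projection and backprojection are sums over independent bins and pixels, which is why these two steps are the natural targets for parallelization.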

GPU-based Rendering of Blending Surfaces (블렌딩 곡면의 GPU 기반 렌더링)

Ko, Dae-Hyun
- Journal of the Korea Computer Graphics Society / v.13 no.1 / pp.1-6 / 2007
Although free-form surfaces, unlike polygonal meshes, can represent smooth shapes with only a few control points, graphics hardware does not currently support surface rendering. Since the modern programmable graphics pipeline can accelerate many existing graphics algorithms, this paper presents a method that uses the graphics processing unit (GPU) to quickly render blending surfaces of arbitrary topology. Surface parameters sampled on the control mesh and geometric data for the local surfaces are sent to the graphics pipeline, and the vertex processor then evaluates the surface positions and normals from these data. This method achieves much higher performance than CPU-based rendering.

Search Result 196, Processing Time 0.023 seconds
