Search | Korea Science

Efficient Parallel CUDA Random Number Generator on NVIDIA GPUs (NVIDIA GPU 상에서의 난수 생성을 위한 CUDA 병렬프로그램)

Kim, Youngtae;Hwang, Gyuhyeon
- Journal of KIISE
- v.42 no.12
- pp.1467-1473
- 2015
In this paper, we implemented a parallel random number generation program using an LCG (Linear Congruential Generator) on GPUs, which are known for high-performance computing. Random numbers are important in every field that requires randomness, and the LCG is one of the most widely used methods for generating pseudo-random numbers. We describe the parallel program, built with the NVIDIA CUDA model and MPI (Message Passing Interface), and present uniform-distribution and performance results. We also used a Monte Carlo algorithm to calculate pi (${\pi}$), comparing our parallel random number generator with cuRAND, the CUDA random number library, and showed that our program is much more efficient. Finally, we compared the performance obtained with multiple GPUs against ideal speedups.
https://doi.org/10.5626/JOK.2015.42.12.1467
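
The abstract does not give the LCG parameters or how the sequence is divided among threads. A minimal sketch, assuming the Numerical Recipes constants (a = 1664525, c = 1013904223, m = 2^32) and block splitting, in which each thread jumps ahead to its own contiguous segment of one global sequence; `jump_params`, `thread_stream`, and `estimate_pi` are hypothetical names, and a Python loop over thread indices stands in for CUDA threads.

```python
# Hypothetical sketch: block-split parallel LCG with jump-ahead, plus Monte Carlo pi.
# Assumed parameters (Numerical Recipes constants), not necessarily the paper's.
M = 2**32
A = 1664525
C = 1013904223


def lcg_next(x):
    """One LCG step: x_{n+1} = (A*x_n + C) mod M."""
    return (A * x + C) % M


def jump_params(k):
    """Return (a_k, c_k) so that k LCG steps equal x -> (a_k*x + c_k) mod M.

    Composes the affine step map with itself by repeated squaring: O(log k).
    """
    a_k, c_k = 1, 0            # identity map (zero steps)
    base_a, base_c = A, C      # map for 2**j steps, starting with j = 0
    while k > 0:
        if k & 1:
            # apply the current power of the step after the accumulated map
            a_k, c_k = (a_k * base_a) % M, (c_k * base_a + base_c) % M
        # square the base map: (2**j steps) followed by (2**j steps)
        base_a, base_c = (base_a * base_a) % M, (base_c * base_a + base_c) % M
        k >>= 1
    return a_k, c_k


def thread_stream(seed, thread_id, per_thread):
    """Draws of one (simulated) GPU thread: positions
    thread_id*per_thread + 1 .. (thread_id+1)*per_thread of the global sequence."""
    a_k, c_k = jump_params(thread_id * per_thread)
    x = (a_k * seed + c_k) % M  # jump directly to the start of this thread's block
    out = []
    for _ in range(per_thread):
        x = lcg_next(x)
        out.append(x)
    return out


def estimate_pi(seed, n_threads, per_thread):
    """Monte Carlo pi: each thread pairs consecutive draws into points in [0,1)^2."""
    inside = total = 0
    for t in range(n_threads):  # on a GPU, each t would be an independent thread
        xs = thread_stream(seed, t, per_thread)
        for i in range(0, per_thread - 1, 2):
            u, v = xs[i] / M, xs[i + 1] / M
            inside += (u * u + v * v) <= 1.0
            total += 1
    return 4.0 * inside / total
```

Because the jump-ahead map costs only O(log k) steps, a thread can start at any offset without generating the values before it, and concatenating the per-thread outputs reproduces the sequential stream exactly. The same jump could hand disjoint blocks to different GPUs or MPI ranks.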

A Design of a High Performance Stream Processor without Superscalar Architecture (슈퍼스칼라 구조를 갖지 않는 고성능 Stream Processor 설계)

Lee, Kwan-Ho;Kim, Chi-Yong
- Journal of IKEEE
- v.21 no.1
- pp.77-80
- 2017
In this paper, we propose a way to improve the performance of a GP-GPU by removing superscalar issue from its original design. First, we simplified the structure of the stream processor to eliminate superscalar issue. Under this condition, keeping the hardware size unchanged while increasing the number of threads led to a functional improvement of the GP-GPU. Because the number of threads grew larger, we proposed a new warp scheduler model that adjusts the grouping of threads. This warp scheduler, which has no superscalar issue, dispatches instructions to the warps activated by round-robin scheduling. Performance was compared using Gaussian filtering, and the results show that the newly designed GP-GPU performs 7.89 times better than the original.
https://doi.org/10.7471/ikeee.2017.21.1.77
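
To illustrate single-issue round-robin warp scheduling, here is a minimal sketch of a scheduler that issues exactly one instruction per cycle from the next active warp. It is a hypothetical model, not the paper's hardware: `round_robin_issue` is an illustrative name, and memory latency, stalls, and the thread-grouping adjustment mentioned in the abstract are not modeled.

```python
from collections import deque


def round_robin_issue(warp_lengths):
    """Hypothetical single-issue, round-robin warp scheduler.

    warp_lengths[w] is the number of instructions warp w must execute.
    Each cycle, the warp at the head of the ready queue issues exactly one
    instruction (no superscalar dual issue), then goes to the back of the
    queue if it still has work. Latencies and stalls are ignored.
    Returns the issue order as (warp_id, instruction_index), one per cycle.
    """
    ready = deque(w for w, n in enumerate(warp_lengths) if n > 0)
    pc = [0] * len(warp_lengths)  # next instruction index of each warp
    order = []
    while ready:
        w = ready.popleft()
        order.append((w, pc[w]))
        pc[w] += 1
        if pc[w] < warp_lengths[w]:
            ready.append(w)  # still active: rejoin the rotation
    return order
```

For example, `round_robin_issue([2, 1, 3])` returns `[(0, 0), (1, 0), (2, 0), (0, 1), (2, 1), (2, 2)]`: warp 1 leaves the rotation after its single instruction, and the cycle count equals the total instruction count because one instruction issues per cycle.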

Precise Sweep Volume Computation Accelerated by GPU (GPU 가속을 이용한 정밀한 스윕 볼륨 경계 계산)

Lee, Hyunho;Kyung, Minho
- Journal of the Korea Computer Graphics Society
- v.21 no.1
- pp.13-21
- 2015
We present a robust GPU algorithm that constructs the sweep volume boundary of a triangular mesh model. The geometric entities swept by a triangular mesh object are first approximated by a set of triangles, whose envelope forms the outer boundary of the sweep volume. We find the envelope by computing the arrangement of the triangle set and extracting its outermost boundary. To ensure robustness, we adopt random perturbation of the sweep vertices and interval arithmetic with multiple levels of precision. The algorithm performs most of its computation on the GPU, and as a result it runs two orders of magnitude faster than other algorithms.
https://doi.org/10.15701/kcgs.2015.21.1.13
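
The abstract does not say which geometric predicates use interval arithmetic or how many precision levels there are. As an illustration of the general idea, the sketch below evaluates the 2D orientation predicate, a typical building block of arrangement computations, first with double-precision intervals and, only when the interval straddles zero, with exact rationals from Python's `fractions` module. The two-level scheme, the one-ulp outward widening via `math.nextafter` (Python 3.9+), and the names `Interval` and `orient2d` are assumptions; coordinates are assumed to be doubles.

```python
import math
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi] of doubles. Each result is widened outward by one
    ulp with math.nextafter, so it encloses the exact real value despite
    round-to-nearest arithmetic."""

    def __init__(self, lo, hi=None):
        self.lo = float(lo)
        self.hi = float(lo if hi is None else hi)

    @staticmethod
    def _widened(lo, hi):
        return Interval(math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

    def __sub__(self, other):
        return Interval._widened(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval._widened(min(p), max(p))

    def contains_zero(self):
        return self.lo <= 0.0 <= self.hi


def orient2d(a, b, c):
    """Sign of det[b - a, c - a]: +1 counter-clockwise, -1 clockwise, 0 collinear.

    Level 1: double-precision interval arithmetic, decisive for most inputs.
    Level 2: exact rational arithmetic, used only when the interval contains 0.
    """
    ax, ay, bx, by, cx, cy = (Interval(v) for v in (*a, *b, *c))
    det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    if not det.contains_zero():
        return 1 if det.lo > 0.0 else -1
    # Level 2: exact evaluation on the same double inputs.
    ax, ay, bx, by, cx, cy = (Fraction(v) for v in (*a, *b, *c))
    exact = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (exact > 0) - (exact < 0)
```

Most calls return at the first level; the exact level runs only for degenerate or nearly degenerate inputs, such as three collinear points, where a plain floating-point sign would be unreliable.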

A GPU scheduling framework for applications based on dataflow specification (데이터 플로우 기반 응용들을 위한 GPU 스케줄링 프레임워크)

Lee, Yongbin;Kim, Sungchan
- Journal of Korea Multimedia Society
- v.17 no.10
- pp.1189-1197
- 2014
Recently, general-purpose graphics processing units (GPUs) have come into wide use in mobile embedded systems such as smartphones and tablet PCs. Because of architectural limitations of mobile GPGPUs, only a single program can occupy the GPU at a time, in a non-preemptive way. As a result, if the applications running on a GPU are not scheduled properly, it is difficult to meet their performance requirements, such as frame rate or response time. To tackle this problem, we propose specifying applications with the synchronous data flow model of computation, so that each application is expressed as a graph of nodes and edges. The nodes of the applications are then scheduled onto the GPU individually, unlike conventional scheduling, which treats an application as a whole. This approach lets applications share a GPU at a finer, node-level (or task-level) granularity, which brings several benefits, such as eliminating the need to partition applications manually and improving GPU utilization. Furthermore, any scheduling policy can be applied according to the characteristics of the applications.
https://doi.org/10.9717/kmms.2014.17.10.1189
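
The balance equations of synchronous data flow and node-level GPU sharing can be sketched as follows. Everything here is a hypothetical illustration: `repetition_vector` solves the standard SDF balance equations for a connected graph, `firing_sequence` builds one iteration of an acyclic application, and `interleave` uses round robin only as one example of the scheduling policies the abstract says can be plugged in. Multi-argument `math.gcd` and `math.lcm` require Python 3.9+.

```python
from collections import deque
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(nodes, edges):
    """Solve the SDF balance equations q[src]*produced == q[dst]*consumed.

    edges: (src, dst, produced, consumed) token rates per firing.
    Assumes a connected graph; raises ValueError if it is inconsistent.
    Returns the smallest positive integer firing counts per iteration.
    """
    rate = {nodes[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates along edges until all nodes are set
        changed = False
        for s, d, p, c in edges:
            if s in rate and d not in rate:
                rate[d] = rate[s] * p / c
                changed = True
            elif d in rate and s not in rate:
                rate[s] = rate[d] * c / p
                changed = True
    for s, d, p, c in edges:
        if rate[s] * p != rate[d] * c:
            raise ValueError("inconsistent SDF graph")
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {n: int(r * scale) for n, r in rate.items()}
    g = gcd(*q.values())
    return {n: v // g for n, v in q.items()}


def firing_sequence(topo_order, q):
    """One iteration of an acyclic SDF app: fire each node q[n] times in
    topological order, so a node's input tokens exist before it runs."""
    return [n for n in topo_order for _ in range(q[n])]


def interleave(apps):
    """Node-level round-robin sharing of one non-preemptive GPU: take one node
    firing from each application in turn instead of running whole applications."""
    queues = [deque(seq) for seq in apps]
    schedule = []
    while any(queues):
        for i, qu in enumerate(queues):
            if qu:
                schedule.append((i, qu.popleft()))
    return schedule
```

For a chain A -> B -> C in which A produces 2 tokens that B consumes 3 at a time, and B produces 1 token that C consumes 2 at a time, the repetition vector is {A: 3, B: 2, C: 1}. Interleaving that firing sequence with another application's lets both make progress on the non-preemptive GPU.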

Design of Virtual Machine for Vertex Shader (정점 셰이더의 가상 기계 구현)

Ha, Chang-Soo;Kim, Ju-Hong;Choi, Byeong-Yoon
- Proceedings of the IEEK Conference
- 2005.11a
- pp.1003-1006
- 2005
The vertex shaders of personal-computer GPUs have advanced in functionality to the point of covering half of the traditional fixed T&L (transform and lighting) functions. In addition, the memory capacity for storing the resources needed to process instructions is unlimited. Programmable GPUs are needed in mobile systems as well as in personal computers. In this paper, we implement a software virtual machine for the vertex shader in C++. Our goal is to design a hardware GPU that can be applied to mobile systems. The virtual machine is built on NVIDIA GPU instructions, and its input data is generated by the Microsoft fxc compiler; that is, the input is a compiled shader program written in HLSL, Cg, or ASM. The virtual machine will serve as a reference model for designing the hardware GPU and can be used as a testbed for added or modified instructions.
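
A software virtual machine of this kind is essentially an instruction interpreter over 4-component registers. The minimal sketch below is an illustration under stated assumptions, not the paper's design: it borrows DirectX vertex shader assembly names (`mov`, `add`, `mul`, `mad`, `dp4`; registers such as `v0`, `c0`, `r0`, `oPos`) and supports write masks, but it does not decode NVIDIA instructions or parse fxc output, and `run_vertex_program` is a hypothetical name.

```python
def _apply(op, vals):
    """Evaluate one instruction on 4-component source vectors."""
    if op == "mov":
        return vals[0]
    if op == "add":
        return tuple(a + b for a, b in zip(vals[0], vals[1]))
    if op == "mul":
        return tuple(a * b for a, b in zip(vals[0], vals[1]))
    if op == "mad":
        return tuple(a * b + c for a, b, c in zip(*vals))
    if op == "dp4":
        d = sum(a * b for a, b in zip(vals[0], vals[1]))
        return (d, d, d, d)  # scalar result replicated to all components
    raise ValueError(f"unsupported opcode: {op}")


def run_vertex_program(program, registers):
    """Interpret (opcode, dst, *srcs) tuples over a dict of 4-component registers.

    dst may carry a write mask such as 'oPos.x'; only the listed components are
    written. Swizzles, source modifiers, and flow control are omitted.
    The input dict is not modified; a new register file is returned.
    """
    regs = dict(registers)
    for op, dst, *srcs in program:
        result = _apply(op, [regs[s] for s in srcs])
        name, _, mask = dst.partition(".")
        current = list(regs.get(name, (0, 0, 0, 0)))
        for i, comp in enumerate("xyzw"):
            if not mask or comp in mask:
                current[i] = result[i]
        regs[name] = tuple(current)
    return regs
```

For example, four `dp4` instructions writing `oPos.x` through `oPos.w` transform an input position `v0` by a matrix stored row by row in `c0` to `c3`, the core of the transform stage that a programmable vertex shader takes over from fixed-function T&L.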

GPU based Shrapnel Drop Computational Model for Specific Area (GPU 기반의 특정 영역에 대한 파편 낙하 계산 모델)

Kim, Tae-Gwon;Cho, Kyu-Tae;Lee, Seung-Young
- Proceedings of the Korea Information Processing Society Conference
- 2016.10a
- pp.41-42
- 2016
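Computing the fall of shrapnel onto a specific area consumes substantial resources, because the amount of computation grows sharply as the number of fragments increases. Because the fragments do not affect one another, these fall computations can generally be performed in parallel on a CPU or GPU. In this paper, we propose a GPU-based shrapnel-drop computational model for efficiently computing fragments that fall onto a specific area. The model first computes the directions of the fragments of an object that explodes at a specific point in the air, then repeatedly computes, in tree form, the directions in which each fragment falls after moving in its direction, and thereby derives the final impact positions. Because the proposed method uses the GPU to compute the fragments' landing area top-down through a quadtree, it can compute impact points efficiently over a wide area.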
https://doi.org/10.3745/PKIPS.y2016m10a.41
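
Two ideas from the abstract can be sketched independently: each fragment's fall is independent (so one GPU thread per fragment suffices), and landing points can be localized top-down with a quadtree. The sketch assumes drag-free projectile motion over flat ground, which the abstract does not state; `landing_point` and `quadtree_cells` are hypothetical names, and the paper's own tree-based direction computation is not reproduced.

```python
import math


def landing_point(x0, y0, h, vx, vy, vz, g=9.81):
    """Ground impact (x, y) of one fragment released at height h with velocity
    (vx, vy, vz), assuming drag-free projectile motion over flat ground.
    Solves h + vz*t - g*t**2/2 = 0 for the positive root t."""
    t = (vz + math.sqrt(vz * vz + 2.0 * g * h)) / g
    return x0 + vx * t, y0 + vy * t


def quadtree_cells(points, x, y, size, capacity=4, max_depth=8):
    """Top-down quadtree over the half-open square [x, x+size) x [y, y+size).

    A cell is split into four children while it holds more than `capacity`
    points and the depth limit allows; empty cells are pruned.
    Returns leaf cells as (x, y, size, count).
    """
    inside = [(px, py) for px, py in points
              if x <= px < x + size and y <= py < y + size]
    if not inside:
        return []
    if len(inside) <= capacity or max_depth == 0:
        return [(x, y, size, len(inside))]
    half = size / 2.0
    leaves = []
    for cx, cy in ((x, y), (x + half, y), (x, y + half), (x + half, y + half)):
        leaves += quadtree_cells(inside, cx, cy, half, capacity, max_depth - 1)
    return leaves
```

Empty cells are pruned immediately, so the work concentrates where fragments actually fall, which is what makes a top-down subdivision efficient over a wide area. The sketch assumes a power-of-two area size so that child cell boundaries are exact in floating point.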

Development of nearshore sediment transport numerical model based on GPU engine (GPU 엔진 기반 연안의 실시간 유사이송 수치모형 개발)

Noh, Junsu;Son, Sangyoung
- Proceedings of the Korea Water Resources Association Conference
- 2022.05a
- pp.177-177
- 2022
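Various factors, including climate change and the growing number of coastal structures, are accelerating coastal morphological changes such as coastal erosion and shoreline change. Predicting these rapid changes and devising countermeasures requires fast prediction of nearshore sediment transport. In this study, we developed a numerical model that can simulate nearshore sediment transport in real time, using Celeris Advent, a GPU-engine-based wave model. Celeris Advent uses the GPU's parallel cores to enable real-time computation and real-time interaction with the user through a GUI. The governing equations consist of the extended Boussinesq equations two-way coupled with a sediment transport equation, and they are solved with a hybrid finite-volume/finite-difference scheme: the advection terms are discretized with the finite volume method (Kurganov & Petrova, 2007) and the source terms with the finite difference method. The sediment transport equation combines a depth-integrated advection-diffusion equation with a source term representing erosion and deposition fluxes, so that the advection and diffusion terms account for sediment advection and diffusion while the source term accounts for interaction with the bed.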
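
The depth-integrated advection-diffusion equation with an erosion/deposition source, as described in this abstract, can be illustrated in one dimension. This is a minimal sketch under strong assumptions: constant velocity and diffusivity, a periodic grid, first-order upwind finite-volume advection rather than the Kurganov-Petrova scheme used by the model, and a per-cell `source` array standing in for the erosion and deposition fluxes divided by depth; `advect_diffuse_step` is a hypothetical name.

```python
def advect_diffuse_step(c, u, K, dx, dt, source):
    """One explicit step of dC/dt + u dC/dx = K d2C/dx2 + S on a periodic 1D grid.

    c: depth-averaged concentration per cell; u: constant velocity;
    K: diffusivity; source: per-cell S, standing in for (erosion - deposition)
    / depth. Advection uses first-order upwind face fluxes (finite volume),
    diffusion uses central differences; both conserve total mass here.
    """
    if dt * (abs(u) / dx + 2.0 * K / dx**2) > 1.0:
        raise ValueError("time step violates the explicit stability limit")
    n = len(c)

    def face_flux(i):
        # upwind flux through the face between cell i and cell i+1 (periodic)
        return u * (c[i] if u >= 0 else c[(i + 1) % n])

    new = []
    for i in range(n):
        adv = -(face_flux(i) - face_flux(i - 1)) / dx  # i - 1 = -1 wraps around
        diff = K * (c[(i + 1) % n] - 2.0 * c[i] + c[i - 1]) / dx**2
        new.append(c[i] + dt * (adv + diff + source[i]))
    return new
```

Because the face fluxes telescope on a periodic grid, the total sediment mass changes only through the source term, which is the property a coupled bed-evolution model relies on.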

Efficient Representation of Pore Flow, Absorption, Emission and Diffusion using GPU-Accelerated Cloth-Liquid Interaction

Jong-Hyun Kim
- Journal of the Korea Society of Computer and Information
- v.29 no.6
- pp.23-29
- 2024
In this paper, we propose a fast GPU-based method for representing the pore flow, absorption, emission, and diffusion effects produced by cloth-liquid interaction, using smoothed particle hydrodynamics (SPH), a particle-based fluid solver. Our contributions are: 1) a unified framework for GPU-based representation of the various physical effects arising from cloth-liquid interaction; 2) a method for efficiently calculating the saturation of a node based on SPH and transferring it to the surrounding porous particles; 3) a method, based on Darcy's law, for improving stability so that the direction of fluid absorption and release is calculated reliably; 4) a method for controlling the amount of fluid absorbed by the porous particles according to the direction of flow; and 5) a method for releasing SPH particles without exceeding their maximum mass. The main advantage of the proposed method is that all computations run on the GPU, allowing us to quickly model porous materials, porous flow, absorption, emission, diffusion, and other effects produced by the interaction of cloth and fluid.
https://doi.org/10.9708/jksci.2024.29.06.023
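
Three of the listed ingredients (capped absorption, Darcy-type transport through the porous cloth, and release in particles of bounded mass) can be sketched on a 1D chain of cloth nodes. The functions `absorb`, `darcy_diffuse`, and `split_release` are hypothetical; taking pressure proportional to saturation and the clamping rules are assumptions, not the paper's formulation.

```python
def absorb(particle_mass, node_fluid, node_capacity, rate):
    """Move fluid from one SPH particle into one porous cloth node.

    Transfers at most rate * particle_mass and never more than the node's free
    capacity. Returns (new_particle_mass, new_node_fluid).
    """
    free = max(node_capacity - node_fluid, 0.0)
    amount = min(rate * particle_mass, free)
    return particle_mass - amount, node_fluid + amount


def darcy_diffuse(fluid, capacity, k_over_mu, dist, dt):
    """One step of Darcy-type flow q = -(k/mu) dp/dx along a chain of porous nodes.

    Pressure is taken to be proportional to saturation (fluid / capacity), an
    assumption. Each transfer is clamped so that no node drops below empty or
    rises above its capacity; total fluid is conserved.
    """
    sat = [f / cap for f, cap in zip(fluid, capacity)]
    new = list(fluid)
    for i in range(len(fluid) - 1):
        amount = k_over_mu * (sat[i] - sat[i + 1]) / dist * dt  # > 0: i -> i+1
        if amount > 0:
            amount = min(amount, new[i], capacity[i + 1] - new[i + 1])
        else:
            amount = max(amount, -new[i + 1], -(capacity[i] - new[i]))
        new[i] -= amount
        new[i + 1] += amount
    return new


def split_release(amount, max_particle_mass):
    """Release `amount` of fluid as new SPH particles, none heavier than the maximum."""
    masses = []
    while amount > 0.0:
        m = min(amount, max_particle_mass)
        masses.append(m)
        amount -= m
    return masses
```

Clamping each transfer keeps every node between empty and full, so fluid is neither created nor lost during absorption and diffusion, and `split_release` guarantees that no emitted particle exceeds the chosen maximum mass.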

Improving the Performance of Document Similarity by using GPU Parallelism (GPU 병렬성을 이용한 문서 유사도 계산 성능 개선)

Park, Il-Nam;Bae, Byung-Gurl;Im, Eun-Jin;Kang, Seung-Shik
- The KIPS Transactions:PartB
- v.19B no.4
- pp.243-248
- 2012
In information retrieval systems, such as vector-model implementations and document clustering, document similarity calculation accounts for a large part of the system's overall performance. In this paper, we explore GPU parallelism in the CUDA framework to speed up document similarity calculation. The proposed method computes similarity almost 15 times faster than a typical CPU-based implementation, and 5.2 and 3.4 times faster than methods using CUBLAS and Thrust, respectively.
https://doi.org/10.3745/KIPSTB.2012.19B.4.243
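
The abstract does not name the similarity measure; in the vector space model the usual choice is cosine similarity, which is assumed here. This sketch computes all pairs on the CPU with sparse term dictionaries; `cosine_similarity_matrix` is a hypothetical name. On a GPU, each (i, j) pair, or each tile of pairs, is independent work, and the same quantity can also be written as a dense product of the term-document matrix with its transpose, the kind of operation CUBLAS provides.

```python
import math


def cosine_similarity_matrix(docs):
    """All-pairs cosine similarity of term-weight vectors (vector space model).

    docs: list of dicts mapping term -> weight. Entry [i][j] is
    dot(d_i, d_j) / (|d_i| * |d_j|), or 0.0 if either document is empty.
    Every (i, j) pair is independent, so a GPU can assign one thread per pair.
    """
    norms = [math.sqrt(sum(w * w for w in d.values())) for d in docs]
    n = len(docs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # the matrix is symmetric: compute the upper half
            # iterate over the shorter document for the sparse dot product
            if len(docs[i]) <= len(docs[j]):
                small, large = docs[i], docs[j]
            else:
                small, large = docs[j], docs[i]
            dot = sum(w * large.get(t, 0.0) for t, w in small.items())
            denom = norms[i] * norms[j]
            s = dot / denom if denom else 0.0
            sim[i][j] = sim[j][i] = s
    return sim
```

Only the upper triangle is computed because cosine similarity is symmetric, which halves the work whether it runs on a CPU or a GPU.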

Computing Performance Comparison of CPU and GPU Parallelization for Virtual Heart Simulation (가상 심장 시뮬레이션에서 CPU와 GPU 병렬처리의 계산 성능 비교)

Kim, Sang Hee;Jeong, Da Un;Setianto, Febrian;Lim, Ki Moo
- Journal of Biomedical Engineering Research
- v.41 no.3
- pp.128-137
- 2020
Cardiac electrophysiology studies often use simulation to predict how the heart will behave under various conditions. Observing cardiac tissue movement requires a high-resolution heart mesh with a large number of nodes, and the higher the mesh resolution, the more computation time is needed. To improve computation speed, parallel processing using multi-core processors and networked computing resources is employed. In this study, we compared the computational speed of CPU parallelization and GPU parallelization in a virtual heart simulation for efficiently solving a series of ordinary differential equations (ODEs) and partial differential equations (PDEs), and determined the optimal CPU and GPU parallelization architecture. We used a 2D tissue model and a 3D ventricular model to compare computational performance, and measured the time required to compute the ODEs and the PDEs separately. In conclusion, for the most efficient computation, GPU parallelization improves performance over CPU parallelization by 4.3 times for the ODEs and 2.3 times for the PDEs. For CPU parallelization, it is best to use the number of processors just below the point at which communication costs between processors begin to be incurred.
https://doi.org/10.9718/JBER.2020.41.3.128
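
The abstract times the ODE and PDE parts separately, which matches the common operator-splitting structure of cardiac simulations: a per-node cell-model ODE update followed by a diffusion (PDE) update that couples neighboring nodes. A minimal sketch on a 1D cable, assuming the FitzHugh-Nagumo model as a stand-in for whatever cell model the paper used, forward Euler, and no-flux ends; all names and parameter values are illustrative.

```python
def fhn_reaction(v, w, dt, a=0.1, eps=0.01, gamma=0.5):
    """ODE part: forward-Euler step of the FitzHugh-Nagumo cell model at one node.
    Needs no neighbor data, so it parallelizes trivially (one thread per node)."""
    dv = v * (v - a) * (1.0 - v) - w
    dw = eps * (v - gamma * w)
    return v + dt * dv, w + dt * dw


def diffusion_step(v, D, dx, dt):
    """PDE part: explicit step of dv/dt = D d2v/dx2 on a 1D cable with no-flux ends.
    Reads neighboring nodes, so a domain split across processors needs communication.
    Explicit stability requires D*dt/dx**2 <= 0.5."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else v[i]         # mirrored ghost node: zero flux
        right = v[i + 1] if i < n - 1 else v[i]
        out.append(v[i] + dt * D * (left - 2.0 * v[i] + right) / dx**2)
    return out


def simulate(v, w, steps, dt=0.1, D=1.0, dx=1.0):
    """Operator splitting: per-node ODE update, then the diffusion (PDE) update."""
    for _ in range(steps):
        pairs = [fhn_reaction(vi, wi, dt) for vi, wi in zip(v, w)]
        v = [p[0] for p in pairs]
        w = [p[1] for p in pairs]
        v = diffusion_step(v, D, dx, dt)
    return v, w
```

The ODE loop maps directly onto one GPU thread per node, while the diffusion step is the part that requires boundary exchange when the mesh is divided among CPU processors.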
