• Title/Summary/Keyword: general-purpose computing on GPUs


Kinematic Wave Rainfall-Runoff Model Using CUDA FORTRAN

  • Kim, Boram; Kim, Dae-Hong
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / p. 271 / 2018
  • Graphics processing units (GPUs) contain a large number of arithmetic logic units (ALUs) specialized for graphics processing, together with the associated instructions, which lets them perform computations much faster than central processing units (CPUs). Recently, as modeling methods have become more realistic, many numerical models implemented in FORTRAN require more computation and longer run times. In this study, a kinematic wave rainfall-runoff model based on general-purpose computing on graphics processing units (GPGPU) was implemented using CUDA (Compute Unified Device Architecture) FORTRAN. The results of the CUDA FORTRAN kinematic wave rainfall-runoff model were validated against those of a verified CPU-based kinematic wave rainfall-runoff model and showed good agreement. The CUDA FORTRAN model ran about 20 times faster than the CPU-based model. In addition, the computational efficiency of the CUDA FORTRAN version relative to the CPU version improved as the computational domain grew.
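
The abstract does not state the model's equations. As a hedged illustration of why a kinematic wave model maps well onto a GPU, the sketch below assumes the standard 1D overland-flow form dh/dt + dq/dx = r with Manning's relation q = (1/n) S0^(1/2) h^(5/3) per unit width, discretized with an explicit first-order upwind scheme. Each cell's new depth depends only on itself and its upstream neighbour from the previous step, so in a CUDA FORTRAN kernel each cell could be handled by one GPU thread; here the Python loop over cells stands in for that thread grid. The function names `manning_discharge`, `kinematic_step` and `outflow`, and all parameter values, are hypothetical and not taken from the paper.

```python
def manning_discharge(h, n, slope):
    """Unit-width discharge q = (1/n) * sqrt(S0) * h**(5/3), SI units (assumed form)."""
    return (slope ** 0.5) * (h ** (5.0 / 3.0)) / n


def kinematic_step(h, dt, dx, rain, n, slope):
    """One explicit upwind step of dh/dt + dq/dx = rain on a 1D plane.

    Cell 0 is the upstream end (no inflow); water leaves through the last cell.
    Every update reads only cells i and i-1 of the OLD array, so the updates
    are independent and could run as one GPU thread per cell.
    """
    q = [manning_discharge(hi, n, slope) for hi in h]
    h_new = []
    for i in range(len(h)):
        q_in = q[i - 1] if i > 0 else 0.0
        h_new.append(max(0.0, h[i] + dt * (rain - (q[i] - q_in) / dx)))
    return h_new


def outflow(h, n, slope):
    """Discharge leaving the downstream end of the plane [m^2/s]."""
    return manning_discharge(h[-1], n, slope)


if __name__ == "__main__":
    # Illustrative plane: 100 cells of 1 m, Manning n = 0.03, bed slope 1 %.
    h = [0.0] * 100
    rain = 50.0 / 1000.0 / 3600.0  # 50 mm/h expressed in m/s
    dt, dx, n, slope = 1.0, 1.0, 0.03, 0.01
    for _ in range(3600):  # one hour of steady rain
        h = kinematic_step(h, dt, dx, rain, n, slope)
    print("outflow after 1 h [m^2/s]:", outflow(h, n, slope))
    print("equilibrium outflow rain * L [m^2/s]:", rain * len(h) * dx)
```

With steady rain the outflow should approach the equilibrium value rain × plane length; the time step was chosen so that the Courant number stays well below 1 for these illustrative parameters.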

IPC-based Dynamic SM management on GPGPU for Executing AES Algorithm

  • Son, Dong Oh; Choi, Hong Jun; Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information / v.25 no.2 / pp. 11-19 / 2020
  • Modern GPUs can execute general-purpose computations and deliver high performance by exploiting their many cores. Running the AES algorithm efficiently requires parallel computational resources. However, CPU architectures do not provide enough computational resources for cryptographic algorithms such as AES, whereas GPU architectures offer massive parallel computation resources. This paper therefore aims to reduce AES execution time by employing the parallel computational resources of GPGPUs. Unfortunately, AES cannot fully utilize the computational resources of a GPGPU because it is not well suited to the GPGPU architecture. To execute AES efficiently on a GPGPU, this paper proposes an IPC (instructions per cycle)-based dynamic SM (streaming multiprocessor) management technique, which increases or decreases the number of active SMs at run time according to the measured IPC. According to simulation results, the proposed technique improves performance by increasing resource utilization compared with the baseline GPGPU architecture; AES performance improves by 41.2% on average.
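
The abstract says only that the number of active SMs is raised or lowered at run time using IPC. The sketch below is a hypothetical hill-climbing controller in that spirit, not the authors' policy: after each sampling epoch it compares the measured IPC with the previous epoch, keeps moving the active-SM count in the same direction while IPC does not drop, and reverses direction when it does. The class `SmController`, the one-SM step, the epoch granularity and the `synthetic_ipc` curve are all assumptions made for illustration; the curve is invented, not measured data.

```python
class SmController:
    """Hypothetical IPC-driven hill-climbing controller for active SMs."""

    def __init__(self, total_sms, step=1):
        self.total_sms = total_sms
        self.active = total_sms      # start with every SM enabled
        self.step = step
        self.direction = -1          # first probe: try disabling SMs
        self.prev_ipc = None

    def update(self, ipc):
        """Record the IPC of the last epoch and return the next active-SM count."""
        if self.prev_ipc is not None and ipc < self.prev_ipc:
            self.direction = -self.direction  # last move lowered IPC: reverse it
        self.prev_ipc = ipc
        proposed = self.active + self.direction * self.step
        self.active = max(1, min(self.total_sms, proposed))  # stay in [1, total]
        return self.active


def synthetic_ipc(active_sms):
    """Invented IPC curve for a bandwidth-limited kernel: throughput stops
    scaling at 10 SMs and each extra SM only adds memory contention."""
    return min(2.0 * active_sms, 20.0) - 0.3 * max(0, active_sms - 10)


if __name__ == "__main__":
    ctrl = SmController(total_sms=15)
    trace = [ctrl.update(synthetic_ipc(ctrl.active)) for _ in range(20)]
    print(trace)  # active-SM count chosen after each epoch
```

On this made-up curve the controller walks down from 15 SMs and then oscillates within one SM of the IPC peak; a real policy would also need to account for sampling noise and the cost of switching SMs on and off.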

Fast GPU Implementation for the Solution of Tridiagonal Matrix Systems

  • Kim, Yong-Hee; Lee, Sung-Kee
    • Journal of KIISE: Computer Systems and Theory / v.32 no.11_12 / pp. 692-704 / 2005
  • With improvements in computer hardware, GPUs (graphics processing units) now offer tremendous memory bandwidth and computational power, which has led to their use for general-purpose computation. In particular, GPU implementations of compute-intensive, physics-based simulations are being actively studied. When the differential equations underlying physics-based simulations are solved with finite-difference approximations, tridiagonal matrix systems arise repeatedly, so solving them quickly is an important research problem for physics-based simulation. We propose a fast GPU implementation for solving tridiagonal matrix systems. In this paper, we implement the cyclic reduction (also known as odd-even reduction) algorithm, a popular choice for vector processors. We obtained a considerable performance improvement in solving tridiagonal matrix systems over the Thomas method and the conjugate gradient method; the Thomas method is the well-known approach for solving tridiagonal systems on the CPU, and the conjugate gradient method has shown good results on the GPU. We evaluated the proposed method by applying it to heat conduction, advection-diffusion, and shallow water simulations, which ran at over 35 frames per second on a 1024x1024 grid.
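
The abstract contrasts the Thomas algorithm (sequential Gaussian elimination, the usual CPU choice) with cyclic reduction. The sketch below is a plain-Python reference for both, meant to show their structure rather than to reproduce the authors' GPU code. It assumes the system has n = 2^k - 1 equations, a common simplification in textbook cyclic reduction, and stores the matrix as three diagonals `a` (sub, with a[0] = 0), `b` (main) and `c` (super, with c[n-1] = 0). The Thomas loops carry a dependency from one row to the next, whereas every update inside one cyclic-reduction level touches a disjoint set of equations and could run in parallel on a GPU.

```python
def thomas(a, b, c, d):
    """Thomas algorithm: forward elimination then back substitution (sequential)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # row i depends on row i-1
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_reduction(a, b, c, d):
    """Recursive cyclic (odd-even) reduction for n = 2**k - 1 equations."""
    n = len(d)
    if n < 1 or (n + 1) & n:
        raise ValueError("this sketch assumes n = 2**k - 1 equations")
    if n == 1:
        return [d[0] / b[0]]
    # Forward reduction: eliminate the even-indexed unknowns from each odd row.
    # Every iteration of this loop is independent (one GPU thread per row).
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        gamma = -c[i] / b[i + 1]
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + gamma * a[i + 1])
        rc.append(gamma * c[i + 1])
        rd.append(d[i] + alpha * d[i - 1] + gamma * d[i + 1])
    odd = cyclic_reduction(ra, rb, rc, rd)     # half-size system, same form
    x = [0.0] * n
    for j, i in enumerate(range(1, n, 2)):
        x[i] = odd[j]
    # Back substitution: each even unknown needs only its two odd neighbours.
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
    return x


if __name__ == "__main__":
    # Implicit heat-conduction-like step: (2 + eps) on the diagonal, -1 off it.
    n = 15
    a = [0.0] + [-1.0] * (n - 1)
    c = [-1.0] * (n - 1) + [0.0]
    b = [2.5] * n
    d = [float(i % 4) for i in range(n)]
    print(thomas(a, b, c, d))
    print(cyclic_reduction(a, b, c, d))
```

Both functions should return the same solution up to rounding for diagonally dominant systems like the one in the usage example; handling arbitrary n (for example by padding with identity rows) is left out to keep the sketch short.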

Development of Diffusive Wave Rainfall-Runoff Model Based on CUDA FORTRAN

  • Kim, Boram; Kim, Hyeong-Jun; Yoon, Kwang Seok
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / p. 287 / 2021
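  • In this study, a diffusive wave rainfall-runoff model was developed using CUDA (Compute Unified Device Architecture) FORTRAN. CUDA FORTRAN is a general-purpose computing on graphics processing units (GPGPU) technology that allows parallel algorithms executed on a graphics processing unit (GPU) to be written in the FORTRAN language. Because a GPU consists of many arithmetic logic units (ALUs) specialized for graphics processing, it can perform more operations at once than a central processing unit (CPU). Accordingly, a CUDA FORTRAN-based diffusive wave model can shorten the computation time of numerical simulations with a distributed rainfall-runoff model. The governing equations of the distributed model consist of a diffusive wave model and the Green-Ampt infiltration model, and the diffusive wave model was discretized using the finite volume method. The accuracy of the CUDA FORTRAN-based diffusive wave model was assessed against previously published hydraulic experiment results and a CPU-based rainfall-runoff model, and its computational efficiency was compared with that of a CPU-based diffusive wave model. The results of the CUDA FORTRAN-based model were similar to both the experimental results and the results of the CPU-based rainfall-runoff model. In addition, its computation time was shorter than that of the CPU-based diffusive wave model, by up to a factor of about 100 on the hardware used in this study.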
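
The abstract names the ingredients (a diffusive wave model, Green-Ampt infiltration, finite-volume discretization) but not the scheme. The sketch below is a minimal 1D illustration under stated assumptions, not the authors' formulation: face fluxes follow Manning's equation driven by the water-surface slope (the defining feature of the diffusive wave approximation), the face depth is taken from the upwind cell (the one with the higher water surface), the update is explicit, and infiltration capacity follows the Green-Ampt form f = Ks (1 + psi * dtheta / F), where F is cumulative infiltration. Function names, parameter values and the closed-boundary setup are hypothetical. As with the CUDA FORTRAN model in the abstract, each face flux and each cell update reads only neighbouring cells, which is what permits one-thread-per-cell parallelization.

```python
import math


def green_ampt_capacity(F, Ks, psi, dtheta):
    """Green-Ampt infiltration capacity f = Ks * (1 + psi * dtheta / F) [m/s]."""
    F = max(F, 1e-9)  # avoid dividing by zero before any water has infiltrated
    return Ks * (1.0 + psi * dtheta / F)


def diffusive_wave_step(h, z, F, dt, dx, rain, n, soil):
    """One explicit finite-volume step of a 1D diffusive wave model.

    h: water depth per cell [m]; z: bed elevation per cell [m];
    F: cumulative infiltration per cell [m]; soil = (Ks, psi, dtheta).
    Both end faces are closed (zero flux), so no water leaves the domain.
    """
    ncell = len(h)
    flux = [0.0] * (ncell + 1)  # flux[k]: face between cells k-1 and k
    for k in range(1, ncell):
        eta_l, eta_r = z[k - 1] + h[k - 1], z[k] + h[k]
        slope = (eta_l - eta_r) / dx  # water-surface slope drives the flow
        h_face = h[k - 1] if eta_l >= eta_r else h[k]  # upwind depth
        flux[k] = (math.copysign(1.0, slope) * h_face ** (5.0 / 3.0)
                   * math.sqrt(abs(slope)) / n)
    new_h, new_F = [], []
    for i in range(ncell):
        available = h[i] / dt + rain  # most water that could infiltrate [m/s]
        f = min(green_ampt_capacity(F[i], *soil), available)
        dh = dt * (rain - f - (flux[i + 1] - flux[i]) / dx)
        new_h.append(max(0.0, h[i] + dh))
        new_F.append(F[i] + f * dt)
    return new_h, new_F


if __name__ == "__main__":
    ncell, dx, dt = 20, 1.0, 0.1
    z = [0.01 * (ncell - i) for i in range(ncell)]  # 1 % bed slope downstream
    h, F = [0.0] * ncell, [0.0] * ncell
    rain = 50e-3 / 3600.0          # 50 mm/h in m/s
    soil = (1e-6, 0.1, 0.3)        # illustrative Ks [m/s], psi [m], dtheta [-]
    steps = 3000
    for _ in range(steps):
        h, F = diffusive_wave_step(h, z, F, dt, dx, rain, 0.03, soil)
    print("surface + infiltrated [m^2]:", (sum(h) + sum(F)) * dx)
    print("rain supplied [m^2]:", rain * steps * dt * ncell * dx)
```

With closed boundaries, surface storage plus cumulative infiltration should equal the rain supplied, which is a useful sanity check when porting such a scheme to the GPU. Explicit diffusive wave schemes can oscillate where the water surface is nearly flat, so a production model would need a stability-limited time step or a regularized flux.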

Fast Multi-GPU based 3D Backprojection Method

  • Lee, Byeong-Hun; Lee, Ho; Kye, Hee-Won; Shin, Yeong-Gil
    • Journal of Korea Multimedia Society / v.12 no.2 / pp. 209-218 / 2009
  • 3D backprojection is a reconstruction algorithm that generates volume data consisting of tomographic images, recovering the spatial information of the original 3D data from hundreds of 2D projections. Because the value of every voxel is computed from the corresponding pixels in hundreds of projections, the computation time of backprojection grows in proportion to both the size of the volume data and the number of projection images. To reduce this time, fast GPU-based 3D backprojection methods have been studied recently, and their performance has improved significantly. This paper presents two multi-GPU methods that maximize GPU parallelism and compares their efficiency with respect to both the number of projections and the size of the volume data. In the first method, each GPU is allocated half of the volume data and independently generates its partial volume using all projections. In the second method, each GPU is allocated the full volume data and generates an incomplete volume using half of the projections; the incomplete volumes from the GPUs are then merged on the CPU to obtain the entire volume. In the experiments, the first method performed better than the second when the entire volume data could be allocated on the GPU; otherwise, the second method was more efficient than the first.
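
The two partitioning strategies in this abstract can be illustrated without GPUs. The sketch below is a hypothetical, simplified model: it uses a toy 2D parallel-beam backprojection with nearest-neighbour detector lookup (the paper targets 3D volumes and does not specify its geometry or interpolation), and Python lists stand in for the memory of each GPU. Strategy 1 gives each device a contiguous slab of the volume plus all projections; strategy 2 gives each device the full volume plus a subset of the projections and then sums the partial volumes on the host. The helper names `split`, `detector_index`, `backproject` and the two `strategy_*` functions are illustrative assumptions. Because backprojection is a sum over projections, both strategies yield the same volume; they differ in per-device memory and in the final merge, which is the trade-off the abstract describes.

```python
import math


def split(seq, parts):
    """Split a list into `parts` contiguous chunks of nearly equal length."""
    k, m = divmod(len(seq), parts)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(parts)]


def detector_index(x, y, angle, n_det):
    """Nearest detector bin hit by pixel (x, y) in a parallel-beam view (toy model)."""
    s = x * math.cos(angle) + y * math.sin(angle)  # signed offset from the centre
    return min(n_det - 1, max(0, int(round(s + (n_det - 1) / 2.0))))


def backproject(rows, size, projections, angles):
    """Accumulate the given projections into the listed image rows only."""
    n_det = len(projections[0])
    centre = (size - 1) / 2.0
    out = [[0.0] * size for _ in rows]
    for r, row in enumerate(rows):
        for col in range(size):
            x, y = col - centre, row - centre
            for proj, ang in zip(projections, angles):
                out[r][col] += proj[detector_index(x, y, ang, n_det)]
    return out


def strategy_split_volume(size, projections, angles, n_gpus=2):
    """Strategy 1: each device holds part of the volume and ALL projections."""
    volume = []
    for rows in split(list(range(size)), n_gpus):  # one slab per device
        volume.extend(backproject(rows, size, projections, angles))
    return volume  # slabs are disjoint, so concatenating them completes the volume


def strategy_split_projections(size, projections, angles, n_gpus=2):
    """Strategy 2: each device holds the FULL volume and part of the projections."""
    rows = list(range(size))
    total = [[0.0] * size for _ in rows]
    for proj_part, ang_part in zip(split(projections, n_gpus), split(angles, n_gpus)):
        if not proj_part:
            continue  # more devices than projections: this device has no work
        partial = backproject(rows, size, proj_part, ang_part)  # incomplete volume
        for r in rows:  # merge step performed on the host (CPU)
            for col in range(size):
                total[r][col] += partial[r][col]
    return total


if __name__ == "__main__":
    size, n_det = 8, 12
    angles = [math.pi * k / 6 for k in range(6)]
    projections = [[float((k * 7 + j) % 5) for j in range(n_det)] for k in range(6)]
    v1 = strategy_split_volume(size, projections, angles)
    v2 = strategy_split_projections(size, projections, angles)
    print(max(abs(p - q) for r1, r2 in zip(v1, v2) for p, q in zip(r1, r2)))
```

In this model strategy 1 needs only half the volume per device but every device reads all projections, while strategy 2 needs a full volume per device plus a merge pass; which one is faster then depends on whether the full volume fits in device memory.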
