• Title/Summary/Keyword: GPU model


A 2D GPU-Accelerated High Resolution Numerical Scheme for Solving Diffusive Wave Equation (고해상도 수치기법을 이용한 GPU 기반 2D 확산파 모형)

  • Park, Seonryang;Kim, Dae-Hong
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.109-109 / 2019
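  • This study developed a GPU-based diffusive wave model for simulating rainfall-runoff processes. The diffusive wave equation was solved with the finite volume method, and the physical quantities at each cell interface were reconstructed using the MUSCL scheme with the van Leer TVD limiter. Infiltration was accounted for with the Horton infiltration model. The model was used to simulate rainfall-runoff on a 1D single overland plane and a 2D V-shaped overland, and its accuracy was verified by comparison with the analytical solution and with numerical results computed by a dynamic wave model, respectively. It was then applied to 1D and 2D terrain with strongly varying relief to confirm that it gives a physically plausible description of the rainfall-runoff process. Finally, it was applied to complex real terrain, and comparison with measurements verified the suitability of the diffusive wave model for a real watershed. The GPU-based model was implemented on an NVIDIA GeForce GTX 1050 using NVIDIA's CUDA Fortran, which exposes the parallel processing capability of the GPU. On a Windows PC, its computation speed was compared with that of a CPU-based model (Intel i7, 4.70 GHz): the efficiency gain of the GPU-based model over the CPU-based model grew as the grid size increased, reaching a maximum of about 150 times on a $3200 \times 3200$ grid.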

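The MUSCL reconstruction with the van Leer limiter mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a uniform 1D grid and plain Python lists; the function names `van_leer` and `muscl_interface_states` are hypothetical and this is not the authors' CUDA Fortran implementation.

```python
def van_leer(r):
    """van Leer limiter: phi(r) = (r + |r|) / (1 + |r|)."""
    return (r + abs(r)) / (1.0 + abs(r))


def muscl_interface_states(u):
    """Return (left, right) states at interfaces i+1/2 for i = 1 .. len(u)-3.

    `u` holds cell averages on a uniform grid. Only interfaces whose
    stencil lies fully inside the array are reconstructed.
    """
    def limited_slope(i):
        d_minus = u[i] - u[i - 1]      # backward difference
        d_plus = u[i + 1] - u[i]       # forward difference
        if d_plus == 0.0:
            return 0.0                 # flat on one side: no slope
        r = d_minus / d_plus           # ratio of consecutive differences
        return van_leer(r) * d_plus

    states = []
    for i in range(1, len(u) - 2):
        left = u[i] + 0.5 * limited_slope(i)           # extrapolate from cell i
        right = u[i + 1] - 0.5 * limited_slope(i + 1)  # extrapolate from cell i+1
        states.append((left, right))
    return states
```

On smooth (linear) data the limiter returns 1 and the reconstruction is second-order; next to a jump the limiter returns 0, so the interface states fall back to the cell averages and no new extrema are created.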

Application Analysis of GPU-Accelerated Kinematic Wave Model Using CUDA Fortran (CUDA FORTRAN을 이용한 GPU 가속 운동파모형 적용성 분석)

  • Kim, Boram;Kim, Hyung-Jun;Kim, Sooyoung;Yoon, Kwang Seok
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.346-346 / 2022
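  • This study applied a GPU (Graphics Processing Unit) accelerated distributed model to a real watershed and analyzed the accuracy of the rainfall-runoff simulation results and the efficiency of the simulation time. The governing equations of the distributed model consist of the kinematic wave model and the Green-Ampt model, and the kinematic wave model was discretized using the finite volume method. The GPU-accelerated model was developed with CUDA (Compute Unified Device Architecture) Fortran to shorten computation time in numerical simulation. Accuracy and efficiency were analyzed by applying the GPU-accelerated kinematic wave model to rainfall-runoff events in the Mihocheon watershed. The simulated results were compared with observations at water-level stations within the watershed to verify accuracy, and the simulation time was compared with that of a CPU (Central Processing Unit) based kinematic wave model to verify efficiency. The results of the GPU-accelerated kinematic wave model were similar to the observations, and the simulation time was reduced by up to about 100 times on the equipment used in this study.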


A Study on comparison of calculation between CPU-intensive and GPU-intensive and finding proper model for specific program (GPU기반의 계산속도와 CPU기반의 계산속도 비교 및 특정 프로그램에 따른 적합한 모델 찾기에 대한 연구)

  • Shin, Hyun-Soo
    • Proceedings of the Korea Information Processing Society Conference / 2019.05a / pp.48-51 / 2019
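  • With recent advances in technology, more computation is now required in less time. This study examines the architectures of the CPU and the GPU and compares their computation speeds. It also examines the move from serial to parallel algorithms, current applications of GPU parallel processing, and how to find a suitable model in the future.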

A study on application of GPU-accelerated kinematic wave rainfall-runoff model (GPU 가속 운동파 강우유출모형의 적용 연구)

  • Kim, Boram;Yun, Gwan Seon;Kim, Hyeong-Jun;Yoon, Kwang Seok
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.323-323 / 2020
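  • A graphics processing unit (GPU) consists of many arithmetic logic units (ALUs) specialized for graphics workloads, so it can perform more operations at once than a central processing unit (CPU). This study applied a GPU-accelerated kinematic wave model to a real watershed and examined the accuracy of its rainfall-runoff results and the efficiency of its computation time. The GPU-accelerated kinematic wave model was developed with CUDA Fortran to shorten the simulation time of distributed rainfall-runoff models. The governing equations of the distributed model consist of the kinematic wave model and the Green-Ampt model, and the kinematic wave model was discretized using the finite volume method. The model was used to simulate rainfall-runoff events in the Mihocheon watershed of the Geum River and was compared with a CPU-based rainfall-runoff model using the same finite volume method. The results of the GPU-accelerated model were similar to observations at the downstream end of the Mihocheon watershed. Its computation time was shorter than that of the CPU-based model, by up to about 100 times on the equipment used in this study.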


EFFICIENT COMPUTATION OF COMPRESSIBLE FLOW BY HIGHER-ORDER METHOD ACCELERATED USING GPU (고차 정확도 수치기법의 GPU 계산을 통한 효율적인 압축성 유동 해석)

  • Chang, T.K.;Park, J.S.;Kim, C.
    • Journal of computational fluids engineering / v.19 no.3 / pp.52-61 / 2014
  • This paper deals with the efficient computation of higher-order CFD methods for compressible flow using graphics processing units (GPUs). Higher-order CFD methods, such as discontinuous Galerkin (DG) methods and correction procedure via reconstruction (CPR) methods, can achieve arbitrarily high-order accuracy with a compact stencil on unstructured meshes. However, they are far more computationally expensive than the widely used finite volume methods (FVM). A GPU, consisting of hundreds or thousands of small cores, is well suited to massively parallel computation of compressible flow with higher-order CFD methods and can greatly reduce computation time. A higher-order multi-dimensional limiting process (MLP) is applied for robust control of numerical oscillations around shock discontinuities and is implemented efficiently on the GPU. The program is written and optimized with the CUDA library provided by NVIDIA. All algorithms are implemented to guarantee accurate and efficient computation for parallel programming on the shared-memory model of the GPU. Extensive numerical experiments validate that the GPU successfully accelerates the computation of compressible flow with higher-order methods.

A Tool for On-the-fly Repairing of Atomicity Violation in GPU Program Execution

  • Lee, Keonpyo;Lee, Seongjin;Jun, Yong-Kee
    • Journal of the Korea Society of Computer and Information / v.26 no.9 / pp.1-12 / 2021
  • In this paper, we propose a tool called ARCAV (Automatic Recovery of CUDA Atomicity Violation) to automatically repair atomicity violations in GPU (Graphics Processing Unit) programs. ARCAV monitors information on every barrier and memory access so that actual memory writes occur at the end of the barrier region or the program executes the barrier region again. Existing methods only detect atomicity violations in GPU programs and do not repair them, because GPU programs generally do not support the lock and sleep instructions needed for repair. ARCAV is designed for the GPU execution model. It detects and repairs four patterns of atomicity violations that represent real-world cases. Moreover, ARCAV is independent of memory hierarchy and thread configuration. Our experiments show that the performance of ARCAV is stable regardless of the number of threads or blocks. The overhead of ARCAV is evaluated using four real-world kernels, and its slowdown is 2.1x, on average, relative to native execution time.

A Reconfigurable Lighting Engine for Mobile GPU Shaders

  • Ahn, Jonghun;Choi, Seongrim;Nam, Byeong-Gyu
    • JSTS:Journal of Semiconductor Technology and Science / v.15 no.1 / pp.145-149 / 2015
  • A reconfigurable lighting engine for widely used lighting models is proposed for low-power GPU shaders. Conventionally, lighting operations, which involve many complex arithmetic operations, were calculated by shader programs on the GPU, which led to a significant energy overhead. In this letter, we propose a lighting engine that improves energy efficiency by supporting the widely used advanced lighting models in hardware. It supports the Blinn-Phong, Oren-Nayar, and Cook-Torrance models by exploiting logarithmic arithmetic and optimizing the trigonometric function evaluations for energy efficiency. Experimental results demonstrate 12.7%, 42.5%, and 35.5% reductions in power-delay product relative to the shader program implementations of the respective lighting models. Moreover, our work shows 10.1% higher energy efficiency for the Blinn-Phong model compared to the prior art.

GPU-accelerated Lattice Boltzmann Simulation for the Prediction of Oil Slick Movement in Ocean Environment (GPU 가속 기술을 이용한 격자 볼츠만법 기반 원유 확산 과정 시뮬레이션)

  • Ha, Sol;Ku, Namkug;Roh, Myung-Il
    • Korean Journal of Computational Design and Engineering / v.18 no.6 / pp.399-406 / 2013
  • This paper describes a new simulation technique for advection-diffusion phenomena over the sea surface using the lattice Boltzmann method (LBM), capable of predicting oil dispersion from tankers. The LBM is used to solve the pollutant transport problem within the framework of the ocean environment. The sea space is represented by lattices, where each lattice holds information on oil transport. Since dispersed oil (i.e., oil droplets) at sea is transported by convection due to waves, buoyancy, and turbulent diffusion, the conservation of mass and many physical oil transport rules were used in the prediction model. Since the LBM is built on uniform lattices and simple rules, it can be easily accelerated by a parallel mechanism such as a GPU-accelerated method. The proposed LBM-based model is used to simulate a simple pollution event with 10,000 kL of oil pollutants. The simulation results indicate that the GPU-accelerated LBM is 6 times faster than the LBM without the GPU.
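To make the "uniform lattices and simple rules" remark concrete, here is a minimal sketch of one lattice Boltzmann step for advection-diffusion. It assumes a 1D periodic lattice with three velocities (D1Q3), a BGK collision with relaxation time `tau`, and a constant advection velocity `u` in lattice units; all names are hypothetical and this is not the paper's ocean model. Every lattice site is updated by the same local rule, which is why the method maps naturally onto GPU threads.

```python
WEIGHTS = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)  # D1Q3 lattice weights
VELOCITIES = (0, 1, -1)                      # rest, right-moving, left-moving
CS2 = 1.0 / 3.0                              # squared lattice speed of sound


def initial_populations(conc):
    """Start from the zero-velocity equilibrium for a concentration field."""
    return [[w * c for c in conc] for w in WEIGHTS]


def concentration(f):
    """Concentration is the sum of the populations at each site."""
    return [f[0][x] + f[1][x] + f[2][x] for x in range(len(f[0]))]


def lbm_step(f, u, tau):
    """One collide-and-stream step on a periodic 1D lattice."""
    n = len(f[0])
    conc = concentration(f)
    streamed = [[0.0] * n for _ in range(3)]
    for i in range(3):
        for x in range(n):
            # BGK collision: relax toward the local equilibrium.
            feq = WEIGHTS[i] * conc[x] * (1.0 + VELOCITIES[i] * u / CS2)
            post = f[i][x] - (f[i][x] - feq) / tau
            # Streaming: move the population to the neighbouring site.
            streamed[i][(x + VELOCITIES[i]) % n] = post
    return streamed
```

In this scheme the diffusivity in lattice units is `CS2 * (tau - 0.5)`, and the total concentration is conserved by both the collision and the streaming step.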

Analysis tool for the diffusion model using GPU: SNUDM-G (GPU를 이용한 확산모형 분석 도구: SNUDM-G)

  • Lee, Dajung;Lee, Hyosun;Koh, Sungryong
    • Korean Journal of Cognitive Science / v.33 no.3 / pp.155-168 / 2022
  • In this paper, we introduce SNUDM-G, a diffusion model analysis tool with improved computational speed. Although the diffusion model has been applied to explain various cognitive tasks, its use has been limited by computational difficulties. In particular, SNUDM (Koh et al., 2020), one of the diffusion model analysis tools, is slow because it generates 20,000 data points sequentially when approximating the diffusion process. To overcome this limitation, we propose using graphics processing units (GPUs) when approximating the diffusion process with a random walk process. Since the 20,000 data points can be generated in parallel on a GPU, estimation is faster than with sequential generation. When the data of Experiment 1 by Ratcliff et al. (2004) were analyzed and the parameters recovered with SNUDM-G on a GPU and with SNUDM on a CPU, SNUDM-G estimated slightly higher values for certain parameters than SNUDM. However, in terms of computational speed, SNUDM-G estimated the parameters much faster than SNUDM. This result shows that a more efficient diffusion model analysis for various cognitive tasks is possible with this tool and further suggests that the processing speed of various cognitive models can be improved by using GPUs in the future.
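The random walk approximation of a diffusion process described above can be sketched as follows. This is an illustrative CPU version under assumed conventions (step size `noise * sqrt(dt)`, up-step probability `0.5 * (1 + drift * sqrt(dt) / noise)`, absorbing boundaries at 0 and `boundary`); the function names and default values are hypothetical and do not come from SNUDM or SNUDM-G. Each simulated trial is independent of the others, which is the property that allows the trials to be generated in parallel on a GPU.

```python
import math
import random


def simulate_trial(drift, boundary, start, rng, dt=0.001, noise=1.0):
    """Simulate one trial; return (hit_upper_boundary, decision_time)."""
    delta = noise * math.sqrt(dt)                          # step size
    p_up = 0.5 * (1.0 + drift * math.sqrt(dt) / noise)     # up-step probability
    x, t = start, 0.0
    while 0.0 < x < boundary:
        x += delta if rng.random() < p_up else -delta
        t += dt
    return x >= boundary, t


def simulate_trials(n_trials, drift, boundary, start, seed=0):
    """Run independent trials sequentially with a seeded generator."""
    rng = random.Random(seed)
    return [simulate_trial(drift, boundary, start, rng) for _ in range(n_trials)]
```

A call such as `simulate_trials(20000, drift=1.0, boundary=1.0, start=0.5)` would produce the kind of sample the abstract refers to; a GPU version would assign one trial (with its own random stream) to each thread instead of looping.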

GPU-based Stereo Matching Algorithm with the Strategy of Population-based Incremental Learning

  • Nie, Dong-Hu;Han, Kyu-Phil;Lee, Heng-Suk
    • Journal of Information Processing Systems / v.5 no.2 / pp.105-116 / 2009
  • To solve the general problems surrounding the application of genetic algorithms in stereo matching, two measures are proposed. First, a simplified population-based incremental learning (PBIL) strategy is adopted to reduce memory consumption and search inefficiency, and a scheme for controlling the distance of neighbors for disparity smoothness is inserted to obtain wide-area consistency of disparities. In addition, an alternative version of the proposed algorithm, without a probability vector, is presented for simpler set-ups. Second, programmable graphics hardware (GPU) consists of multiple multiprocessors and offers powerful parallelism that can perform operations in parallel at low cost. Therefore, to decrease the running time further, a model of the proposed algorithm that runs on programmable graphics hardware is presented for the first time. The algorithms are implemented on the CPU as well as on the GPU and are evaluated experimentally. The experimental results show that the proposed algorithm offers better performance than traditional BMA methods with a deliberate relaxation and its modified version in terms of both running speed and stability. The comparison of computation times on the GPU and the CPU shows that the speed-up of the GPU over the CPU grows as the image size increases.
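For readers unfamiliar with PBIL, the following minimal sketch shows the core idea of the standard algorithm: a probability vector replaces the explicit population of a genetic algorithm and is nudged toward the best sample of each generation. The toy objective (maximizing the number of 1 bits) and the names `pbil_update` and `pbil_onemax` are assumptions for illustration only; the paper's simplified PBIL for disparity search is not reproduced here.

```python
import random


def pbil_update(prob, best, learning_rate):
    """Move each probability toward the corresponding bit of the best sample."""
    return [(1.0 - learning_rate) * p + learning_rate * b
            for p, b in zip(prob, best)]


def pbil_onemax(n_bits, pop_size, generations, learning_rate, seed=0):
    """Run PBIL on a toy objective: maximize the number of 1 bits."""
    rng = random.Random(seed)
    prob = [0.5] * n_bits                      # start with unbiased bits
    for _ in range(generations):
        # Sample a population from the current probability vector.
        population = [[1 if rng.random() < p else 0 for p in prob]
                      for _ in range(pop_size)]
        best = max(population, key=sum)        # fitness = count of 1 bits
        prob = pbil_update(prob, best, learning_rate)
    return prob
```

Only the probability vector persists between generations, which is the source of the memory savings the abstract attributes to the PBIL strategy.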