• Title/Summary/Keyword: graphics processing unit (GPU)


Development of Diffusive Wave Rainfall-Runoff Model Based on CUDA FORTRAN (CUDA FORTRAN기반 확산파 강우유출모형 개발)

  • Kim, Boram;Kim, Hyeong-Jun;Yoon, Kwang Seok
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.287-287
    • /
    • 2021
  • In this study, a diffusive wave rainfall-runoff model was developed using CUDA (Compute Unified Device Architecture) Fortran. CUDA Fortran is a GPGPU (General-Purpose Computing on Graphics Processing Units) technology that lets parallel algorithms running on the graphics processing unit (GPU) be written in the Fortran language. Because a GPU consists of many arithmetic logic units (ALUs) specialized for graphics work, it can perform more operations at once than a central processing unit (CPU). Accordingly, the CUDA Fortran-based diffusive wave model can shorten the computation time of numerical simulations with a distributed rainfall-runoff model. The governing equations of the distributed model consist of the diffusive wave model and the Green-Ampt model, and the diffusive wave equations were discretized with the finite volume method. The accuracy of the CUDA Fortran-based diffusive wave model was verified against previously published hydraulic experiments and a CPU-based rainfall-runoff model, and its computational efficiency was compared with the CPU-based diffusive wave model. The results of the CUDA Fortran-based model agreed closely with both the hydraulic experiments and the CPU-based model, and its computation time was reduced by up to a factor of about 100 on the hardware used in this study. (An illustrative code sketch follows below.)

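As a hedged illustration of the pattern this abstract describes, the CUDA C sketch below (the paper itself uses CUDA Fortran) advances a water-depth grid with one thread per cell. The kernel name, grid size, and the simplified diffusion-plus-rainfall update are assumptions for illustration only, not the authors' finite volume scheme or their Green-Ampt coupling.

```cuda
#include <cuda_runtime.h>

// Hypothetical explicit update of water depth h on an nx x ny grid.
// One thread per cell; a diffusion-like stencil stands in for the
// diffusive wave operator (the paper's actual discretization is a
// finite volume scheme coupled with Green-Ampt infiltration).
__global__ void diffusiveStep(const float* h, float* hNew,
                              int nx, int ny, float alpha, float rain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;

    int c = j * nx + i;
    float lap = h[c - 1] + h[c + 1] + h[c - nx] + h[c + nx] - 4.0f * h[c];
    hNew[c] = h[c] + alpha * lap + rain;   // diffusion + rainfall source
}

int main()
{
    const int nx = 256, ny = 256;
    size_t bytes = nx * ny * sizeof(float);
    float *h, *hNew;
    cudaMalloc(&h, bytes);
    cudaMalloc(&hNew, bytes);
    cudaMemset(h, 0, bytes);      // boundaries stay zero: the kernel
    cudaMemset(hNew, 0, bytes);   // never writes boundary cells

    dim3 block(16, 16);
    dim3 grid((nx + 15) / 16, (ny + 15) / 16);
    for (int step = 0; step < 1000; ++step) {
        diffusiveStep<<<grid, block>>>(h, hNew, nx, ny, 0.1f, 1e-4f);
        float* tmp = h; h = hNew; hNew = tmp;   // ping-pong buffers
    }
    cudaDeviceSynchronize();
    cudaFree(h); cudaFree(hNew);
    return 0;
}
```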

The Study of LDPC for Railroad Signal control system by Using GPU (GPU를 이용한 철도신호에서의 LDPC 적용에 관한 연구)

  • Park, Joo-Yul;Kim, Hyo-Sang;Kim, Jae-Moon;Kim, Bong-Taek;Chung, Ki-Seok
    • Proceedings of the KSR Conference
    • /
    • 2010.06a
    • /
    • pp.1075-1080
    • /
    • 2010
  • There has been much research on accelerating high-performance digital signal processing on a GPU (Graphics Processing Unit). This kind of parallelization enables massive signal processing, giving us the advantage of supporting various signal-processing standards on the GPU. In this paper we introduce Low Density Parity Check (LDPC) codes, a form of Forward Error Correction (FEC), and we achieve a reduction in computation time by using CUDA as the parallelization scheme. (An illustrative code sketch follows below.)

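The abstract does not specify which LDPC decoding algorithm was parallelized; purely as a sketch, the kernels below implement one iteration of a hard-decision bit-flipping decoder, a simple scheme that maps naturally onto one-thread-per-check and one-thread-per-bit launches. The dense parity-check matrix and all names are illustrative assumptions; production decoders use sparse storage and belief propagation.

```cuda
#include <cuda_runtime.h>

// Hypothetical hard-decision bit-flipping step. H is an M x N
// parity-check matrix stored dense for simplicity.
__global__ void checkParity(const int* H, const int* bits,
                            int* syndrome, int M, int N)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per check
    if (m >= M) return;
    int s = 0;
    for (int n = 0; n < N; ++n)
        s ^= H[m * N + n] & bits[n];                // XOR of connected bits
    syndrome[m] = s;
}

__global__ void flipBits(const int* H, int* bits,
                         const int* syndrome, int M, int N)
{
    int n = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per bit
    if (n >= N) return;
    int failed = 0, degree = 0;
    for (int m = 0; m < M; ++m) {
        if (H[m * N + n]) { ++degree; failed += syndrome[m]; }
    }
    // Flip a bit when the majority of its parity checks fail.
    if (degree > 0 && 2 * failed > degree) bits[n] ^= 1;
}
```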

An Analytical Model for Performance Prediction of AES on GPU Architecture (GPU 아키텍처의 AES 암호화 성능 예측 분석 모델)

  • Kim, Kyuwoon;Kim, Hyunwoo;Kim, Huijeong;Huh, Taeyoung;Jung, Sanghyuk;Song, Yong Ho
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.4
    • /
    • pp.89-96
    • /
    • 2013
  • The graphics processing unit (GPU) has been developed to process not only graphics data but also general system data, and it outperforms the CPU on 3D graphics algorithms and parallel programs. To run an algorithm written for the CPU on the GPU, the programmer must understand the GPU architecture and rewrite the program to account for its parallel processing capability and its distinct memory model. For these reasons, a model that predicts how an algorithm will perform on a GPU system is needed: it can expose problems early in GPU application development and serve as a performance-evaluation baseline for GPUs. In this paper, we apply the AES encryption algorithm to our performance model and achieve performance prediction with high accuracy under a heavy workload. (An illustrative code sketch follows below.)
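
The abstract does not reproduce the paper's analytical model, so the snippet below is not it; it only illustrates the general shape of such GPU performance predictors with a roofline-style bound, where predicted kernel time is the larger of the compute-bound and memory-bound estimates. All constants and names are placeholders.

```cuda
#include <algorithm>
#include <cstdio>

// Hypothetical roofline-style estimate (NOT the paper's model): predicted
// kernel time is the maximum of compute-bound and memory-bound times.
double predictKernelTimeSec(double flops, double bytesMoved,
                            double peakFlops, double peakBandwidth)
{
    double computeTime = flops / peakFlops;          // if compute-bound
    double memoryTime  = bytesMoved / peakBandwidth; // if memory-bound
    return std::max(computeTime, memoryTime);
}

int main()
{
    // Illustrative numbers only: 1 GiB of AES blocks, ~160 ops per
    // 16-byte block, a 1 TFLOP/s, 200 GB/s device.
    double blocks = (1024.0 * 1024.0 * 1024.0) / 16.0;
    double t = predictKernelTimeSec(blocks * 160.0,
                                    2.0 * 1024.0 * 1024.0 * 1024.0,
                                    1.0e12, 200.0e9);
    std::printf("predicted time: %.4f s\n", t);
    return 0;
}
```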

Implementation of Path-Integral Monte Carlo Simulation Using the CUDA Programming Environment (CUDA programming environment을 활용한 Path-Integral Monte Carlo Simulation의 구현)

  • Lee, Hwa-Young;Im, Eun-Jin
    • Proceedings of the Korea Society for Industrial Systems Conference
    • /
    • 2009.05a
    • /
    • pp.196-199
    • /
    • 2009
  • With the growing computational power of the Graphics Processing Unit (GPU) and the development and spread of programming environments for general-purpose GPU programming, research on applying GPUs to general computation has become active. GPUs used for general-purpose computation in this way include the nVidia Tesla and the AMD/ATI FireStream. Programming a special-purpose processor such as a GPU for general computation requires a suitable development environment; CUDA (Compute Unified Device Architecture), developed by nVidia, is the environment provided for programming that company's GPUs. Because the CUDA environment is expected to share many characteristics with OpenCL (Open Computing Language), which is being discussed as an open standard for next-generation heterogeneous parallel programming, its importance is not limited to any particular GPU. In this paper, we present the results of parallelizing the Path Integral Monte Carlo method on an nVidia GPU using the CUDA development environment. (An illustrative code sketch follows below.)

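Path-integral Monte Carlo parallelizes naturally by giving each thread an independent path, which is what makes it a good CUDA fit; the hedged sketch below (in CUDA C, not a reproduction of the paper's code) runs single-bead Metropolis sweeps on a 1D harmonic-oscillator path per thread. The action, the move width, and all names are illustrative assumptions.

```cuda
#include <curand_kernel.h>

#define BEADS 32

// Hypothetical setup: one RNG state per path.
__global__ void initRng(curandState* states, int nPaths, unsigned long seed)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < nPaths) curand_init(seed, p, 0, &states[p]);
}

// Each thread evolves one independent closed path of a 1D harmonic
// oscillator (m = omega = 1, imaginary-time step eps) with single-bead
// Metropolis moves. Paths are assumed initialized (e.g. to zero).
__global__ void pimcSweep(float* paths, curandState* states,
                          int nPaths, float eps, int nSweeps)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPaths) return;
    curandState st = states[p];
    float* x = paths + p * BEADS;

    for (int sweep = 0; sweep < nSweeps; ++sweep) {
        for (int k = 0; k < BEADS; ++k) {
            int prev = (k + BEADS - 1) % BEADS;
            int next = (k + 1) % BEADS;
            float xOld = x[k];
            float xNew = xOld + 0.5f * (curand_uniform(&st) - 0.5f);
            // Local action: kinetic links to both neighbors plus potential.
            float sOld = ((x[prev] - xOld) * (x[prev] - xOld)
                        + (x[next] - xOld) * (x[next] - xOld)) / (2.0f * eps)
                        + eps * 0.5f * xOld * xOld;
            float sNew = ((x[prev] - xNew) * (x[prev] - xNew)
                        + (x[next] - xNew) * (x[next] - xNew)) / (2.0f * eps)
                        + eps * 0.5f * xNew * xNew;
            if (curand_uniform(&st) < expf(sOld - sNew)) x[k] = xNew;
        }
    }
    states[p] = st;
}
```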

Fast Double Random Phase Encoding by Using Graphics Processing Unit (GPU 컴퓨팅에 의한 고속 Double Random Phase Encoding)

  • Saifullah, Saifullah;Moon, In-Kyu
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2012.05a
    • /
    • pp.343-344
    • /
    • 2012
  • With the increase of sensitive data requiring secure transmission and storage, encryption techniques have become widespread. The performance of an encoding scheme depends largely on its computational time, so a system with a shorter computation time is preferable to one with a longer one. Double Random Phase Encoding (DRPE) is an algorithm with many sub-functions that consumes considerable time when executed serially; the computation time can be significantly reduced by implementing the important functions in parallel on a Graphics Processing Unit (GPU). Computing the convolution with the Fast Fourier Transform is the most expensive part of DRPE, and the paper shows that performing this portion on the GPU reduces the execution time of the process by a substantial amount, with MATLAB used as the baseline for performance analysis. An NVIDIA GeForce 310 graphics card is used with CUDA C as the programming language. (An illustrative code sketch follows below.)

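Since the abstract identifies FFT-based convolution as the dominant cost, a minimal sketch of that stage is shown below using cuFFT: forward 2D FFT, pointwise multiplication by a unit-magnitude random phase mask, inverse FFT. The function names, and the assumption that the first spatial-domain mask was already applied, are illustrative; note that cuFFT's inverse transform is unnormalized, so a real implementation would also scale by 1/(nx*ny).

```cuda
#include <cufft.h>
#include <cuComplex.h>

// Multiply each spectrum sample by a unit-magnitude random phase term,
// i.e. the second random phase mask of DRPE applied in the Fourier domain.
__global__ void applyPhaseMask(cufftComplex* spec, const float* phase, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    cufftComplex m = make_cuComplex(cosf(phase[i]), sinf(phase[i]));
    spec[i] = cuCmulf(spec[i], m);
}

// Hypothetical core of GPU DRPE: FFT -> phase mask -> inverse FFT.
// img and phase are device buffers prepared by the caller.
void drpeFourierStage(cufftComplex* img, const float* phase, int nx, int ny)
{
    cufftHandle plan;
    cufftPlan2d(&plan, nx, ny, CUFFT_C2C);
    cufftExecC2C(plan, img, img, CUFFT_FORWARD);

    int n = nx * ny;
    applyPhaseMask<<<(n + 255) / 256, 256>>>(img, phase, n);

    cufftExecC2C(plan, img, img, CUFFT_INVERSE);   // unnormalized
    cufftDestroy(plan);
}
```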

A Realization of FPGA-based Image Recognition System (FPGA기반 영상인식 시스템 구현)

  • Young Yun
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2022.11a
    • /
    • pp.349-350
    • /
    • 2022
  • Recently, AI (Artificial Intelligence) has been applied to various technologies such as autonomous driving, robotics, and smart communication. AI systems are currently developed in software using frameworks such as TensorFlow, with a GPU (Graphics Processing Unit) as the processing unit. In this work, we developed an FPGA-based (Field Programmable Gate Array) AI system and report on an image recognition system that realizes it.


Implementation of Parallel Computer Generated Hologram Using Multi-GPGPU (다중 GPGPU를 이용한 컴퓨터 생성 홀로그램의 병렬화 구현)

  • Seo, Young-Ho;Lee, Yoon-Hyuk;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.5
    • /
    • pp.1177-1186
    • /
    • 2014
  • A computer-generated hologram (CGH) mathematically models an optical phenomenon on a digital computer. Because it requires a huge amount of computational power, a fast, high-performance technique is needed. In this paper, we propose two levels of parallelization for the CGH calculation: parallelizing the CGH algorithm within a single GPU (graphics processing unit), and parallelizing across multiple GPUs. The proposed algorithm was implemented on a GTX 780 Ti GPU and calculates a 1,024 × 1,024 hologram with 10K object points in about 24 ms. (An illustrative code sketch follows below.)
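
A common single-GPU formulation of point-source CGH assigns one thread per hologram pixel and accumulates the contribution of every object point, as in the hedged sketch below; the kernel name, the real-valued cosine fringe model, and all parameters are assumptions rather than the authors' algorithm.

```cuda
#include <cuda_runtime.h>

// Hypothetical CGH kernel: one thread per hologram pixel accumulates the
// phase contribution of every object point (point-source model).
// obj holds (x, y, z) triples; k is the wavenumber; pitch is pixel size.
__global__ void cghKernel(float* holo, const float3* obj, int nPoints,
                          int width, int height, float k, float pitch)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    float x = (px - width / 2) * pitch;
    float y = (py - height / 2) * pitch;
    float acc = 0.0f;
    for (int p = 0; p < nPoints; ++p) {
        float dx = x - obj[p].x, dy = y - obj[p].y, dz = obj[p].z;
        float r = sqrtf(dx * dx + dy * dy + dz * dz);
        acc += cosf(k * r);               // real fringe contribution
    }
    holo[py * width + px] = acc;
}
```

The paper's second level of parallelism could then be obtained by giving each device (selected with cudaSetDevice) a horizontal slice of the hologram and gathering the slices on the host, since the pixels are fully independent.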

A STUDY OF THE APPLICATION OF DELAUNAY GRID GENERATION ON GPU USING CUDA LIBRARY (GPU Library CUDA를 이용한 효율적인 Delaunay 격자 생성에 관한 연구)

  • Song, J.H.;Kang, S.H.;Kim, G.M.;Kim, B.S.
    • Proceedings of the Korean Society of Computational Fluids Engineering Conference
    • /
    • 2011.05a
    • /
    • pp.194-198
    • /
    • 2011
  • In this study, an efficient algorithm for Delaunay triangulation of a large number of points, suitable for GPU-based parallel computation, is studied. The developed algorithm is programmed with the CUDA library, and the program takes full advantage of parallel computations performed concurrently on the GPU threads. The partitioned triangulations collected from the GPU computation require proper stitching between neighboring partitions, and the connectivities among triangular cells are calculated on the CPU. The effect of the number of threads on the efficiency and total duration of Delaunay grid generation is examined, and it is also shown that GPU computing with CUDA is feasible for Delaunay grid generation and reduces the total time required to triangulate a large number of points compared with sequential CPU-based triangulation programs. (An illustrative code sketch follows below.)

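At the core of any Delaunay method is the in-circle predicate, so the hedged sketch below shows that predicate as a CUDA device function plus a one-thread-per-triangle kernel that flags circumcircle violations for a candidate point. The surrounding insertion/flip machinery and the CPU-side stitching the abstract mentions are omitted, and all names are illustrative.

```cuda
#include <cuda_runtime.h>

// In-circle predicate: returns a positive value when point d lies inside
// the circumcircle of (a, b, c), with (a, b, c) in counter-clockwise order.
__device__ double inCircle(double2 a, double2 b, double2 c, double2 d)
{
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;
    double a2 = ax * ax + ay * ay;
    double b2 = bx * bx + by * by;
    double c2 = cx * cx + cy * cy;
    return ax * (by * c2 - b2 * cy)
         - ay * (bx * c2 - b2 * cx)
         + a2 * (bx * cy - by * cx);
}

// One thread per triangle: test a candidate point against each triangle's
// circumcircle (a hypothetical fragment of the insertion machinery;
// partition stitching stays on the CPU, as in the paper).
__global__ void flagViolations(const double2* pts, const int3* tris,
                               int nTris, double2 candidate, int* bad)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= nTris) return;
    int3 tri = tris[t];
    bad[t] = inCircle(pts[tri.x], pts[tri.y], pts[tri.z], candidate) > 0.0;
}
```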

Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OpenMP를 이용한 빠르고 효율적인 신경망 구현)

  • Park, An-Jin;Jang, Hong-Hoon;Jung, Kee-Chul
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.253-260
    • /
    • 2009
  • Many algorithms for computer vision and pattern recognition have recently been implemented on the GPU (graphics processing unit) for faster computation. However, such implementations face two problems. First, the programmer must master the graphics shading languages, which require prior knowledge of computer graphics. Second, in jobs that need close cooperation between CPU and GPU, which is common in image processing and pattern recognition unlike in pure graphics, the CPU should generate as much raw feature data for GPU processing as possible to utilize GPU performance effectively. This paper proposes a faster and more efficient implementation of neural networks on both the GPU and a multi-core CPU. To solve the first problem we use CUDA (Compute Unified Device Architecture), which can be programmed easily thanks to its simple C-like style, instead of a shading language. Moreover, OpenMP (Open Multi-Processing) is used to process multiple data concurrently on the multi-core CPU, which makes effective use of the GPU's memory. In the experiments, we implemented a neural-network-based text extraction system using the proposed architecture, and its computation was about 15 times faster than an implementation on the GPU alone without OpenMP. (An illustrative code sketch follows below.)
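
A minimal sketch of the CPU/GPU division of labor this abstract describes: OpenMP threads prepare raw feature data on the host while a CUDA kernel evaluates a fully connected layer, one thread per output neuron. The layer sizes, sigmoid activation, and all names are assumptions, not the paper's text-extraction network; compile with something like nvcc -Xcompiler -fopenmp.

```cuda
#include <omp.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical fully connected layer: one thread per output neuron.
__global__ void forwardLayer(const float* in, const float* W, const float* b,
                             float* out, int nIn, int nOut)
{
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= nOut) return;
    float s = b[o];
    for (int i = 0; i < nIn; ++i) s += W[o * nIn + i] * in[i];
    out[o] = 1.0f / (1.0f + expf(-s));   // sigmoid activation
}

int main()
{
    const int nIn = 1024, nOut = 256;
    std::vector<float> hIn(nIn);

    // CPU side: OpenMP threads fill the feature buffer in parallel,
    // mirroring the paper's CPU/GPU cooperation.
    #pragma omp parallel for
    for (int i = 0; i < nIn; ++i) hIn[i] = 0.001f * i;

    float *dIn, *dW, *dB, *dOut;
    cudaMalloc(&dIn, nIn * sizeof(float));
    cudaMalloc(&dW, nOut * nIn * sizeof(float));
    cudaMalloc(&dB, nOut * sizeof(float));
    cudaMalloc(&dOut, nOut * sizeof(float));
    cudaMemset(dW, 0, nOut * nIn * sizeof(float));   // placeholder weights
    cudaMemset(dB, 0, nOut * sizeof(float));
    cudaMemcpy(dIn, hIn.data(), nIn * sizeof(float), cudaMemcpyHostToDevice);

    forwardLayer<<<(nOut + 127) / 128, 128>>>(dIn, dW, dB, dOut, nIn, nOut);
    cudaDeviceSynchronize();

    cudaFree(dIn); cudaFree(dW); cudaFree(dB); cudaFree(dOut);
    return 0;
}
```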

Implementation of Particle Swarm Optimization Method Using CUDA (CUDA를 이용한 Particle Swarm Optimization 구현)

  • Kim, Jo-Hwan;Kim, Eun-Su;Kim, Jong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.5
    • /
    • pp.1019-1024
    • /
    • 2009
  • In this paper, particle swarm optimization (PSO) is newly implemented with CUDA (Compute Unified Device Architecture) and applied to function optimization on several benchmark functions. CUDA runs on the GPU (graphics processing unit) rather than the CPU, solving complex computing problems through its parallel processing capacity, and it makes GPU software convenient to develop. Compared with the optimization result of PSO executed on a general CPU, the CUDA implementation saves about 38% of the PSO running time on average, which suggests that CUDA is a promising framework for real-time optimization and control. (An illustrative code sketch follows below.)
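
The PSO update rule is data-parallel across particles, which is what a CUDA implementation exploits; the hedged sketch below runs one thread per particle, applying the standard velocity/position update and evaluating the sphere benchmark. The coefficients, the dimension, and the host-side global-best update are illustrative assumptions, not the paper's configuration.

```cuda
#include <curand_kernel.h>

#define DIM 8

// Hypothetical PSO step: one thread per particle. gBest is the swarm's
// best position so far, updated on the host between iterations.
__global__ void psoStep(float* pos, float* vel, float* pBest, float* pBestVal,
                        const float* gBest, curandState* rng, int nParticles,
                        float w, float c1, float c2)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nParticles) return;
    curandState st = rng[p];

    float fit = 0.0f;
    for (int d = 0; d < DIM; ++d) {
        int i = p * DIM + d;
        float r1 = curand_uniform(&st), r2 = curand_uniform(&st);
        vel[i] = w * vel[i]
               + c1 * r1 * (pBest[i] - pos[i])      // cognitive pull
               + c2 * r2 * (gBest[d] - pos[i]);     // social pull
        pos[i] += vel[i];
        fit += pos[i] * pos[i];   // sphere benchmark f(x) = sum x_d^2
    }
    if (fit < pBestVal[p]) {      // update the particle's own best
        pBestVal[p] = fit;
        for (int d = 0; d < DIM; ++d) pBest[p * DIM + d] = pos[p * DIM + d];
    }
    rng[p] = st;
}
```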