• Title/Summary/Keyword: GPU program

Analysis on the GPU Performance according to Hierarchical Memory Organization (계층적 메모리 구성에 따른 GPU 성능 분석)

  • Choi, Hongjun;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association
    • /
    • v.14 no.3
    • /
    • pp.22-32
    • /
    • 2014
  • GPGPUs are now widely used for general-purpose as well as graphics processing because they provide hardware optimized for parallel execution. The memory system strongly affects the performance of parallel processors such as GPUs. GPUs implement a hierarchical memory architecture to achieve high memory bandwidth, and both memory address coalescing and memory request merging are widely used. This paper analyzes GPU performance under various memory organizations. According to our simulation results, adding an 8KB, 16KB, 32KB, or 64KB L1 cache improves GPU performance by 15.5%, 21.5%, 25.5%, and 30.9%, respectively, compared with a configuration without an L1 cache. However, performance degrades on some benchmarks because data dependencies increase the number of memory transactions. Moreover, when cache misses are frequent, the average memory access latency increases as the cache hierarchy becomes deeper.
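The last observation in this abstract, that a deeper cache hierarchy can raise average latency when misses are frequent, can be illustrated with the standard average memory access time recurrence. The sketch below is only an illustration: the function name, the latencies, and the miss rates are hypothetical and are not taken from the paper's simulator.

```python
def average_access_latency(levels, memory_latency):
    """Average memory access latency in cycles.

    levels: list of (hit_latency_cycles, miss_rate) tuples, ordered from the
            cache closest to the core outward (e.g. [L1, L2]).
    memory_latency: latency of the backing memory (e.g. DRAM) in cycles.
    All numbers passed in here are hypothetical inputs, not measured values.
    """
    latency = float(memory_latency)
    # Work inward from memory: each level costs its own hit latency plus,
    # on a miss, the average latency of everything behind it.
    for hit_latency, miss_rate in reversed(levels):
        latency = hit_latency + miss_rate * latency
    return latency


# Hypothetical configuration: L2 hit = 100 cycles, DRAM = 400 cycles.
l2_only = average_access_latency([(100, 0.5)], 400)
# Adding an L1 (20-cycle hit) helps when most accesses hit in it ...
with_good_l1 = average_access_latency([(20, 0.3), (100, 0.5)], 400)
# ... but adds latency when almost every access misses in it.
with_poor_l1 = average_access_latency([(20, 0.95), (100, 0.5)], 400)

print(l2_only, with_good_l1, with_poor_l1)
```

With these assumed numbers the low-miss-rate L1 lowers the average latency, while the high-miss-rate L1 makes it slightly worse than having no L1 at all, which matches the qualitative trend the abstract reports.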

Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OPenMP를 이용한 빠르고 효율적인 신경망 구현)

  • Park, An-Jin;Jang, Hong-Hoon;Jung, Kee-Chul
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.253-260
    • /
    • 2009
  • Many algorithms for computer vision and pattern recognition have recently been implemented on the GPU (graphics processing unit) for faster computation. However, such implementations have two problems. First, the programmer must master the fundamentals of graphics shading languages, which require prior knowledge of computer graphics. Second, in a job that needs close cooperation between the CPU and GPU, which is common in image processing and pattern recognition unlike in graphics, the CPU should generate as much raw feature data as possible for GPU processing in order to utilize the GPU effectively. This paper proposes a faster and more efficient implementation of neural networks on both the GPU and a multi-core CPU. To solve the first problem, we use CUDA (Compute Unified Device Architecture), which is easy to program thanks to its simple C-like style, rather than a graphics shading language. Moreover, OpenMP (Open Multi-Processing) is used to process multiple data concurrently with a single instruction on the multi-core CPU, which makes effective use of GPU memory. In the experiments, we implemented a neural network-based text extraction system using the proposed architecture, and it ran about 15 times faster than an implementation on the GPU alone without OpenMP.

GPU based Fast Recognition of Artificial Landmark for Mobile Robot (주행로봇을 위한 GPU 기반의 고속 인공표식 인식)

  • Kwon, Oh-Sung;Kim, Young-Kyun;Cho, Young-Wan;Seo, Ki-Sung
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.20 no.5
    • /
    • pp.688-693
    • /
    • 2010
  • Vision-based object recognition in mobile robots faces many image analysis problems caused by neighboring elements in dynamic environments. SURF (Speeded Up Robust Features) is a local feature extraction method for images whose performance remains stable under disturbances such as lighting, scale changes, and rotation. However, real-time processing is difficult because the features are represented as high-dimensional vectors. To solve this problem, execution of SURF on the GPU (Graphics Processing Unit) is proposed and implemented using NVIDIA CUDA. We experimentally compare the recognition rates and processing times of SURF on the CPU and GPU while varying the robot velocity and image size.

VDI Performance Optimization with Hybrid Parallel Processing in Thick Client System under Heterogeneous Multi-Core Environment (Heterogeneous 멀티 코어 환경의 Thick Client에서 VDI 성능 최적화를 위한 혼합 병렬 처리 기법 연구)

  • Kim, Myeong-Seob;Huh, Eui-Nam
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38B no.3
    • /
    • pp.163-171
    • /
    • 2013
  • Recently, the demand for processing High Definition (HD) video and 3D applications on low-end mobile devices has grown, and the volume of content data has increased as well. This is becoming a major issue in Cloud computing, where a Virtual Desktop Infrastructure (VDI) service needs efficient data processing to provide Quality of Experience (QoE). In this paper, we propose three kinds of Thick-Thin VDI services that can share and delegate VDI service based on a Thick Client using the CPU and GPU. Furthermore, we propose and discuss a VDI service optimization method for a mixed CPU and GPU heterogeneous environment that uses OpenMP for CPU parallel processing and CUDA for GPU parallel processing.

Efficient Task Distribution for Pig Monitoring Applications Using OpenCL (OpenCL을 이용한 돈사 감시 응용의 효율적인 태스크 분배)

  • Kim, Jinseong;Choi, Younchang;Kim, Jaehak;Chung, Yeonwoo;Chung, Yongwha;Park, Daihee;Kim, Hakjae
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.10
    • /
    • pp.407-414
    • /
    • 2017
  • Pig monitoring applications consisting of many tasks can take advantage of inherent data parallelism and be processed in parallel using performance accelerators. In this paper, we propose a method for distributing the tasks of pig monitoring applications across a heterogeneous computing platform consisting of a multicore-CPU and a manycore-GPU. That is, a parallel program written in OpenCL is developed, and then the most suitable processor is determined based on the measured execution time of each task. The proposed method is simple but very effective, and can be applied to parallelize other applications consisting of many tasks on a heterogeneous computing platform consisting of a CPU and a GPU. Experimental results show that, on three different heterogeneous computing platforms, the proposed task distribution method outperforms the typical GPU-only method, in which every task is executed on a deviceGPU, by factors of 1.5, 8.7, and 2.7, respectively.
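The selection rule described in this abstract (measure each task on each device, then pick the fastest device per task) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the task names, and the timings are hypothetical, and the sketch assumes tasks run one after another and ignores CPU-GPU data transfer costs.

```python
def assign_tasks(measured_times):
    """Pick, for every task, the device with the smallest measured time.

    measured_times: {task_name: {device_name: seconds}}
    Returns {task_name: device_name}.
    """
    return {
        task: min(times, key=times.get)
        for task, times in measured_times.items()
    }


def total_time(measured_times, assignment):
    """Total run time, assuming tasks execute sequentially (no overlap)."""
    return sum(measured_times[task][device] for task, device in assignment.items())


# Hypothetical per-task timings in seconds (not data from the paper).
measured = {
    "background_subtraction": {"deviceCPU": 4.0, "deviceGPU": 1.0},
    "labeling": {"deviceCPU": 2.0, "deviceGPU": 6.0},
    "tracking": {"deviceCPU": 1.0, "deviceGPU": 3.0},
}

best = assign_tasks(measured)
gpu_only = {task: "deviceGPU" for task in measured}
speedup = total_time(measured, gpu_only) / total_time(measured, best)
print(best, speedup)
```

Because each task is decided independently, the rule needs only one timing run per task and device, which is consistent with the abstract's description of the method as simple.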

High-Speed Implementations of Block Ciphers on Graphics Processing Units Using CUDA Library (GPU용 연산 라이브러리 CUDA를 이용한 블록암호 고속 구현)

  • Yeom, Yong-Jin;Cho, Yong-Kuk
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.18 no.3
    • /
    • pp.23-32
    • /
    • 2008
  • The computing power of graphics processing units (GPUs) has already surpassed that of CPUs, and the gap is widening. Thus, research on GPGPU, which applies GPUs to general-purpose computation, has become popular and has been especially successful in parallel data processing. Since Cook et al. first implemented cryptographic algorithms on a GPU in 2005, improved results using graphics libraries such as OpenGL and DirectX have been published. In this paper, we present techniques and results for implementing block ciphers using the CUDA library announced by NVIDIA in 2007. We also discuss a general method for converting the source code of block ciphers from CPU to GPU. On an NVIDIA 8800GTX GPU, the resulting speeds of the block ciphers AES, ARIA, and DES are 4.5Gbps, 7.0Gbps, and 2.8Gbps, respectively, which are faster than those on a CPU.

Efficient Task Distribution of Pig Monitoring Application using OpenCL (OpenCL을 사용한 돈사 감시 응용의 효율적인 태스크 분배)

  • Kim, J.;Choi, Y.;Kim, J.;Chung, Y.;Chung, Y.;Park, D.;Kim, H.
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.04a
    • /
    • pp.54-57
    • /
    • 2017
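  • Pig monitoring applications can exploit inherent data parallelism and be processed in parallel using performance accelerators. In this paper, we propose a task distribution method for running a pig monitoring application in a heterogeneous computing environment consisting of a multicore-CPU and a manycore-GPU. That is, for each task, a parallel program written in OpenCL is run on both the deviceCPU and the deviceGPU, and the most suitable processor is determined based on the measured execution times. The proposed method is simple but very effective, and can also be applied to parallelize other applications on heterogeneous computing platforms consisting of a CPU and a GPU. Experimental results confirm that, on different heterogeneous computing platforms, execution with the optimal task distribution improved performance by factors of 2 and 11, respectively, compared with executing all tasks on the deviceGPU.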

Acceleration of 2D Image Based Flow Visualization using GPU (GPU를 이용한 2차원 영상 기반 유동 가시화 기법의 가속)

  • Lee, Joong-Youn
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.543-546
    • /
    • 2007
  • Flow visualization is a visualization technique that expresses vector data visually using 2D or 3D graphics. It aims to help people easily find and understand distinctive features of the vector data. Image Based Flow Visualization (IBFV) is one of the fastest dense, integration-based flow visualization techniques. In this paper, IBFV is accelerated and implemented on a commodity GPU. In particular, mesh advection is accelerated in the vertex program.

The Performance Analysis of GPU-based Cloth simulation according to the Change of Work Group Configuration (워크 그룹 구성 변화에 따른 GPU 기반 천 시뮬레이션의 성능 분석)

  • Choi, Young-Hwan;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • Journal of Internet Computing and Services
    • /
    • v.18 no.3
    • /
    • pp.29-36
    • /
    • 2017
  • Today, 3D dynamic simulation is closely tied to many industries. In the past, physically based 3D simulation was used mainly in car crash analysis and construction-related fields, but it now also plays an important role in movies and games. Representing 3D objects realistically requires many mathematical computations, and it is difficult to process such a large amount of calculation in real time on the CPU. With recent advances in graphics hardware and architecture, the GPU can be used for general-purpose computation as well as graphics computation, and GPU-based approaches have been applied in various research fields. In this paper, we analyze how the performance of two GPU-based cloth simulation algorithms varies with the execution properties of GPU shaders, in order to optimize the performance of GPU-based cloth simulation. Cloth simulation is implemented with a spring-centric algorithm and a node-centric algorithm using GPU parallel computing with the GLSL 4.3 compute shader. We compare the performance of these algorithms as the size and dimension of the work group change. Each test is repeated 10 times over 5,000 frames, and the results are reported as average FPS. The experimental results show that the node-centric algorithm runs faster than the spring-centric algorithm.

Analysis of Programming Techniques for Creating Optimized CUDA Software (최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석)

  • Kim, Sung-Soo;Kim, Dong-Heon;Woo, Sang-Kyu;Ihm, In-Sung
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.7
    • /
    • pp.775-787
    • /
    • 2010
  • Unlike general-purpose CPUs, GPUs have been specialized as many-core streaming processors, and they are replacing CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. In response to this trend, NVIDIA recently released a new parallel computing architecture called CUDA (Compute Unified Device Architecture), which offers a flexible GPU programming environment for GPGPU (General Purpose GPU) computing. In general, programmers who use the CUDA API must clearly understand many aspects of the GPU's computing architecture to produce efficient parallel software. In this article, we explain several optimization techniques for CUDA programming that we have verified through extensive experimentation and trial and error, and review how those techniques affect the performance of code execution. In particular, we use a specific problem as an example to analyze several factors that affect performance, such as effective access to the hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several guidelines that may be applied effectively in CUDA-based parallel programming.
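Of the factors this abstract lists, processor occupancy lends itself to a small worked calculation: the ratio of warps resident on a multiprocessor to the maximum it can hold, limited by whichever resource runs out first. The sketch below is a simplified model and not the article's analysis. The function name is hypothetical, the per-multiprocessor limits are assumed defaults that differ between GPU generations, and hardware allocation granularity is ignored.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          max_threads_per_sm=2048, max_blocks_per_sm=32,
                          regs_per_sm=65536, smem_per_sm=49152, warp_size=32):
    """Simplified theoretical occupancy of one multiprocessor (0.0 to 1.0).

    The default limits are assumptions for illustration only; real values
    depend on the GPU's compute capability.
    """
    # Threads are scheduled in whole warps, so round the block size up.
    warps_per_block = -(-threads_per_block // warp_size)
    threads_allocated = warps_per_block * warp_size

    # Each resource caps how many blocks can be resident at once.
    limits = [max_blocks_per_sm, max_threads_per_sm // threads_allocated]
    if regs_per_thread > 0:
        limits.append(regs_per_sm // (regs_per_thread * threads_allocated))
    if smem_per_block > 0:
        limits.append(smem_per_sm // smem_per_block)

    resident_blocks = min(limits)
    max_warps_per_sm = max_threads_per_sm // warp_size
    return resident_blocks * warps_per_block / max_warps_per_sm


# Under the assumed limits, doubling register use per thread halves occupancy.
print(theoretical_occupancy(256, 32, 0))
print(theoretical_occupancy(256, 64, 0))
```

Higher occupancy gives the scheduler more warps to switch between while others wait on memory, which is how it relates to the latency hiding mentioned in the abstract. It does not by itself guarantee higher performance.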