• Title/Summary/Keyword: GPU programming

Search Result 60, Processing Time 0.03 seconds

Implementation of GPU based MPEG-2 Decoder (GPU 기반의 MPEG-2 디코더의 구현)

  • Kim, Kyung-Su;Kim, Hong-Sik;Kim, Cheong-Ghil;Park, Woo-Chan
    • Journal of Digital Contents Society
    • /
    • v.9 no.3
    • /
    • pp.371-377
    • /
    • 2008
  • Recently the performance of GPU is increasing much faster compared to GPU and GPU is used for various application programs. In this paper, MPEG-2 Decoder is implemented based on a GPU programming language, CG. The proposed methodology is to perform block rendering with texture data according to video standard with very high parallelism by using the pipeline of GPU which is a stream processing structure. To reduce the data bandwidth between system memory and GPU, local memory is used for graphic card. According to the experiment, the proposed scheme shows performance improvement by more than 2 times compared to CPU based scheme.

  • PDF

Optimization Technique for Vertex Programming on Programmable GPU (프로그래밍이 가능한 GPU 상에서의 버텍스 프로그래밍의 최적화 기법)

  • Oh, Jinsang;Ihm, Insung
    • Journal of the Korea Computer Graphics Society
    • /
    • v.8 no.3
    • /
    • pp.25-34
    • /
    • 2002
  • 최근 프로그래밍이 가능한 그래픽스 프로세서(GPU)의 등장은 렌더링 속도의 향상은 물론 기존의 GPU가 할 수 없었던 다양한 그래픽스 계산을 효과적으로 수행할 수 있도록 해주고 있다. 이로 인하여 기존에 CPU 상에서 수행해야만 했던 그래픽스 계산들의 일부를 GPU 상에서 수행하도록 해주는 기법들에 대한 연구가 활발히 진행되고 있다. 본 논문에서는 선형식에 기반을 둔 여러 응용 문제들을 GPU 상에서 효율적으로 구현할 수 있도록 도와주는 쉐이더 코드 최적화 기법을 제안한다. 이 기법은 SIMD 형태의 병렬 처리 능력을 가진 버텍스 쉐이더의 명령어에 맞게 고안되었다. 본 기법의 활용 가능성을 보이기 위하여 미분 방정식을 풀기 위한 4차 런지-쿠타 방법, 선형방정식을 풀기 위한 가우스-자이델 방법, 자연스러운 유체 모델링을 위한 파동 방정식 등의 문제에 적용하여 보았다. 본 논문에서 제안한 최적화 기법은 버텍스 쉐이더 용 컴파일러 구현에 쓰일 수 있으며, 향후 프로그래밍이 가능한 GPU 상에서의 실시간 그래픽스 소프트웨어 개발에 유용하게 사용될 수 있을 것이다.

  • PDF

Implementation of $2{\times}2$ MIMO LTE Base Station using GPU for SDR System (GPU를 이용한 SDR 시스템 용 LTE MIMO 기지국 기능 구현)

  • Lee, Seung Hak;Kim, Kyung Hoon;Ahn, Chi Young;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.8 no.4
    • /
    • pp.91-98
    • /
    • 2012
  • This paper implements 2X2 MIMO Long Term Evolution (LTE) base station using Software defined radio (SDR) technology. The implemented base station system processes baseband signals on a Graphics Processor Unit(GPU). GPU is a high-speed parallel processor which provides very important advantage of using a very powerful C-based programming environment that is Compute Unified Device Architecture (CUDA). The implemented software-based base station system processes baseband signals through GPU. It utilizes USRP2 as its RF transceiver. In order to guarantee a real-time processing of LTE baseband signals, we have adopted well-known signal processing algorithms such as frame synchronization algorithms, ML detection, etc. using GPU operating in parallel processing.

Accelerating 2D DCT in Multi-core and Many-core Environments (멀티코어와 매니코어 환경에서의 2 차원 DCT 가속)

  • Hong, Jin-Gun;Jung, Sung-Wook;Kim, Cheong-Ghil;Burgstaller, Bernd
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.250-253
    • /
    • 2011
  • Chip manufacture nowadays turned their attention from accelerating uniprocessors to integrating multiple cores on a chip. Moreover desktop graphic hardware is now starting to support general purpose computation. Desktop users are able to use multi-core CPU and GPU as a high performance computing resources these days. However exploiting parallel computing resources are still challenging because of lack of higher programming abstraction for parallel programming. The 2-dimensional discrete cosine transform (2D-DCT) algorithms are most computational intensive part of JPEG encoding. There are many fast 2D-DCT algorithms already studied. We implemented several algorithms and estimated its runtime on multi-core CPU and GPU environments. Experiments show that data parallelism can be fully exploited on CPU and GPU architecture. We expect parallelized DCT bring performance benefit towards its applications such as JPEG and MPEG.

Extending Caffe for Machine Learning of Large Neural Networks Distributed on GPUs (대규모 신경회로망 분산 GPU 기계 학습을 위한 Caffe 확장)

  • Oh, Jong-soo;Lee, Dongho
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.7 no.4
    • /
    • pp.99-102
    • /
    • 2018
  • Caffe is a neural net learning software which is widely used in academic researches. The GPU memory capacity is one of the most important aspects of designing neural net architectures. For example, many object detection systems require to use less than 12GB to fit a single GPU. In this paper, we extended Caffe to allow to use more than 12GB GPU memory. To verify the effectiveness of the extended software, we executed some training experiments to determine the learning efficiency of the object detection neural net software using a PC with three GPUs.

A Comparison among Methods using CUDA Programming (CUDA 프로그래밍 기법 비교 연구)

  • Ihm, Sun-Young;Park, Young-Ho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.05a
    • /
    • pp.138-139
    • /
    • 2013
  • GPU 를 활용하는 병렬 프로그래밍에 대한 관심이 높아지면서 이에 대한 연구가 활발히 진행되고 있다. GPU 의 성능이 높아지면서 이를 일반 연산에 사용하는 방법으로 NVIDIA 사에서 CUDA 프로그래밍 개발 환경을 제공하고 있다. 본 논문에서는 이 CUDA 프로그래밍 기법을 소개하고, 간단한 예제를 통해 CPU 와 GPU 를 사용하는 방법을 비교한다.

Optimization of Lightweight Encryption Algorithm (LEA) using Threads and Shared Memory of GPU (GPU의 스레드와 공유메모리를 이용한 LEA 최적화 방안)

  • Park, Moo Kyu;Yoon, Ji Won
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.4
    • /
    • pp.719-726
    • /
    • 2015
  • As big-data and cloud security technologies become popular, many researchers have recently been conducted on faster and lighter encryption. As a result, National Security Research Institute developed LEA which is lightweight and fast block cipher. To date, there have been various studies on lightweight encryption algorithm (LEA) for speeding up using GPU rather than conventional CPU. However, it is rather difficult to explore any guideline how to manipulate the GPU for the efficient usage of the LEA. Therefore, we introduce a guideline which explains how to implement and design the optimal LEA using GPU.

Hybrid parallel programming for Heterogeneous Multi-core performance optimization (헤테로지니어스 멀티코어 성능 최적화를 위한 하이브리드 병렬 프로그래밍)

  • Lim, Ju-Ho
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06a
    • /
    • pp.7-9
    • /
    • 2012
  • CPU는 싱글 코어 구조에서 클록 속도를 높여 성능을 향상 시키려는 노력을 해왔으나 한계에 도달하자 하나의 칩에 코어를 여러 개 둔 멀티코어 형태로 발전하였다. CPU의 성능 향상을 위해 이제는 3D그래픽을 연산처리하기 위해 만들어진 GPU와 결합하기에 이르렀다. CPU와 GPU의 결합은 CPU간의 결합보다 훨씬 더 좋은 성능을 보였고 전력의 사용량도 더 적었으며 비용면에서도 경제적이라는 장점을 가지고 있다. 본 논문에서는 CPU와 GPU의 Heterogeneous multicore상에서 성능을 최적화하기 위해 기존의 병렬화 모델을 조합하고 최적화를 시도하였다. CPU상에서는 성능 향상을 위해 기존의 병렬 프로그램 모델인 SIMD와 공유메모리 병렬 프로그래밍 모델 그리고 메시지 패싱 병렬 프로그래밍 모델을 조합하는 실험을 했다. GPU에서는 CUDA를 최적화 하였다. 이렇게 CPU와 GPU를 최적화하고 조합하여 고성능 연산을 요구하는 어플리케이션을 위한 Heterogeneous multicore 성능 최적화 방법을 제안한다.

GP-GPU based Parallelization for Urban Terrain Atmospheric Model CFD_NIMR (도시기상모델 CFD_NIMR의 GP-GPU 실행을 위한 병렬 프로그램의 구현)

  • Kim, Youngtae;Park, Hyeja;Choi, Young-Jeen
    • Journal of Internet Computing and Services
    • /
    • v.15 no.2
    • /
    • pp.41-47
    • /
    • 2014
  • In this paper, we implemented a CUDA Fortran parallel program to run the CFD_NIMR model on GP-GPU's, which simulates air diffusion on urban terrains. A GP-GPU is graphic processing unit in the form of a PCI card, and a general calculation accelerator to perform a large amount of high speed calculations with low cost and electric power. The GP-GPU gives performance enhancement of speed by 15 times to compare the Nvidia Tesla C1060 GPU with Intel XEON 2.0 GHz CPU. In addition, the program on a GP-GPU shows efficient performance compared to an MPI parallel program on multiple CPU's. It is expected that a proposed programming method on the GP-GPU parallel program can be used for numerical models with a similar structure.

CUDA-based Object Oriented Programming Techniques for Efficient Parallel Visualization of 3D Content (3차원 콘텐츠의 효율적인 병렬 시각화를 위한 CUDA 환경 기반 객체 지향 프로그래밍 기법)

  • Park, Tae-Jung
    • Journal of Digital Contents Society
    • /
    • v.13 no.2
    • /
    • pp.169-176
    • /
    • 2012
  • This paper presents a parallel object-oriented programming (OOP) platform for efficient visualization of three-dimensional content in CUDA environments. For this purpose, this paper discusses the features and limitations in implementing C++ object-oriented codes using CUDA and proposes the solutions. Also, it presents how to implement a 3D parallel visualization platform based on the MVC (Model/View/Controller) design pattern. Also, it provides sample implementations for integral MLS (iMLS) and signed distance fields (SDFs) based on the Marching Cubes and Raytracing. The proposed approach enables GPU parallel processing only by implementing simple interfaces. Based on this, developers can expect general benefits that are common in general OOP techniques including abstractization and inheritance. Though I implemented only two specific samples in this paper, I expect my approach can be widely applied to general computer graphics problems.