• Title/Summary/Keyword: GPU Computing

Search Result 228, Processing Time 0.031 seconds

Efficient Parallel Block-layered Nonbinary Quasi-cyclic Low-density Parity-check Decoding on a GPU

  • Thi, Huyen Pham;Lee, Hanho
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.6 no.3
    • /
    • pp.210-219
    • /
    • 2017
  • This paper proposes a modified min-max algorithm (MMMA) for nonbinary quasi-cyclic low-density parity-check (NB-QC-LDPC) codes and an efficient parallel block-layered decoder architecture corresponding to the algorithm on a graphics processing unit (GPU) platform. The algorithm removes multiplications over the Galois field (GF) in the merger step to reduce decoding latency without any performance loss. The decoding implementation on a GPU for NB-QC-LDPC codes achieves improvements in both flexibility and scalability. To perform the decoding on the GPU, data and memory structures suitable for parallel computing are designed. The implementation results for NB-QC-LDPC codes over GF(32) and GF(64) demonstrate that the parallel block-layered decoding on a GPU accelerates the decoding process to provide a faster decoding runtime, and obtains a higher coding gain under a low $10^{-10}$ bit error rate and low $10^{-7}$ frame error rate, compared to existing methods.

Analysis of Implementing Mobile Heterogeneous Computing for Image Sequence Processing

  • BAEK, Aram;LEE, Kangwoon;KIM, Jae-Gon;CHOI, Haechul
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.10
    • /
    • pp.4948-4967
    • /
    • 2017
  • On mobile devices, image sequences are widely used for multimedia applications such as computer vision, video enhancement, and augmented reality. However, the real-time processing of mobile devices is still a challenge because of constraints and demands for higher resolution images. Recently, heterogeneous computing methods that utilize both a central processing unit (CPU) and a graphics processing unit (GPU) have been researched to accelerate the image sequence processing. This paper deals with various optimizing techniques such as parallel processing by the CPU and GPU, distributed processing on the CPU, frame buffer object, and double buffering for parallel and/or distributed tasks. Using the optimizing techniques both individually and combined, several heterogeneous computing structures were implemented and their effectiveness were analyzed. The experimental results show that the heterogeneous computing facilitates executions up to 3.5 times faster than CPU-only processing.

CPU-GPU2 Trigeneous Computing for Iterative Reconstruction in Computed Tomography

  • Oh, Chanyoung;Yi, Youngmin
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.4
    • /
    • pp.294-301
    • /
    • 2016
  • In this paper, we present methods to efficiently parallelize iterative 3D image reconstruction by exploiting trigeneous devices (three different types of device) at the same time: a CPU, an integrated GPU, and a discrete GPU. We first present a technique that exploits single instruction multiple data (SIMD) architectures in GPUs. Then, we propose a performance estimation model, based on which we can easily find the optimal data partitioning on trigeneous devices. We found that the performance significantly varies by up to 6.23 times, depending on how SIMD units in GPUs are accessed. Then, by using trigeneous devices and the proposed estimation models, we achieve optimal partitioning and throughput, which corresponds to a 9.4% further improvement, compared to discrete GPU-only execution.

The Optimization Mechanism of CPU/GPU Computing Resource for Minimization of Performance Interference and Calculation Efficiency in Volunteer Computing Environment (볼런티어 컴퓨팅 환경에서 성능간섭 최소화와 연산 효율성 증대를 위한 CPU/GPU 컴퓨팅 자원 최적화 기법)

  • Bak, Bong Woo;Song, Chung Geon;Yu, Heon Chang
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.12
    • /
    • pp.479-486
    • /
    • 2017
  • Volunteer computing is a new computing paradigm that performs operations on idle resources of many nodes. The operation method of the client application for the execution of the volunteer computing is determined by the setting information of the user. Ideal operation requires optimized settings for system features and operating methods of other applications. In this paper, we analyze the usage ratio of CPU and GPU periodically, and develop a manager that dynamically applies optimized options. Through our proposed mechanism, the performance of the task computing is higher than that of the existing Volunteer Computing, and the performance interference is minimized. It is expected that volunteers will be able to provide higher computing resources for Volunteer Computing Project.

The Need of Cache Partitioning on Shared Cache of Integrated Graphics Processor between CPU and GPU (내장형 GPU 환경에서 CPU-GPU 간의 공유 캐시에서의 캐시 분할 방식의 필요성)

  • Sung, Hanul;Eom, Hyeonsang;Yeom, HeonYoung
    • KIISE Transactions on Computing Practices
    • /
    • v.20 no.9
    • /
    • pp.507-512
    • /
    • 2014
  • Recently, Distributed computing processing begins using both CPU(Central processing unit) and GPU(Graphic processing unit) to improve the performance to overcome darksilicon problem which cannot use all of the transistors because of the electric power limitation. There is an integrated graphics processor that CPU and GPU share memory and Last level cache(LLC). But, There is no LLC access rules between CPU and GPU, so if GPU and CPU processes run together at the same time, performance of both processes gets worse because of the contention on the LLC. This Paper gives evidence to prove the need of the Cache Partitioning and is mentioned about the cache partitioning design using page coloring to allocate the L3 Cache space only for the GPU process to guarantee GPU process performance.

EFFICIENT COMPUTATION OF COMPRESSIBLE FLOW BY HIGHER-ORDER METHOD ACCELERATED USING GPU (고차 정확도 수치기법의 GPU 계산을 통한 효율적인 압축성 유동 해석)

  • Chang, T.K.;Park, J.S.;Kim, C.
    • Journal of computational fluids engineering
    • /
    • v.19 no.3
    • /
    • pp.52-61
    • /
    • 2014
  • The present paper deals with the efficient computation of higher-order CFD methods for compressible flow using graphics processing units (GPU). The higher-order CFD methods, such as discontinuous Galerkin (DG) methods and correction procedure via reconstruction (CPR) methods, can realize arbitrary higher-order accuracy with compact stencil on unstructured mesh. However, they require much more computational costs compared to the widely used finite volume methods (FVM). Graphics processing unit, consisting of hundreds or thousands small cores, is apt to massive parallel computations of compressible flow based on the higher-order CFD methods and can reduce computational time greatly. Higher-order multi-dimensional limiting process (MLP) is applied for the robust control of numerical oscillations around shock discontinuity and implemented efficiently on GPU. The program is written and optimized in CUDA library offered from NVIDIA. The whole algorithms are implemented to guarantee accurate and efficient computations for parallel programming on shared-memory model of GPU. The extensive numerical experiments validates that the GPU successfully accelerates computing compressible flow using higher-order method.

Introduction to general purpose GPU computing (GPU를 이용한 범용 계산의 소개)

  • Yu, Donghyeon;Lim, Johan
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.5
    • /
    • pp.1043-1061
    • /
    • 2013
  • Recent advances in computer technology introduce massive data and their analysis becomes important. The high performance computing is one of the most essential part in analysis of massive data. In this paper, we review the general purpose of the graphics processing unit and its application to parallel computing, which has been of great interest in statistics communities.

Performance Management Technique of Remote VR Service for Multiple Users in Container-Based Cloud Environments Sharing GPU (GPU를 공유하는 컨테이너 기반 클라우드 환경에서 다수의 사용자를 위한 원격 VR 서비스의 성능 관리 기법)

  • Kang, Jihun
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.1
    • /
    • pp.9-22
    • /
    • 2022
  • Virtual Reality(VR) technology is an interface technology that is actively used in various audio-visual-based applications by showing users a virtual world composed of computer graphics. Since VR-based applications are graphic processing-based applications, expensive computing devices equipped with Graphics Processing Unit(GPU) are essential for graphic processing. This incurs a cost burden on VR application users for maintaining and managing computing devices, and as one of the solutions to this, a method of operating services in cloud environments is being used. This paper proposes a performance management technique to address the problem of performance interference between containers owing to GPU resource competition in container-based high-performance cloud environments in which multiple containers share a single GPU. The proposed technique reduces performance deviation due to performance interference, helping provide uniform performance-based remote VR services for users. In addition, this paper verifies the efficiency of the proposed technique through experiments.

A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems (방출단층촬영 시스템을 위한 GPU 기반 반복적 기댓값 최대화 재구성 알고리즘 연구)

  • Ha, Woo-Seok;Kim, Soo-Mee;Park, Min-Jae;Lee, Dong-Soo;Lee, Jae-Sung
    • Nuclear Medicine and Molecular Imaging
    • /
    • v.43 no.5
    • /
    • pp.459-467
    • /
    • 2009
  • Purpose: The maximum likelihood-expectation maximization (ML-EM) is the statistical reconstruction algorithm derived from probabilistic model of the emission and detection processes. Although the ML-EM has many advantages in accuracy and utility, the use of the ML-EM is limited due to the computational burden of iterating processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on GPU (graphic processing unit) for ML-EM algorithm. Materials and Methods: Using Geforce 9800 GTX+ graphic card and CUDA (compute unified device architecture) the projection and backprojection in ML-EM algorithm were parallelized by NVIDIA's technology. The time delay on computations for projection, errors between measured and estimated data and backprojection in an iteration were measured. Total time included the latency in data transmission between RAM and GPU memory. Results: The total computation time of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 see, respectively. In this case, the computing speed was improved about 15 times on GPU. When the number of iterations increased into 1024, the CPU- and GPU-based computing took totally 18 min and 8 see, respectively. The improvement was about 135 times and was caused by delay on CPU-based computing after certain iterations. On the other hand, the GPU-based computation provided very small variation on time delay per iteration due to use of shared memory. Conclusion: The GPU-based parallel computation for ML-EM improved significantly the computing speed and stability. The developed GPU-based ML-EM algorithm could be easily modified for some other imaging geometries.

Accelerating Distance Transform Image based Hand Detection using CPU-GPU Heterogeneous Computing

  • Yi, Zhaohua;Hu, Xiaoqi;Kim, Eung Kyeu;Kim, Kyung Ki;Jang, Byunghyun
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.5
    • /
    • pp.557-563
    • /
    • 2016
  • Most of the existing hand detection methods rely on the contour shape of hand after skin color segmentation. Such contour shape based computations, however, are not only susceptible to noise and other skin color segments but also inherently sequential and difficult to efficiently parallelize. In this paper, we implement and accelerate our in-house distance image based approach using CPU-GPU heterogeneous computing. Using emerging CPU-GPU heterogeneous computing technology, we achieved 5.0 times speed-up for $320{\times}240$ images, and 17.5 times for $640{\times}480$ images and our experiment demonstrates that our proposed distance image based hand detection is robust and fast, reaching up to 97.32% palm detection rate, 80.4% of which have more than 3 fingers detected on commodity processors.