• Title/Summary/Keyword: CPU-GPU heterogeneous computing


CPU-GPU² Trigeneous Computing for Iterative Reconstruction in Computed Tomography

  • Oh, Chanyoung;Yi, Youngmin
    • IEIE Transactions on Smart Processing and Computing / v.5 no.4 / pp.294-301 / 2016
  • In this paper, we present methods to efficiently parallelize iterative 3D image reconstruction by exploiting trigeneous devices (three different types of device) at the same time: a CPU, an integrated GPU, and a discrete GPU. We first present a technique that exploits single instruction multiple data (SIMD) architectures in GPUs. Then, we propose a performance estimation model, based on which we can easily find the optimal data partitioning on trigeneous devices. We found that the performance significantly varies by up to 6.23 times, depending on how SIMD units in GPUs are accessed. Then, by using trigeneous devices and the proposed estimation models, we achieve optimal partitioning and throughput, which corresponds to a 9.4% further improvement, compared to discrete GPU-only execution.
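The data-partitioning idea in this abstract can be illustrated with a minimal sketch. It assumes a simple linear model (each device processes work at a constant rate, with no transfer overhead), which is not the paper's actual estimation model. The function names and the throughput numbers are hypothetical.

```python
def partition_work(total_items, throughputs):
    """Split total_items across devices in proportion to throughput.

    Under a linear model (time = items / throughput), a proportional
    split lets all devices finish at about the same time, which
    minimizes the overall finish time.
    """
    total_rate = sum(throughputs.values())
    shares = {name: int(total_items * rate / total_rate)
              for name, rate in throughputs.items()}
    # Give any rounding remainder to the fastest device.
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares


def makespan(shares, throughputs):
    """Estimated finish time: the slowest device determines the total."""
    return max(shares[name] / throughputs[name] for name in shares)


# Hypothetical throughputs in items per second (not taken from the paper).
rates = {"cpu": 10.0, "igpu": 30.0, "dgpu": 160.0}
shares = partition_work(1000, rates)
combined_time = makespan(shares, rates)
dgpu_only_time = 1000 / rates["dgpu"]
```

With these assumed rates, the three-device split finishes sooner than running everything on the discrete GPU, because the slower devices take a small share of the work instead of sitting idle.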

A Simulation Framework for CUDA Computing on Non-x86 Platforms based on QEMU and GPGPU-Sim (비x86 플랫폼 상에서의 CUDA 컴퓨팅을 위한 QEMU 및 GPGPU-Sim 기반 시뮬레이션 프레임워크 개발)

  • Hwang, Jaemin;Choi, Jong-Wook;Choi, Seongrim;Nam, Byeong-Gyu
    • Journal of Korea Society of Industrial Information Systems / v.19 no.2 / pp.15-22 / 2014
  • This paper proposes a CUDA simulation framework for non-x86 computing platforms based on QEMU and GPGPU-Sim. Previous simulators for heterogeneous computing platforms did not support non-x86 CPU models or the CUDA computing platform. In this work, we combine QEMU and GPGPU-Sim to support non-x86 CPU models and the CUDA platform, respectively. This approach provides a simulation framework for CUDA computing on non-x86 CPU models.

NAAL: Software for controlling heterogeneous IoT devices based on neuromorphic architecture abstraction (NAAL: 뉴로모픽 아키텍처 추상화 기반 이기종 IoT 기기 제어용 소프트웨어)

  • Cho, Jinsung;Kim, Bongjae
    • Smart Media Journal / v.11 no.3 / pp.18-25 / 2022
  • Neuromorphic computing generally offers significantly better power, area, and speed characteristics than neural network computation on CPUs and GPUs. These characteristics suit resource-constrained IoT environments where energy consumption matters. However, the source code for environment setup and application operation must be modified for each heterogeneous IoT device that supports neuromorphic computing. To solve this problem, this paper proposes and implements NAAL. Through its common APIs, NAAL provides the functions needed for IoT device control, neuromorphic architecture abstraction, and inference model operation across heterogeneous IoT device environments. NAAL also makes it possible to add support for new heterogeneous IoT devices, neuromorphic architectures, and computing devices in the future.

A Dual Transcoding Method for Retaining QoS of Video Streaming Services under Restricted Computing Resources (동영상 스트리밍 서비스의 QoS유지를 위한 듀얼 트랜스코딩 기법)

  • Oh, Doohwan;Ro, Won Woo
    • KIPS Transactions on Computer and Communication Systems / v.3 no.7 / pp.231-240 / 2014
  • Video transcoding techniques provide an efficient mechanism for adapting video content to the capabilities of a variety of clients. However, it is hard to provide appropriate quality of service (QoS) to clients owing to the heavy workload of transcoding operations. This paper therefore presents a dual transcoding method that guarantees QoS in streaming services by maximizing resource usage in a transcoding server equipped with both CPU and GPU computing units, which have different architectural features. The proposed method estimates the workload of incoming transcoding requests and then schedules each request to either the CPU or the GPU accordingly. In the performance evaluation, the proposed dual transcoding method achieved a speedup of 1.84 over a traditional transcoding approach.
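The scheduling step described in this abstract (estimate the workload, then route each request to the CPU or the GPU) can be sketched as a greedy earliest-finish-time scheduler. This is an illustrative stand-in, not the paper's algorithm: the cost model, the function name `schedule`, and all parameter values are assumptions.

```python
def schedule(requests, cpu_speed=1.0, gpu_speed=4.0, gpu_overhead=2.0):
    """Assign each request to the unit with the earliest estimated finish.

    requests: list of (request_id, estimated_work) tuples.
    The GPU is modeled as faster per unit of work but with a fixed
    per-request overhead (for example, data transfer), so small jobs
    can be cheaper on the CPU. All parameters are illustrative.
    """
    busy_until = {"cpu": 0.0, "gpu": 0.0}
    plan = []
    for request_id, work in requests:
        finish = {
            "cpu": busy_until["cpu"] + work / cpu_speed,
            "gpu": busy_until["gpu"] + gpu_overhead + work / gpu_speed,
        }
        # Pick the unit that would complete this request first.
        unit = min(finish, key=finish.get)
        busy_until[unit] = finish[unit]
        plan.append((request_id, unit))
    return plan, max(busy_until.values())


# Hypothetical requests with estimated work in arbitrary units.
plan, total_time = schedule([("a", 1.0), ("b", 8.0), ("c", 8.0), ("d", 2.0)])
```

In this toy run, the two large requests go to the GPU and the two small ones stay on the CPU, so both units are kept busy rather than queuing everything on one of them.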

Parallel Processing Method on CPU for Image Processing on Mobile Heterogeneous Computing System (모바일 이기종 컴퓨팅 시스템에서 영상처리 고속화를 위한 CPU측 병렬처리 방법)

  • Beak, Aram;Choi, Haechul
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2015.07a / pp.181-182 / 2015
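  • As the adoption and performance of mobile devices have grown rapidly, video consumption on mobile devices has also increased greatly. However, mobile platforms, which are designed to save power and space, have performance limitations compared with desktop platforms. Much acceleration research has therefore been carried out to overcome these limitations by using embedded GPUs with SIMD architectures for large-scale video processing. Image processing that works on stored data must use the CPU together with the GPU, and heterogeneous computing systems in mobile environments have structural drawbacks such as low transfer rates between processors, the resulting waiting time, and the mandatory use of data formats supported by the mobile operating system. In this paper, to accelerate image processing with an embedded GPU, we apply parallel processing on the embedded CPU side to overcome these drawbacks, and our experimental results show that using the embedded CPU in a mobile heterogeneous computing architecture increases overall computational efficiency.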

Implementation of Massive FDTD Simulation Computing Model Based on MPI Cluster for Semi-conductor Process (반도체 검증을 위한 MPI 기반 클러스터에서의 대용량 FDTD 시뮬레이션 연산환경 구축)

  • Lee, Seung-Il;Kim, Yeon-Il;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.15 no.9 / pp.21-28 / 2015
  • In semiconductor processing, simulation is used to detect defects by analyzing the behavior of impurities through physical-quantity calculations on the internal elements. The Finite-Difference Time-Domain (FDTD) algorithm is used to perform this simulation. As semiconductors composed of nanoscale elements advance, simulation sizes keep growing. As a result, a single processor such as a CPU or GPU may be unable to perform the simulation because of the massive matrix size, and even a computer consisting of multiple processors may be unable to handle a massive FDTD simulation. Parallel and distributed computing has been studied to address these problems, but past work used only a single type of processor. A GPU is fast but has limited memory, whereas a CPU is slower than a GPU. To solve this problem, we implemented a computing model that can handle an FDTD simulation of any size on a cluster consisting of heterogeneous processors. We tested the simulation on these processors using MPI libraries based on point-to-point communication and verified that it operates correctly regardless of the number and type of nodes. We also analyzed performance by measuring the total execution time and the time for specific parts of the simulation in each test.

Toward High Utilization of Heterogeneous Computing Resources in SNP Detection

  • Lim, Myungeun;Kim, Minho;Jung, Ho-Youl;Kim, Dae-Hee;Choi, Jae-Hun;Choi, Wan;Lee, Kyu-Chul
    • ETRI Journal / v.37 no.2 / pp.212-221 / 2015
  • As the amount of re-sequencing genome data grows, minimizing the execution time of an analysis is required. For this purpose, recent computing systems have been adopting both high-performance coprocessors and host processors. However, few applications utilize these heterogeneous computing resources efficiently. This problem applies equally to single nucleotide polymorphism (SNP) detection, which is one of the bottlenecks in genome data processing. In this paper, we propose a method for speeding up SNP detection by enhancing the utilization of the heterogeneous computing resources often used in recent high-performance computing systems. By measuring the workload in the detection procedure, we divide SNP detection into several task groups suitable for each computing resource. These task groups are scheduled using a window overlapping method. As a result, we improved upon the speedup achieved by previous open source applications by a factor of 10.

iSSD-Based Collaborative Processing for Big Data Mining (효율적인 빅 데이터 마이닝을 위한 iSSD 기반 협업 처리 방안)

  • Jo, Yong-Yoen;Kim, Sang-Wook;Bae, Duck-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.460-470 / 2017
  • We address how to handle big data mining effectively using the intelligent SSD (iSSD). An iSSD is a storage device with computing power inside the SSD, which reduces data transfer cost by processing data near where it is stored. We first introduce the structural characteristics of the iSSD for efficient data processing. Then, we present how to run data mining algorithms on the iSSD. Finally, we discuss how to significantly improve the performance of data mining algorithms by exploiting a heterogeneous computing environment in which host CPUs and a GPU coexist.

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2648-2668 / 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.

Reevaluating the overhead of data preparation for asymmetric multicore system on graphics processing

  • Pei, Songwen;Zhang, Junge;Jiang, Linhua;Kim, Myoung-Seo;Gaudiot, Jean-Luc
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.7 / pp.3231-3244 / 2016
  • As processor design transitions from homogeneous multicore processors to heterogeneous multicore processors, the traditional Amdahl's law cannot meet the new challenges of asymmetric multicore systems. To further investigate the factors that affect the Overhead of Data Preparation (ODP) for asymmetric multicore systems, we evaluate an asymmetric multicore system built from a CPU and a GPU by measuring the overheads of memory transfer, kernel computation, cache misses, and synchronization. This paper demonstrates that decreasing the overhead of data preparation is a promising approach to improving the overall performance of heterogeneous systems.
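To see why data preparation overhead matters, the classic Amdahl's law can be extended with an extra overhead term. The sketch below is one simple way to do that and is not the formulation used in the paper: the function name, the `overhead_fraction` parameter (overhead expressed as a fraction of the original runtime), and the sample values are assumptions.

```python
def amdahl_speedup(parallel_fraction, accel_factor, overhead_fraction=0.0):
    """Speedup under Amdahl's law with an added overhead term.

    parallel_fraction: share of the original runtime that is accelerated.
    accel_factor: how much faster the accelerated part runs.
    overhead_fraction: extra time (for example, data preparation),
        expressed as a fraction of the original runtime. With the
        default of 0.0 this reduces to the classic Amdahl's law.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accel_factor <= 0.0:
        raise ValueError("accel_factor must be positive")
    serial_fraction = 1.0 - parallel_fraction
    new_runtime = (serial_fraction
                   + parallel_fraction / accel_factor
                   + overhead_fraction)
    return 1.0 / new_runtime


# Hypothetical workload: 90% accelerated 10x, with and without overhead.
ideal = amdahl_speedup(0.9, 10.0)
with_overhead = amdahl_speedup(0.9, 10.0, overhead_fraction=0.1)
```

Under these assumed numbers, an overhead equal to 10% of the original runtime cuts the achievable speedup substantially, which is consistent with the abstract's point that reducing data preparation overhead is worthwhile.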