• Title/Summary/Keyword: GPGPU(GPGPU)

Search Result 200, Processing Time 0.025 seconds

Performance Enhancement and Evaluation of AES Cryptography using OpenCL on Embedded GPGPU (OpenCL을 이용한 임베디드 GPGPU환경에서의 AES 암호화 성능 개선과 평가)

  • Lee, Minhak;Kang, Woochul
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.7
    • /
    • pp.303-309
    • /
    • 2016
  • Recently, an increasing number of embedded processors such as ARM Mali begin to support GPGPU programming frameworks, such as OpenCL. Thus, GPGPU technologies that have been used in PC and server environments are beginning to be applied to the embedded systems. However, many embedded systems have different architectural characteristics compare to traditional PCs and low-power consumption and real-time performance are also important performance metrics in these systems. In this paper, we implement a parallel AES cryptographic algorithm for a modern embedded GPU using OpenCL, a standard parallel computing framework, and compare performance against various baselines. Experimental results show that the parallel GPU AES implementation can reduce the response time by about 1/150 and the energy consumption by approximately 1/290 compare to OpenMP implementation when 1000KB input data is applied. Furthermore, an additional 100 % performance improvement of the parallel AES algorithm was achieved by exploiting the characteristics of embedded GPUs such as removing copying data between GPU and host memory. Our results also demonstrate that higher performance improvement can be achieved with larger size of input data.

Parallel Rabin Fingerprinting on GPGPU for Efficient Data Deduplication (효율적인 데이터 중복제거를 위한 GPGPU 병렬 라빈 핑거프린팅)

  • Ma, Jeonghyeon;Park, Sejin;Park, Chanik
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.611-616
    • /
    • 2014
  • Rabin fingerprinting used for chunking requires the largest amount computation time in data deduplication, In this paper, therefore, we proposed parallel Rabin fingerprinting on GPGPU for efficient data deduplication. In addition, for efficient parallelism in Rabin fingerprinting, four issues are considered. Firstly, when dividing input data stream into data sections, we consider the data located near the boundaries between data sections to calculate Rabin fingerprint continuously. Secondly, we consider exploiting the characteristics of Rabin fingerprinting for efficient operation. Thirdly, we consider the chunk boundaries which can be changed compared to sequential Rabin fingerprinting when adapting parallel Rabin fingerprinting. Finally, we consider optimizing GPGPU memory access. Parallel Rabin fingerprinting on GPGPU shows 16 times and 5.3 times better performance compared to sequential Rabin fingerprinting on CPU and compared to parallel Rabin fingerprinting on CPU, respectively. These throughput improvement of Rabin fingerprinting can lead to total performance improvement of data deduplication.

An Implementation of a Video-Equipped Real-Time Fire Detection Algorithm Using GPGPU (GPGPU를 이용한 비디오 기반 실시간 화재감지 알고리즘 구현)

  • Shon, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.8
    • /
    • pp.1-10
    • /
    • 2014
  • This paper proposes a parallel implementation of the video based 4-stage fire detection algorithm using a general-purpose graphics processing unit (GPGPU) to support real-time processing of the high computational algorithm. In addition, this paper compares the performance of the GPGPU based fire detection implementation with that of the CPU implementation to show the effectiveness of the proposed method. Experimental results using five fire included videos with an SXGA ($1400{\times}1050$) resolution, the proposed GPGPU implementation achieves 6.6x better performance that the CPU implementation, showing 30.53ms per frame which satisfies real-time processing (30 frames per second, 30fps) of the fire detection algorithm.

GPGPU Task Management Technique to Mitigate Performance Degradation of Virtual Machines due to GPU Operation in Cloud Environments (클라우드 환경에서 GPU 연산으로 인한 가상머신의 성능 저하를 완화하는 GPGPU 작업 관리 기법)

  • Kang, Jihun;Gil, Joon-Min
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.9
    • /
    • pp.189-196
    • /
    • 2020
  • Recently, GPU cloud computing technology applying GPU(Graphics Processing Unit) devices to virtual machines is widely used in the cloud environment. In a cloud environment, GPU devices assigned to virtual machines can perform operations faster than CPUs through massively parallel processing, which can provide many benefits when operating high-performance computing services in a variety of fields in a cloud environment. In a cloud environment, a GPU device can help improve the performance of a virtual machine, but the virtual machine scheduler, which is based on the CPU usage time of a virtual machine, does not take into account GPU device usage time, affecting the performance of other virtual machines. In this paper, we test and analyze the performance degradation of other virtual machines due to the virtual machine that performs GPGPU(General-Purpose computing on Graphics Processing Units) task in the direct path based GPU virtualization environment, which is often used when assigning GPUs to virtual machines in cloud environments. Then to solve this problem, we propose a GPGPU task management method for a virtual machine.

Scalable Ontology Reasoning Using GPU Cluster Approach (GPU 클러스터 기반 대용량 온톨로지 추론)

  • Hong, JinYung;Jeon, MyungJoong;Park, YoungTack
    • Journal of KIISE
    • /
    • v.43 no.1
    • /
    • pp.61-70
    • /
    • 2016
  • In recent years, there has been a need for techniques for large-scale ontology inference in order to infer new knowledge from existing knowledge at a high speed, and for a diversity of semantic services. With the recent advances in distributed computing, developments of ontology inference engines have mostly been studied based on Hadoop or Spark frameworks on large clusters. Parallel programming techniques using GPGPU, which utilizes many cores when compared with CPU, is also used for ontology inference. In this paper, by combining the advantages of both techniques, we propose a new method for reasoning large RDFS ontology data using a Spark in-memory framework and inferencing distributed data at a high speed using GPGPU. Using GPGPU, ontology reasoning over high-capacity data can be performed as a low cost with higher efficiency over conventional inference methods. In addition, we show that GPGPU can reduce the data workload on each node through the Spark cluster. In order to evaluate our approach, we used LUBM ranging from 10 to 120. Our experimental results showed that our proposed reasoning engine performs 7 times faster than a conventional approach which uses a Spark in-memory inference engine.

Thread Block Scheduling for Multi-Workload Environments in GPGPU (다중 워크로드 환경을 위한 GPGPU 스레드 블록 스케줄링)

  • Park, Soyeon;Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.2
    • /
    • pp.71-76
    • /
    • 2022
  • Round-robin is widely used for the scheduling of large-scale parallel workloads in the computing units of GPGPU. Round-robin is easy to implement by sequentially allocating tasks to each computing unit, but the load balance between computing units is not well achieved in multi-workload environments like cloud. In this paper, we propose a new thread block scheduling policy to resolve this situation. The proposed policy manages thread blocks generated by various GPGPU workloads with multiple queues based on their computation loads and tries to maximize the resource utilization of each computing unit by selecting a thread block from the queue that can maximally utilize the remaining resources, thereby inducing load balance between computing units. Through simulation experiments under various load environments, we show that the proposed policy improves the GPGPU performance by 24.8% on average compared to Round-robin.

Survey on GPGPU Programing Models (GPGPU 프로그래밍 모델의 기술 동향)

  • Lee, Hyunjin;Jeong, Yuna;Lee, Sungkil
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.05a
    • /
    • pp.389-391
    • /
    • 2013
  • 대용량 영상 데이터 처리를 위한 GPU 는 많은 코어들을 이용한 병렬 작업을 통해 결과를 도출한다. 단순 수치 연산에 특화된 이러한 GPU 의 계산 능력을 다른 분야로 확장시켜 적용하고자 하는 시도인 GPGPU 는 이전부터 꾸준히 시도되고 있다. 그러나 GPU 의 난해하고 생소한 프로그래밍으로, 작성이 쉽지 않고 현격한 성능 향상을 기대하기 어렵다. 이에, 이러한 GPGPU 프로그래밍의 어려움을 해결하고자 여러 프로그래밍 모델들이 등장하였다. 본 논문에서는 GPGPU 프로그래밍을 위한 대표적인 모델인 CUDA, OpenCL, C++ AMP, 그리고 OpenACC 에 대해 살펴본다.

A Method for accelerating training of Convolutional Neural Network (합성곱 신경망의 학습 가속화를 위한 방법)

  • Choi, Se Jin;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology
    • /
    • v.3 no.4
    • /
    • pp.171-175
    • /
    • 2017
  • Recently, Training of the convolutional neural network (CNN) entails many iterative computations. Therefore, a method of accelerating the training speed through parallel processing using the hardware specifications of GPGPU is actively researched. In this paper, the operations of the feature extraction unit and the classification unit are divided into blocks and threads of GPGPU and processed in parallel. Convolution and Pooling operations of the feature extraction unit are processed in parallel at once without sequentially processing. As a result, proposed method improved the training time about 314%.

Implementation of high performance parallel LU factorization program for multi-threads on GPGPUs (GPGPU의 멀티 쓰레드를 활용한 고성능 병렬 LU 분해 프로그램의 구현)

  • Shin, Bong-Hi;Kim, Young-Tae
    • Journal of Internet Computing and Services
    • /
    • v.12 no.3
    • /
    • pp.131-137
    • /
    • 2011
  • GPUs were originally designed for graphic processing, and GPGPUs are general-purpose GPUs for numerical computation with high performance and low electric power. In this paper, we implemented the parallel LU factorization program for GPGPUs. In CUDA, which is computational environment for Nvidia GPGPUs, domains are divided into blocks, and multi-threads compute each sub-blocks Simultaneously. In LU factorization program, computation order should be artificially decided due to the data dependence. To resolve the data dependancy, we suggested a parallel LU program for GPGPUs, and also explained parallel reduction algorithm for partial pivoting of LU factorization. We finally present performance analysis to show efficiency of the parallel LU factorization program based on multi-threads on GPGPUs.