• Title/Summary/Keyword: gpu

Search Result 964, Processing Time 0.041 seconds

Analyzing delay of Kernel function owing to GPU memory input from multiple VMs in RPC-based GPU virtualization environments (RPC 기반 GPU 가상화 환경에서 다중 가상머신의 GPU 메모리 입력으로 인한 커널 함수의 지연 문제 분석)

  • Kang, Jihun;Kim, Soo Kyun
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.541-542
    • /
    • 2021
  • 클라우드 컴퓨팅 환경에서는 고성능 컴퓨팅을 지원하기 위해 사용자에게 GPU(Graphic Processing Unit)가 할당된 가상머신을 제공하여 사용자가 고성능 응용을 실행할 수 있도록 지원한다. 일반적인 컴퓨팅 환경에서 한 명의 사용자가 GPU를 독점해서 사용하기 때문에 자원 경쟁으로 인한 문제가 상대적으로 적게 발생하지만 독립적인 여러 사용자가 컴퓨팅 자원을 공유하는 클라우드 환경에서는 자원 경쟁으로 인해 서로 성능 영향을 미치는 문제를 발생시킨다. 본 논문에서는 여러 개의 가상머신이 단일 GPU를 공유하는 RPC(Remote Procedure Call) 기반 GPU 가상화 환경에서 다수의 가상머신이 GPGPU(General Purpose computing on Graphics Processing Units) 작업을 수행할 때 GPU 메모리 입력 경쟁으로 인해 발생하는 커널 함수의 실행 지연 문제를 분석한다.

  • PDF

Efficient Implementation of Cryptography on GPU and GPU Resistance of Cryptography (GPU 상에서의 최적화 암호 구현과 암호의 GPU 내성)

  • Seo, Hwa-Jeong;Kwon, Hyeok-Dong;Kim, Hyun-Jun;Eum, Si-Woo;Sim, Min-Joo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.263-266
    • /
    • 2021
  • GPU 상에서의 효율적인 암호 구현을 위해서는 GPU 내부의 자원인 메모리와 명령어셋을 구현하고자 하는 암호 구조에 맞추어 사용하는 것이 중요하다. 본 논문에서는 Blowfish와 RC4를 최신 GPU 프로세서 상에서 최적 구현해 보고 성능을 저하시키는 요인들과 향상시키는 요인들을 비교 분석한다. 특히 파일암호화와 패스워드 크래킹에서 발생하는 암호화 구현 고려 사항에 대해 확인하며 해당 특징이 GPU 암호 구현 상에서 미치는 영향에 대해 확인해 보도록 한다. 마지막으로 앞에서 구현한 결과물의 성능을 저하시키는 요소에 대한 분석을 기반으로하여 높은 GPU 내성을 가지는 암호 설계를 위해 필요한 구조에 대해 확인해 보도록 한다.

Implementation of GPU System for SDR in WiBro Environment (WiBro 환경에서 SDR을 위한 GPU 시스템 구현)

  • Ahn, Sung-Soo;Lee, Jung-Suk
    • 전자공학회논문지 IE
    • /
    • v.48 no.3
    • /
    • pp.20-25
    • /
    • 2011
  • We developed a method of accelerating the operation speed of communication systems for SDR(Software Defined Radio) systems in WiBro environment. In this paper, we propose a new scheme of using GPU(Graphics Processing Unit) for implementing the communication system which perform with the functionality of SDR. In general, communication systems is made by DSP(Digital Signalling Processor) or FPGA(Field Programmable Gate Array). However, in this case, there are exist the problem of implementation and debugging caused by each CPU characteristic. The GPU is optimized for vector processing because it usually consists of multiple processors and each processor in GPU is composed of a set of threads. We also developed Framework to use GPU and CPU resources effectively for reducing the operation time. From the various simulation, it is confirmed that GPU system have good performance in WiBro system.

Implementation of Efficient Power Method on CUDA GPU (CUDA 기반 GPU에서 효율적인 Power Method의 구현)

  • Kim, Jung-Hwan;Kim, Jin-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.2
    • /
    • pp.9-16
    • /
    • 2011
  • GPU computing is emerging in high performance application area since it can easily exploit massive parallelism in a way of cost-effective computing. The power method which finds the eigen vector of a given matrix is widely used in various applications such as PageRank for calculating importance of web pages. In this research we made the power method efficiently parallelized on GPU and also suggested how it can be improved to enhance its performance. The power method mainly consists of matrix-vector product and it can be easily parallelized. However, it should decide the convergence of the eigen vector and need scaling of the vector subsequently. Such operations incur several calls to GPU kernels and data movement between host and GPU memories. We improved the performance of the power method by means of reduced calls to GPU kernels, optimized thread allocation and enhanced decision operation for the convergence.

Parallel Range Query Processing with R-tree on Multi-GPUs (다중 GPU를 이용한 R-tree의 병렬 범위 질의 처리 기법)

  • Ryu, Hongsu;Kim, Mincheol;Choi, Wonik
    • Journal of KIISE
    • /
    • v.42 no.4
    • /
    • pp.522-529
    • /
    • 2015
  • Ever since the R-tree was proposed to index multi-dimensional data, many efforts have been made to improve its query performances. One common trend to improve query performance is to parallelize query processing with the use of multi-core architectures. To this end, a GPU-base R-tree has been recently proposed. However, even though a GPU-based R-tree can exhibit an improvement in query performance, it is limited in its ability to handle large volumes of data because GPUs have limited physical memory. To address this problem, we propose MGR-tree (Multi-GPU R-tree), which can manage large volumes of data by dividing nodes into multiple GPUs. Our experiments show that MGR-tree is up to 9.1 times faster than a sequential search on a GPU and up to 1.6 times faster than a conventional GPU-based R-tree.

Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OPenMP를 이용한 빠르고 효율적인 신경망 구현)

  • Park, An-Jin;Jang, Hong-Hoon;Jung, Kee-Chul
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.253-260
    • /
    • 2009
  • Many algorithms for computer vision and pattern recognition have recently been implemented on GPU (graphic processing unit) for faster computational times. However, the implementation has two problems. First, the programmer should master the fundamentals of the graphics shading languages that require the prior knowledge on computer graphics. Second, in a job that needs much cooperation between CPU and GPU, which is usual in image processing and pattern recognition contrary to the graphic area, CPU should generate raw feature data for GPU processing as much as possible to effectively utilize GPU performance. This paper proposes more quick and efficient implementation of neural networks on both GPU and multi-core CPU. We use CUDA (compute unified device architecture) that can be easily programmed due to its simple C language-like style instead of GPU to solve the first problem. Moreover, OpenMP (Open Multi-Processing) is used to concurrently process multiple data with single instruction on multi-core CPU, which results in effectively utilizing the memories of GPU. In the experiments, we implemented neural networks-based text extraction system using the proposed architecture, and the computational times showed about 15 times faster than implementation on only GPU without OpenMP.

An Efficient Technique for Processing of Spatial Data Using GPU (GPU를 사용한 효율적인 공간 데이터 처리)

  • Lee, Jae-Il;Oh, Byoung-Woo
    • Spatial Information Research
    • /
    • v.17 no.3
    • /
    • pp.371-379
    • /
    • 2009
  • Recently, GPU (Graphics Processing Unit) has been improved rapidly on the need of speed for gaming. As a result, GPU contains multiple ALU (Arithmetic Logic Unit) for parallel processing of a lot of graphics data, such as transform, ray tracing, etc. Therefore, this paper proposed a technique for parallel processing of spatial data using GPU. Spatial data consists of multiple coordinates, and each coordinate contains value of x and y axis. To display spatial data graphics operations have to be processed to large amount of coordinates. Because the graphics operation is identical and coordinates are multiple data, SIMD (Single Instruction Multiple Data) parallel processing of GPU can be used for processing of spatial data to improve performance. This paper implemented SIMD parallel processing of spatial data using two kinds of SDK (Software Development Kit). CUDA and ATI Stream are used for NVIDIA and ATI GPU respectively. Experiments that measure time of calculation for graphics operations are carried out to observe enhancement of performance. Experimental result is reported that proposed method can enhance performance up to 1,162% for graphics operations. The proposed method that uses parallel processing with GPU for spatial data can be generally used to enhance performance for applications which deal with large amount of spatial data.

  • PDF

GPU-Based Parallel Collision Detection for Deformable Objects (변형 물체를 위한 GPU 기반 병렬 충돌 감지)

  • Sung, Nak-Jun;Kim, Min Sang;Hong, Min;Choi, Yoo-Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.1
    • /
    • pp.25-32
    • /
    • 2018
  • Due to heavy computational cost, deformable object simulation requires more effective collision detection method than rigid body simulation. However, when the CPU-based collision detection algorithm is purely applied to the GPU environment, the collision detection algorithm and the data structure optimized for the GPU environment are essential because the performance of the GPU can not be used properly. Therefore, we propose a GPU-based parallel collision detection algorithm for mass-spring system which is widely used for deformable object representation in this paper. The proposed method uses a parallel algorithm and data structure to reduce collision detection cost through GPU-based curling algorithm using AABB-Octree structure. In this paper, we prove the effectiveness of the proposed method by comparing the intersection test of all triangle pairs in parallel. The results of experimental tests show that the proposed method improves the performance by about 24% on average. Therefore, it is expected that the proposed method can improve the performance of real-time simulation for deformable objects.

GPU Acceleration of Range Doppler Algorithm for Real-Time SAR Image Generation (실시간 SAR 영상 생성을 위한 Range Doppler Algorithm의 GPU 가속)

  • Dong-Min Jeong;Woo-Kyung Lee;Myeong-Jin Lee;Yun-Ho Jung
    • Journal of IKEEE
    • /
    • v.27 no.3
    • /
    • pp.265-272
    • /
    • 2023
  • In this paper, a GPU-accelerated kernel of range Doppler algorithm (RDA) was developed for real-time image formation based on frequency modulated continuous wave (FMCW) synthetic aperture radar (SAR). A pinned memory was used to minimize the data transfer time between the host and the GPU device, and the kernel was configured to perform all RDA operations on the GPU to minimize the number of data transfers. The dataset was obtained through the FMCW drone SAR experiment, and the GPU acceleration effect was measured in an intel i7-9700K CPU, 32GB RAM, and Nvidia RTX 3090 GPU environment. Including the data transfer time between host and devices, it was measured to be accelerated up to 3.41 times compared to the CPU, and when only the acceleration effect of operation was measured without including the data transfer time, it was confirmed that it could be accelerated up to 156 times.

Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms (병렬 알고리즘의 가속화를 위한 GP-GPU의 Thread할당 기법)

  • Lee, Kwan-Ho;Kim, Chi-Yong
    • Journal of IKEEE
    • /
    • v.21 no.1
    • /
    • pp.92-95
    • /
    • 2017
  • In this paper, we proposed a way to improve function of small scale GP-GPU. Instead of using superscalar which increase scheduling-complexity, we suggested the application of simple core to maximize GP-GPU performance. Our studies also demonstrated that simplified Stream Processor is one of the way to achieve functional improvement in GP-GPU. In addition, we found that developing of optimal thread-assigning method in Warp Scheduler for specific application improves functional performance of GP-GPU. For examination of GP-GPU functional performance, we suggested the thread-assigning way which coordinated with Deep-Learning system; a part of Neural Network. As a result, we found that functional index in algorithm of Neural Network was increased to 90%, 98% compared with Intel CPU and ARM cortex-A15 4 core respectively.