• Title/Summary/Keyword: CUDA


Optimization of Lightweight Encryption Algorithm (LEA) using Threads and Shared Memory of GPU (GPU의 스레드와 공유메모리를 이용한 LEA 최적화 방안)

  • Park, Moo Kyu;Yoon, Ji Won
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.4 / pp.719-726 / 2015
  • As big-data and cloud security technologies become popular, much research has recently been conducted on faster and lighter encryption. As a result, the National Security Research Institute developed LEA, a lightweight and fast block cipher. To date, various studies have sought to speed up the lightweight encryption algorithm (LEA) by using a GPU rather than a conventional CPU. However, it is difficult to find any guideline on how to use the GPU efficiently for LEA. Therefore, we introduce a guideline that explains how to design and implement an optimized LEA on a GPU.

Analysis of the GPGPU Performance for Various Combinations of Workloads Executed Concurrently (동시에 실행되는 워크로드 조합에 따른 GPGPU 성능 분석)

  • Kim, Dongwhan;Eom, Hyeonsang
    • KIISE Transactions on Computing Practices / v.23 no.3 / pp.165-170 / 2017
  • Many studies have utilized GPGPU (General-Purpose Graphics Processing Unit) computing and its high computing power for complex tasks. GPGPU programs by nature require memory copies between the host and the device, and the high latency of these copies can degrade program performance, so optimization is needed to improve the performance of GPGPU programs significantly. When multiple GPGPU programs execute simultaneously, memory copy latency is hidden by overlapping memory copy and compute operations on the GPU. This paper presents the results of analyzing this latency-hiding effect for memory copy operations. Furthermore, we propose a performance anticipation model and an algorithm that addresses the limitations of using pinned memory, and show that the proposed algorithm yields a 41% performance increase.
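
The latency-hiding effect described in this abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not the paper's performance anticipation model. It assumes one copy engine and one compute engine, host-to-device copies only, and identical per-chunk times. The function names and the sample timings are hypothetical, and the model is written in plain Python rather than CUDA so that it runs anywhere.

```python
def serial_time(n_chunks, copy_ms, kernel_ms):
    """Total time when every chunk is copied and then processed, with no overlap."""
    return n_chunks * (copy_ms + kernel_ms)


def overlapped_time(n_chunks, copy_ms, kernel_ms):
    """Total time for a two-stage pipeline (assumed model, not from the paper).

    While chunk i is being computed, chunk i + 1 is being copied. The first
    copy and the last kernel cannot be hidden; in between, the slower of the
    two stages sets the pace.
    """
    if n_chunks == 0:
        return 0.0
    return copy_ms + kernel_ms + (n_chunks - 1) * max(copy_ms, kernel_ms)


# Hypothetical figures: 8 chunks, 2 ms per copy, 3 ms per kernel.
serial = serial_time(8, 2.0, 3.0)
overlapped = overlapped_time(8, 2.0, 3.0)
speed_up = serial / overlapped
```

In CUDA this kind of overlap is normally obtained with asynchronous copies issued on separate streams, and those copies are only truly asynchronous when the host buffer is page-locked (pinned), which is why limits on pinned memory matter for the technique.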

Warp-based Emotion-adaptive Real-Time Transforming Technique of Character's Facial Expression (워핑 기반의 감정 적응형 실시간 캐릭터 표정변환 기법)

  • Bae, Dong-Hee;Kim, Jin-Mo;Yun, Do-Kyung;Cho, Hyung-Je
    • Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.434-437 / 2011
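  • Performance improvements in single processors have recently reached their limits, and research on improving system performance through data-parallel processing is therefore being actively pursued. Owing to this shift, research on parallel computation of large-scale operations has also progressed steadily in the image processing field, and as hardware has advanced, image processing is now widely used in real-time systems. In this paper, we change a character's facial expression according to its emotional state by applying image warping, a technique widely used in image processing. Facial expressions corresponding to the basic emotions that humans can express are organized in a database; when an arbitrary emotion value is given to the character, the matching expression is selected from the database and warping is performed over the number of frames set by the user. However, the warping computation for pixels that move along the control lines defined for every frame is so expensive that real-time processing faces various constraints. We therefore propose a method that makes real-time processing possible by performing data-parallel processing with NVIDIA's CUDA, and demonstrate its usefulness through experiments.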

Approximating the Convex Hull for a Set of Spheres (구 집합에 대한 컨벡스헐 근사)

  • Kim, Byungjoo;Kim, Ku-Jin;Kim, Young J.
    • KIPS Transactions on Computer and Communication Systems / v.3 no.1 / pp.1-6 / 2014
  • Most previous algorithms focus on computing the convex hull of a set of points. In this paper, we present a method for approximating the convex hull of a set of spheres with various radii in discrete space. Computing the convex hull of a set of spheres is a base technology for many applications that study the structural properties of molecules. We present a voxel map data structure, in which the molecule is represented as a set of spheres, and the corresponding algorithms. Using CUDA programming to exploit the parallel architecture of the GPU, our algorithm takes less than 40 ms on average to compute the convex hull of 6,400 spheres.

A Study on GPGPU Performance Improvement Technique on GCN Architecture Using OpenCL API (GCN 아키텍쳐 상에서의 OpenCL을 이용한 GPGPU 성능향상 기법 연구)

  • Woo, DongHee;Kim, YoonHo
    • The Journal of Society for e-Business Studies / v.23 no.1 / pp.37-45 / 2018
  • The systems on which programs run have continuously expanded from conventional single-core and multi-core systems to many-core and heterogeneous systems. However, existing research has focused mostly on parallelizing programs based on the CUDA framework and rarely on optimization for AMD GCN-based GPUs. To address this gap, our study focuses on optimization techniques for the GCN architecture in a GPGPU environment and achieves a performance improvement. Specifically, using the performance techniques we propose, we reduce the computation time of matrix multiplication and convolution algorithms on GPGPU by more than 30%. We also increase kernel throughput by more than 40%.

Fast Stereo matching based on Plane-converging Belief Propagation using GPU (Plane-converging Belief Propagation을 이용한 고속 스테레오매칭)

  • Jung, Young-Han;Park, Eun-Soo;Kim, Hak-Il;Huh, Uk-Youl
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.2 / pp.88-95 / 2011
  • Stereo matching is the research area concerned with estimating the distance between objects and a camera from different viewpoints, and it still needs considerable improvement in both speed and accuracy. This paper presents a fast stereo matching algorithm based on plane-converging belief propagation, which uses message-passing convergence in hierarchical belief propagation. The stereo matching technique is implemented on a GPU and is suitable for real-time applications. The error rate of the proposed Plane-converging Belief Propagation algorithm is similar to that of the conventional Hierarchical Belief Propagation algorithm, while the speed-up factor reaches 2.7.

Parallel Computation for Extended Edit Distances Using the Shared Memory on GPU (GPU의 공유메모리를 활용한 확장편집거리 병렬계산)

  • Kim, Youngho;Na, Joong Chae;Sim, Jeong Seop
    • KIPS Transactions on Computer and Communication Systems / v.4 no.7 / pp.213-218 / 2015
  • Given two strings X and Y (|X|=m, |Y|=n) over an alphabet $\Sigma$, the extended edit distance between X and Y can be computed using dynamic programming in O(mn) time and space. Recently, a parallel algorithm was presented that computes the extended edit distance between X and Y in O(m+n) time and O(mn) space using m threads. In this paper, we present an improved parallel algorithm that uses the shared memory of the GPU. The experimental results show that our parallel algorithm runs about 19 to 25 times faster than the previous parallel algorithm.
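
The O(m+n) parallel bound mentioned in this abstract comes from the shape of the dynamic-programming table: all cells on one anti-diagonal depend only on the two preceding anti-diagonals, so they can be filled at the same time. The sketch below shows that sweep order in sequential Python. It is an illustration under stated assumptions: it computes the ordinary (Levenshtein) edit distance rather than the paper's extended edit distance, it does not model GPU shared memory, and the function name is hypothetical.

```python
def edit_distance_wavefront(x, y):
    """Levenshtein distance, filled anti-diagonal by anti-diagonal."""
    m, n = len(x), len(y)
    # d[i][j] holds the edit distance between x[:i] and y[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete i characters
    for j in range(n + 1):
        d[0][j] = j          # insert j characters

    # Sweep anti-diagonals k = i + j. There are m + n - 1 of them.
    for k in range(2, m + n + 1):
        # Every cell in this inner loop reads only diagonals k - 1 and k - 2,
        # so on a GPU each iteration could be handled by its own thread.
        for i in range(max(1, k - n), min(m, k - 1) + 1):
            j = k - i
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[m][n]
```

With one thread per row, the inner loop disappears and only the outer loop over anti-diagonals remains, which is where the linear number of parallel steps comes from.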

H.264/AVC Fast Intra Mode Decision using GPGPU Parallel Programming (GPGPU 병렬 프로그래밍을 이용한 H.264/AVC 고속 화면내 예측 모드 결정)

  • Choi, Sung-Jun;Han, Ki-Hun;Yoo, Yeong-Soo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.11a / pp.110-112 / 2011
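  • Research on GPGPU computing, which applies the parallelism and computational power of GPUs to general engineering problems, has recently been active. Video compression relies on many algorithms that repeat the same operation over large amounts of pixel data, which makes it well suited to high-speed parallel computation on GPGPU. H.264/AVC is the most recent international standard for video compression and is widely used in the market across many product lines and services. In this paper, we apply GPGPU parallel programming to the intra prediction mode decision process of H.264/AVC and propose a method that speeds up the mode decision. CUDA C was used for data-parallel processing on the GPU, and the CPU-side operations were implemented in C. By deciding the intra prediction modes for an entire frame in parallel on the GPU, we were able to reduce the time required. Experimental results confirmed a speed-up of about 2.8 times for Full-HD video when the prediction modes were decided in parallel on the GPU. If GPGPU parallel programming is applied in future not only to intra prediction but also to other algorithms that perform repeated operations, thereby reducing the computational load of the encoder, the development of high-speed real-time video encoders is expected to become easier.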
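
To make the intra mode decision in this abstract concrete, the sketch below picks the cheapest of three H.264 4x4 intra prediction modes (vertical, horizontal, DC) for one block. It is a simplified illustration, not the paper's implementation: the cost measure (sum of absolute differences) is an assumption, only three of the standard's 4x4 modes are shown, and all function names are hypothetical. Because each block is evaluated independently of the others here, the same function could be run by one GPU thread per block.

```python
MODES = ("vertical", "horizontal", "dc")


def predict_4x4(mode, top, left):
    """Build a 4x4 prediction from the 4 pixels above and the 4 pixels to the left."""
    if mode == "vertical":        # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":      # copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":              # rounded mean of the 8 neighbouring pixels
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))


def best_mode(block, top, left):
    """Return the mode with the lowest SAD, plus the cost of every mode."""
    costs = {mode: sad(block, predict_4x4(mode, top, left)) for mode in MODES}
    return min(costs, key=costs.get), costs


# A block whose rows all repeat the pixels above it is predicted exactly
# by the vertical mode.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [list(top) for _ in range(4)]
mode, costs = best_mode(block, top, left)
```

A production encoder usually predicts from reconstructed neighbours and uses a rate-distortion cost, which introduces dependencies between blocks that this sketch ignores.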


Analysis of Morton Code Conversion for 32 Bit IEEE 754 Floating Point Variables (IEEE 754 부동 소수점 32비트 float 변수의 Morton Code 변환 분석)

  • Park, Taejung
    • Journal of Digital Contents Society / v.17 no.3 / pp.165-172 / 2016
  • Morton codes play an important role in many parallel GPU applications for nearest neighbor (NN) search over huge data sets and queries, and their applications are growing. This paper discusses and analyzes the meaning of Tero Karras's 32-bit 'unsigned int' Morton code algorithm for three-dimensional spatial information in $[0,1]^3$ and its geometric implications. Based on this, the paper proposes a 64-bit 'unsigned long long' version of the Morton code and compares results for CPU vs. GPU and for 32-bit vs. 64-bit versions. The proposed GPU algorithm runs around 1000 times faster than the CPU version.
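
As an illustration of what a 3D Morton code does, the sketch below quantizes each coordinate in $[0,1]$ to an integer and interleaves the bits of x, y and z. `expand_bits` follows the widely circulated multiply-and-mask form of the 32-bit routine attributed to Karras (10 bits per axis). The 64-bit variant is an assumption made for illustration: it uses a plain bit loop with 21 bits per axis and is not the algorithm proposed in the paper. All function names are hypothetical.

```python
def expand_bits(v):
    """Spread the low 10 bits of v so that bit i moves to bit 3 * i."""
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v


def quantize(x, bits):
    """Map a float in [0, 1] to an integer in [0, 2**bits - 1], clamping at the ends."""
    scale = 1 << bits
    return int(min(max(x * scale, 0.0), scale - 1))


def morton3d_32(x, y, z):
    """30-bit Morton code (10 bits per axis) for a point in the unit cube."""
    xx = expand_bits(quantize(x, 10))
    yy = expand_bits(quantize(y, 10))
    zz = expand_bits(quantize(z, 10))
    return (xx << 2) | (yy << 1) | zz


def interleave3(xi, yi, zi, bits):
    """Reference bit-by-bit interleave: x, y, z bits land at 3i+2, 3i+1, 3i."""
    code = 0
    for i in range(bits):
        code |= ((xi >> i) & 1) << (3 * i + 2)
        code |= ((yi >> i) & 1) << (3 * i + 1)
        code |= ((zi >> i) & 1) << (3 * i)
    return code


def morton3d_64(x, y, z):
    """63-bit Morton code (21 bits per axis), illustrative loop-based variant."""
    return interleave3(quantize(x, 21), quantize(y, 21), quantize(z, 21), 21)
```

Points that are close in space tend to receive nearby codes, so sorting by Morton code groups neighbours together in memory. Moving from 10 to 21 bits per axis refines the grid from 1024 to 2,097,152 cells along each axis.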

GPU-accelerated Lattice Boltzmann Simulation for the Prediction of Oil Slick Movement in Ocean Environment (GPU 가속 기술을 이용한 격자 볼츠만법 기반 원유 확산 과정 시뮬레이션)

  • Ha, Sol;Ku, Namkug;Roh, Myung-Il
    • Korean Journal of Computational Design and Engineering / v.18 no.6 / pp.399-406 / 2013
  • This paper describes a new simulation technique for advection-diffusion phenomena over the sea surface using the lattice Boltzmann method (LBM), capable of predicting oil dispersion from tankers. The LBM is used to solve the pollutant transport problem within the framework of the ocean environment. The sea space is represented by lattices, where each lattice holds information on oil transport. Since dispersed oil (i.e., oil droplets) at sea is transported by convection due to waves, buoyancy, and turbulent diffusion, the conservation of mass and many physical oil transport rules were used in the prediction model. Because the LBM is built on uniform lattices and simple rules, it can be easily accelerated by parallel mechanisms such as GPU acceleration. The proposed LBM model is used to simulate a simple pollution event with 10,000 kL of oil pollutants. The simulation results indicate that the GPU-accelerated LBM is 6 times faster than the LBM without the GPU.
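
The "uniform lattices and simple rules" that make the LBM easy to parallelize can be seen in a minimal one-dimensional advection-diffusion example. The sketch below is an assumption-laden illustration, not the paper's ocean model: it uses a D1Q3 lattice, a BGK collision with a single relaxation time, a constant drift velocity, and periodic boundaries. The function names and parameter values are hypothetical.

```python
W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)   # D1Q3 lattice weights
C = (0, 1, -1)                           # lattice velocities (rest, right, left)
CS2 = 1.0 / 3.0                          # squared lattice speed of sound


def lbm_step(f, u, tau):
    """One collide-and-stream update of the three distribution arrays in f."""
    n = len(f[0])
    rho = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]

    # Collision: relax each population towards its local equilibrium.
    # Every cell is independent here, so a GPU could use one thread per cell.
    post = [[0.0] * n for _ in range(3)]
    for i in range(3):
        for x in range(n):
            feq = W[i] * rho[x] * (1.0 + C[i] * u / CS2)
            post[i][x] = f[i][x] - (f[i][x] - feq) / tau

    # Streaming: shift each population along its velocity (periodic domain).
    new = [[0.0] * n for _ in range(3)]
    for i in range(3):
        for x in range(n):
            new[i][(x + C[i]) % n] = post[i][x]
    return new


def simulate(n_cells=64, steps=100, u=0.1, tau=1.0):
    """Release a unit mass in the middle cell and return the final concentration."""
    rho0 = [0.0] * n_cells
    rho0[n_cells // 2] = 1.0
    f = [[W[i] * rho0[x] for x in range(n_cells)] for i in range(3)]
    for _ in range(steps):
        f = lbm_step(f, u, tau)
    return [f[0][x] + f[1][x] + f[2][x] for x in range(n_cells)]


rho = simulate()
total_mass = sum(rho)
centre_of_mass = sum(x * r for x, r in enumerate(rho)) / total_mass
```

The released mass spreads out (diffusion) while its centre drifts in the direction of `u` (advection), and the total stays constant because neither the collision nor the streaming step creates or destroys mass.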