• Title/Summary/Keyword: GPU implementation

Trends in Implementation of Homomorphic Encryption using GPU (GPU를 활용한 동형암호 구현 동향)

  • Eum, Si-Woo;Kim, Hyun-Jun;Lim, Se-Jin;Seo, Hwa-Jeong
    • Annual Conference of KIPS / 2022.11a / pp.213-215 / 2022
  • As technologies such as big data, artificial intelligence, and cloud computing advance, personal information and other sensitive data are increasingly exposed. Homomorphic encryption is a cryptosystem that allows computation to be performed directly on encrypted data. This property is very important for today's cloud computing platforms, but the scheme has seen little use because its heavy computational load leads to long processing times. By exploiting parallel computation, a GPU can carry out work normally handled by the CPU far more efficiently. This paper surveys research trends in techniques that use GPUs to accelerate homomorphic encryption.
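
For orientation, a minimal CUDA sketch of the kind of data-parallel workload these GPU homomorphic-encryption efforts target is shown below: coefficient-wise modular addition of two ciphertext polynomials. The kernel, names, and parameters are illustrative assumptions, not code from the surveyed work.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sketch: coefficient-wise modular addition of two ciphertext
// polynomials, the kind of data-parallel kernel GPU-accelerated HE schemes rely on.
__global__ void poly_add_mod(const uint64_t* a, const uint64_t* b,
                             uint64_t* out, uint64_t q, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        uint64_t s = a[i] + b[i];          // assumes a[i], b[i] < q, so no overflow
        out[i] = (s >= q) ? s - q : s;     // reduce modulo q
    }
}

int main() {
    const int n = 1 << 14;                 // e.g., ring dimension 16384
    const uint64_t q = (1ULL << 50) + 27;  // placeholder modulus
    uint64_t *a, *b, *c;
    cudaMallocManaged((void**)&a, n * sizeof(uint64_t));
    cudaMallocManaged((void**)&b, n * sizeof(uint64_t));
    cudaMallocManaged((void**)&c, n * sizeof(uint64_t));
    for (int i = 0; i < n; ++i) { a[i] = i % q; b[i] = (2ULL * i) % q; }
    poly_add_mod<<<(n + 255) / 256, 256>>>(a, b, c, q, n);
    cudaDeviceSynchronize();
    printf("c[1] = %llu\n", (unsigned long long)c[1]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```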

High-Speed Implementations of Block Ciphers on Graphics Processing Units Using CUDA Library (GPU용 연산 라이브러리 CUDA를 이용한 블록암호 고속 구현)

  • Yeom, Yong-Jin;Cho, Yong-Kuk
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.3 / pp.23-32 / 2008
  • The computing power of graphics processing units (GPUs) has already surpassed that of CPUs, and the gap continues to widen. As a result, research on GPGPU, which applies GPUs to general-purpose computation, has become popular and has shown great success, especially in parallel data processing. Since GPU implementations of cryptographic algorithms were first reported by Cook et al. in 2005, improved results using graphics libraries such as OpenGL and DirectX have been published. In this paper, we present techniques and results for implementing block ciphers with the CUDA library announced by NVIDIA in 2007. We also discuss a general method for converting CPU source code of block ciphers into GPU code. On an NVIDIA 8800GTX GPU, the resulting throughputs of the block ciphers AES, ARIA, and DES are 4.5 Gbps, 7.0 Gbps, and 2.8 Gbps, respectively, which are faster than those on a CPU.
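
The mapping described in this paper, in which independent cipher blocks are assigned to GPU threads, can be sketched as follows. This is a hedged illustration only: a toy XOR transform stands in for the real AES/ARIA/DES round functions, and all names are hypothetical.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hedged sketch of the usual GPU block-cipher mapping: one thread encrypts one
// independent 16-byte block (ECB-style). A toy XOR transform stands in for the
// real round functions, which would live in encrypt_block().
__device__ void encrypt_block(uint8_t* block, const uint8_t* key) {
    for (int i = 0; i < 16; ++i)
        block[i] ^= key[i];               // placeholder for the real cipher rounds
}

__global__ void encrypt_ecb(uint8_t* data, const uint8_t* key, int num_blocks) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b < num_blocks)
        encrypt_block(data + 16 * b, key);
}

int main() {
    const int num_blocks = 1 << 16;            // 1 MiB of plaintext
    uint8_t *data, *key;
    cudaMallocManaged((void**)&data, 16 * num_blocks);
    cudaMallocManaged((void**)&key, 16);
    for (int i = 0; i < 16; ++i) key[i] = (uint8_t)i;
    for (int i = 0; i < 16 * num_blocks; ++i) data[i] = (uint8_t)(i & 0xFF);
    encrypt_ecb<<<(num_blocks + 255) / 256, 256>>>(data, key, num_blocks);
    cudaDeviceSynchronize();
    printf("first byte of ciphertext: 0x%02x\n", data[0]);
    cudaFree(data); cudaFree(key);
    return 0;
}
```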

Design and Implementation of High-Speed Software Cryptographic Modules Using GPU (GPU를 활용한 고속 소프트웨어 암호모듈 설계 및 구현)

  • Song, JinGyo;An, SangWoo;Seo, Seog Chung
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.6 / pp.1279-1289 / 2020
  • To securely protect users' sensitive information and national secrets, the importance of cryptographic modules has been emphasized, and many companies and government organizations actively use them. In Korea, cryptographic modules are verified through the Korea Cryptographic Module Validation Program (KCMVP) to ensure their security. Most domestic cryptographic modules are CPU-based software (S/W). However, CPU-based cryptographic modules are difficult to use in servers that need to process large amounts of data. In this paper, we propose an S/W cryptographic module that provides high-speed operation using a GPU. We describe the configuration and operation of the GPU-based S/W cryptographic module and discuss how using a GPU changes the cryptographic module security requirements. In addition, we present the performance improvement over an existing CPU S/W cryptographic module. The results of this paper can be applied to cryptographic modules that provide cryptography in servers that manage IoT (Internet of Things) devices or provide cloud computing.

Performance Enhancement and Evaluation of AES Cryptography using OpenCL on Embedded GPGPU (OpenCL을 이용한 임베디드 GPGPU환경에서의 AES 암호화 성능 개선과 평가)

  • Lee, Minhak;Kang, Woochul
    • KIISE Transactions on Computing Practices / v.22 no.7 / pp.303-309 / 2016
  • Recently, an increasing number of embedded processors, such as the ARM Mali, have begun to support GPGPU programming frameworks such as OpenCL. Thus, GPGPU technologies that have been used in PC and server environments are beginning to be applied to embedded systems. However, many embedded systems have architectural characteristics that differ from traditional PCs, and low power consumption and real-time performance are also important metrics in these systems. In this paper, we implement a parallel AES cryptographic algorithm for a modern embedded GPU using OpenCL, a standard parallel computing framework, and compare its performance against various baselines. Experimental results show that the parallel GPU AES implementation reduces response time to about 1/150 and energy consumption to approximately 1/290 of an OpenMP implementation when 1000 KB of input data is applied. Furthermore, an additional 100% performance improvement was achieved by exploiting characteristics of embedded GPUs, such as eliminating data copies between GPU and host memory. Our results also demonstrate that larger input sizes yield greater performance improvements.
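
The copy-elimination idea mentioned above can be illustrated with a small sketch. The paper uses OpenCL on an ARM Mali GPU; the CUDA analogue below uses mapped pinned host memory so the kernel operates on host memory directly, and every name and parameter in it is an assumption rather than the paper's code.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hedged CUDA analogue of the zero-copy idea exploited on an embedded
// (shared-memory) GPU: map pinned host memory into the device address space so
// the kernel works on it directly, avoiding explicit host<->device copies.
__global__ void xor_mask(uint8_t* buf, uint8_t mask, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] ^= mask;            // stand-in for the per-byte cipher work
}

int main() {
    const int n = 1000 * 1024;            // ~1000 KB, the input size used in the paper
    uint8_t *host_buf, *dev_view;
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc((void**)&host_buf, n, cudaHostAllocMapped);   // pinned, mapped allocation
    cudaHostGetDevicePointer((void**)&dev_view, host_buf, 0);   // device-visible alias
    for (int i = 0; i < n; ++i) host_buf[i] = (uint8_t)i;
    xor_mask<<<(n + 255) / 256, 256>>>(dev_view, 0x5A, n);
    cudaDeviceSynchronize();              // results are already in host memory
    printf("host_buf[0] = 0x%02x\n", host_buf[0]);
    cudaFreeHost(host_buf);
    return 0;
}
```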

Implementation of Pedestrian Detection and Tracking with GPU at Night-time (GPU를 이용한 야간 보행자 검출과 추적 시스템 구현)

  • Choi, Beom-Joon;Yoon, Byung-Woo;Song, Jong-Kwan;Park, Jangsik
    • Journal of Broadcast Engineering / v.20 no.3 / pp.421-429 / 2015
  • This paper presents an approach for pedestrian detection and tracking with infrared imagery. We used CUDA (Compute Unified Device Architecture), a parallel processing framework, to improve the speed of video-based pedestrian detection and tracking. The detection phase is performed by an AdaBoost algorithm based on Haar-like features; the AdaBoost classifier is trained with datasets generated from infrared images. After detecting a pedestrian with the AdaBoost classifier, we apply a particle filter tracker based on an adaptively exploited HSV histogram feature. The proposed approach is implemented on an NVIDIA Jetson TK1 developer board, a full-featured device well suited to software development in a Linux environment. We present the results of parallel processing on the NVIDIA GPU in the CUDA development environment for detection and tracking of pedestrians, and compare the detection and tracking time for night-time images on the GPU and the CPU. The results show that pedestrian detection and tracking with the GPU is approximately 6 times faster than with the CPU.
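
As a rough illustration of the per-particle work that such a tracker parallelizes, the sketch below builds a hue histogram for one candidate window with one thread per pixel. It is a hedged example with hypothetical names, not the paper's implementation of the HSV-histogram particle filter.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hedged sketch (not the paper's code): build a hue histogram for a candidate
// window with one thread per pixel, accumulating into shared memory. Per-particle
// histograms like this are the data-parallel core of HSV-based particle-filter tracking.
#define BINS 32

__global__ void hue_histogram(const unsigned char* hue, int width,
                              int x0, int y0, int w, int h,
                              unsigned int* hist) {
    __shared__ unsigned int local[BINS];
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    if (tid < BINS) local[tid] = 0;
    __syncthreads();

    int x = x0 + blockIdx.x * blockDim.x + threadIdx.x;
    int y = y0 + blockIdx.y * blockDim.y + threadIdx.y;
    if (x < x0 + w && y < y0 + h) {
        int bin = hue[y * width + x] * BINS / 256;   // hue assumed in [0, 255]
        atomicAdd(&local[bin], 1u);
    }
    __syncthreads();
    if (tid < BINS) atomicAdd(&hist[tid], local[tid]);
}

int main() {
    const int width = 640, height = 480;
    unsigned char* hue; unsigned int* hist;
    cudaMallocManaged((void**)&hue, width * height);
    cudaMallocManaged((void**)&hist, BINS * sizeof(unsigned int));
    for (int i = 0; i < width * height; ++i) hue[i] = (unsigned char)(i % 180);
    for (int i = 0; i < BINS; ++i) hist[i] = 0;
    dim3 block(16, 16), grid(4, 4);                  // covers a 64x64 candidate window
    hue_histogram<<<grid, block>>>(hue, width, 100, 100, 64, 64, hist);
    cudaDeviceSynchronize();
    printf("bin 0 count: %u\n", hist[0]);
    cudaFree(hue); cudaFree(hist);
    return 0;
}
```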

Implementation of Real-time Interactive Ray Tracing on GPU (GPU 기반의 실시간 인터렉티브 광선추적법 구현)

  • Bae, Sung-Min;Hong, Hyun-Ki
    • Journal of Korea Game Society / v.7 no.3 / pp.59-66 / 2007
  • Ray tracing is one of the classical global illumination methods for generating photo-realistic rendered images with lighting effects such as reflection and refraction. However, its computational load restricts its use in real-time applications. To overcome this limitation, much research on GPU (Graphics Processing Unit)-based ray tracing has been presented. In this paper, we implement the ray tracing algorithm of J. Purcell and combine it with two methods to improve rendering performance for interactive applications. First, intersection points of the primary rays are determined efficiently using rasterization on the graphics hardware. We then construct an acceleration structure over the 3D objects to improve rendering performance. Few studies have analyzed in detail the performance gains obtained from these considerations in ray tracing. We compare the rendering system with GPU-based environment mapping and implement a wireless remote rendering system, which is useful for interactive applications such as real-time compositing, augmented reality, and virtual reality.
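
The one-thread-per-pixel mapping that underlies GPU ray tracing can be sketched as below: each thread casts a primary ray and tests it against a single sphere. This is an illustrative, assumption-laden sketch, not Purcell's kernels or the paper's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hedged sketch of the one-thread-per-pixel mapping used in GPU ray tracing:
// each thread casts a primary ray from an orthographic camera and tests it
// against a single sphere, writing a binary hit mask.
struct Sphere { float cx, cy, cz, r; };

__global__ void primary_rays(unsigned char* image, int width, int height, Sphere s) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Orthographic ray: origin on the image plane, direction +z.
    float ox = x - width / 2.0f, oy = y - height / 2.0f, oz = -100.0f;
    float dx = 0.0f, dy = 0.0f, dz = 1.0f;

    // Ray-sphere intersection: solve |o + t*d - c|^2 = r^2 for t.
    float lx = ox - s.cx, ly = oy - s.cy, lz = oz - s.cz;
    float b = lx * dx + ly * dy + lz * dz;
    float c = lx * lx + ly * ly + lz * lz - s.r * s.r;
    float disc = b * b - c;
    image[y * width + x] = (disc >= 0.0f && -b - sqrtf(disc) > 0.0f) ? 255 : 0;
}

int main() {
    const int width = 256, height = 256;
    unsigned char* image;
    cudaMallocManaged((void**)&image, width * height);
    Sphere s = {0.0f, 0.0f, 0.0f, 60.0f};
    dim3 block(16, 16), grid(width / 16, height / 16);
    primary_rays<<<grid, block>>>(image, width, height, s);
    cudaDeviceSynchronize();
    printf("center pixel: %d\n", image[(height / 2) * width + width / 2]);
    cudaFree(image);
    return 0;
}
```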

Design of Omok AI using Genetic Algorithm and Game Trees and Their Parallel Processing on the GPU (유전 알고리즘과 게임 트리를 병합한 오목 인공지능 설계 및 GPU 기반 병렬 처리 기법)

  • Ahn, Il-Jun;Park, In-Kyu
    • Journal of KIISE: Computer Systems and Theory / v.37 no.2 / pp.66-75 / 2010
  • This paper proposes an efficient method for designing and implementing the artificial intelligence (AI) of the game 'omok' on the GPU. The proposed AI is designed as a cooperative structure that combines a min-max game tree and a genetic algorithm. Since the evaluation function requires intensive computation but is performed independently on many candidates in the solution space, it is computed on the GPU in a massively parallel way. The implementation on NVIDIA CUDA and the experimental results show that it significantly outperforms the CPU: the parallel game tree and the genetic algorithm run more than 400 times and 300 times faster on the GPU than on the CPU, respectively. In the proposed cooperative AI, a selective search using the genetic algorithm is performed after the full search using the game tree, to search the solution space more efficiently and to avoid thread overflow. Experimental results show that the proposed algorithm enhances the AI significantly and allows it to run within the time limit given by the game's rules.
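
The parallelization described here, one thread scoring one candidate board, can be sketched as follows. A toy scoring rule stands in for the paper's omok evaluation function, and all names and sizes are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hedged sketch (not the paper's evaluator): the evaluation function is
// independent across candidate boards, so one thread scores one candidate.
// A toy score over horizontal windows of five stands in for the real
// omok pattern evaluation.
#define N 19   // board size assumed 19x19

__global__ void evaluate_candidates(const char* boards, int num_boards, int* scores) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= num_boards) return;
    const char* board = boards + b * N * N;   // 1 = player stone, 0 = empty/opponent
    int score = 0;
    for (int y = 0; y < N; ++y)
        for (int x = 0; x + 4 < N; ++x) {     // horizontal windows of five
            int run = 0;
            for (int k = 0; k < 5; ++k) run += board[y * N + x + k];
            score += run * run;               // longer runs weighted more
        }
    scores[b] = score;
}

int main() {
    const int num_boards = 1024;              // candidates from the GA / game tree
    char* boards; int* scores;
    cudaMallocManaged((void**)&boards, num_boards * N * N);
    cudaMallocManaged((void**)&scores, num_boards * sizeof(int));
    for (int i = 0; i < num_boards * N * N; ++i) boards[i] = (i % 7 == 0) ? 1 : 0;
    evaluate_candidates<<<(num_boards + 127) / 128, 128>>>(boards, num_boards, scores);
    cudaDeviceSynchronize();
    printf("score of candidate 0: %d\n", scores[0]);
    cudaFree(boards); cudaFree(scores);
    return 0;
}
```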

Parallel Implementation of the Recursive Least Square for Hyperspectral Image Compression on GPUs

  • Li, Changguo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.7 / pp.3543-3557 / 2017
  • Compression is a very important technique for remotely sensed hyperspectral images. Lossless compression based on the recursive least square (RLS) method, which eliminates the redundancy of hyperspectral images using both spatial and spectral correlations, is an extremely powerful tool for this purpose, but its relatively high computational complexity limits its application to time-critical scenarios. To improve the computational efficiency of the algorithm, we optimize its serial version and develop a new parallel implementation on graphics processing units (GPUs). Namely, an optimized recursive least square based on an optimal number of prediction bands is introduced first. We then use this approach as a case study to illustrate the advantages and potential challenges of applying GPU parallel optimization principles to the considered problem. The proposed parallel method properly exploits the low-level architecture of GPUs and has been implemented using the Compute Unified Device Architecture (CUDA). The GPU parallel implementation is compared with the serial implementation on a CPU. Experimental results indicate remarkable acceleration factors and real-time performance, while retaining exactly the same bit rate as the serial version of the compressor.
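
As a hedged sketch of the data-parallel core of such a predictor, the kernel below computes one pixel's prediction for a band as a weighted sum of the same pixel in previous bands and emits the residual; the RLS update that would produce the weights is omitted, and all names and sizes are assumptions rather than the paper's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hedged sketch (not the paper's RLS kernel): each thread predicts one pixel of
// band `band` as a weighted sum of the same pixel in the P previous bands and
// emits the residual that the lossless coder would then encode.
__global__ void predict_band(const float* cube, const float* weights,
                             float* residual, int num_pixels, int band, int P) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_pixels) return;
    float pred = 0.0f;
    for (int p = 1; p <= P; ++p)                       // previous P bands
        pred += weights[p - 1] * cube[(band - p) * num_pixels + i];
    residual[i] = cube[band * num_pixels + i] - pred;  // value to be entropy-coded
}

int main() {
    const int num_pixels = 512 * 512, P = 4, band = 6, bands = 8;
    float *cube, *weights, *residual;
    cudaMallocManaged((void**)&cube, bands * num_pixels * sizeof(float));
    cudaMallocManaged((void**)&weights, P * sizeof(float));
    cudaMallocManaged((void**)&residual, num_pixels * sizeof(float));
    for (int i = 0; i < bands * num_pixels; ++i) cube[i] = (float)(i % 255);
    for (int p = 0; p < P; ++p) weights[p] = 1.0f / P;
    predict_band<<<(num_pixels + 255) / 256, 256>>>(cube, weights, residual,
                                                    num_pixels, band, P);
    cudaDeviceSynchronize();
    printf("residual[0] = %f\n", residual[0]);
    cudaFree(cube); cudaFree(weights); cudaFree(residual);
    return 0;
}
```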

Implementation of Parallel Computer Generated Hologram Using Multi-GPGPU (다중 GPGPU를 이용한 컴퓨터 생성 홀로그램의 병렬화 구현)

  • Seo, Young-Ho;Lee, Yoon-Hyuk;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.5 / pp.1177-1186 / 2014
  • Computer-generated holography (CGH) mathematically models an optical phenomenon with a digital computer. Because it requires a huge amount of computational power, a fast, high-performance technique is needed. In this paper, we propose two parallelizations of the CGH calculation: the first parallelizes the CGH algorithm within a single GPU (graphics processing unit), and the second parallelizes the computation across multiple GPUs. The proposed algorithm was implemented on a GTX 780 Ti GPU and calculates a 1,024×1,024 hologram with 10K object points in about 24 ms.
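
The standard point-source CGH mapping, one thread per hologram pixel summing cos(k·r) over all object points, can be sketched as below. This is an illustrative sketch under assumed parameters (pixel pitch, wavelength), not the paper's kernel.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Hedged sketch of the standard point-source CGH mapping (not the paper's exact
// kernel): each thread computes one hologram pixel by summing the phase
// contribution of every object point, I(x, y) = sum_j amp_j * cos(k * r_j).
struct Point { float x, y, z, amp; };

__global__ void cgh_kernel(float* hologram, int width, int height, float pitch,
                           const Point* points, int num_points, float k) {
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    float hx = (px - width / 2.0f) * pitch;   // hologram-plane coordinates
    float hy = (py - height / 2.0f) * pitch;
    float sum = 0.0f;
    for (int j = 0; j < num_points; ++j) {
        float dx = hx - points[j].x, dy = hy - points[j].y, dz = points[j].z;
        float r = sqrtf(dx * dx + dy * dy + dz * dz);
        sum += points[j].amp * cosf(k * r);
    }
    hologram[py * width + px] = sum;
}

int main() {
    const int width = 1024, height = 1024, num_points = 1024;
    const float pitch = 8e-6f, lambda = 532e-9f, k = 2.0f * 3.14159265f / lambda;
    float* hologram; Point* points;
    cudaMallocManaged((void**)&hologram, width * height * sizeof(float));
    cudaMallocManaged((void**)&points, num_points * sizeof(Point));
    for (int j = 0; j < num_points; ++j)
        points[j] = {j * 1e-6f, j * 2e-6f, 0.1f, 1.0f};
    dim3 block(16, 16), grid(width / 16, height / 16);
    cgh_kernel<<<grid, block>>>(hologram, width, height, pitch, points, num_points, k);
    cudaDeviceSynchronize();
    printf("hologram[0] = %f\n", hologram[0]);
    cudaFree(hologram); cudaFree(points);
    return 0;
}
```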

Real-Time Object Segmentation in Image Sequences (연속 영상 기반 실시간 객체 분할)

  • Kang, Eui-Seon;Yoo, Seung-Hun
    • The KIPS Transactions: Part B / v.18B no.4 / pp.173-180 / 2011
  • This paper presents an approach for real-time object segmentation on a GPU (Graphics Processing Unit) using CUDA (Compute Unified Device Architecture). Many recent applications, such as monitoring systems, motion analysis, and object tracking, require real-time processing, and object segmentation on a CPU is not fast enough for this purpose. NVIDIA provides the CUDA platform for general-purpose parallel processing to overcome the limits of fixed-function graphics hardware. In this paper, we use adaptive Gaussian mixture background modeling in the object extraction step and connected component labeling (CCL) for classification. The speeds of the GPU and CPU implementations are compared and evaluated on a 2.4 GHz Core 2 Quad processor; the GPU version achieved a speedup of 3x-4x over the CPU version.
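
A much-simplified sketch of the per-pixel background test is shown below: a single running Gaussian per pixel instead of the adaptive mixture used in the paper, with one thread classifying and updating one pixel. All names and thresholds are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hedged, simplified sketch of per-pixel background subtraction (a single running
// Gaussian per pixel instead of the adaptive mixture used in the paper): each
// thread marks a pixel as foreground if it lies outside 2.5 sigma of the
// background model, then updates the model with learning rate alpha.
__global__ void background_subtract(const unsigned char* frame, float* mean, float* var,
                                    unsigned char* fg_mask, int n, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = (float)frame[i];
    float d = x - mean[i];
    bool foreground = d * d > 2.5f * 2.5f * var[i];
    fg_mask[i] = foreground ? 255 : 0;
    if (!foreground) {                                   // update background model only
        mean[i] += alpha * d;
        var[i] += alpha * (d * d - var[i]);
    }
}

int main() {
    const int width = 640, height = 480, n = width * height;
    unsigned char *frame, *fg; float *mean, *var;
    cudaMallocManaged((void**)&frame, n);
    cudaMallocManaged((void**)&fg, n);
    cudaMallocManaged((void**)&mean, n * sizeof(float));
    cudaMallocManaged((void**)&var, n * sizeof(float));
    for (int i = 0; i < n; ++i) { frame[i] = 128; mean[i] = 120.0f; var[i] = 16.0f; }
    background_subtract<<<(n + 255) / 256, 256>>>(frame, mean, var, fg, n, 0.01f);
    cudaDeviceSynchronize();
    printf("mask at pixel 0: %d\n", fg[0]);
    cudaFree(frame); cudaFree(fg); cudaFree(mean); cudaFree(var);
    return 0;
}
```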