• Title/Abstract/Keyword: GPU Implementation

Efficient Implementation of Cryptography on GPU and GPU Resistance of Cryptography (GPU 상에서의 최적화 암호 구현과 암호의 GPU 내성)

  • Seo, Hwa-Jeong;Kwon, Hyeok-Dong;Kim, Hyun-Jun;Eum, Si-Woo;Sim, Min-Joo
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.263-266 / 2021
  • For an efficient cryptographic implementation on a GPU, it is important to use the GPU's internal resources, its memory and instruction set, in a way that matches the structure of the target cipher. In this paper, we optimize Blowfish and RC4 on a recent GPU and compare and analyze the factors that degrade and improve performance. In particular, we examine the implementation considerations that arise in file encryption and password cracking, and how these characteristics affect cipher implementations on the GPU. Finally, based on an analysis of the factors that degraded the performance of our implementations, we identify the structures needed to design ciphers with high GPU resistance.
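
The trade-off this abstract describes, fitting a cipher's working state into the GPU memory hierarchy, can be made concrete with a small sketch. The following CUDA kernel is an illustration under assumed parameters, not the paper's code: it runs one RC4 keystream per thread with each thread's 256-byte S-box in shared memory, which makes the resource pressure explicit, since at 256 B per thread the block's shared memory caps occupancy.

```cuda
#include <cstdint>

#define THREADS 128  // 128 threads x 256 B of S-box = 32 KB shared memory

// One RC4 keystream per thread; each thread's S-box lives in shared memory.
// Launch with exactly THREADS threads per block and one key per thread.
__global__ void rc4_keystream(const uint8_t *keys, int key_len,
                              uint8_t *out, int stream_len) {
    __shared__ uint8_t sbox[THREADS][256];
    uint8_t *S = sbox[threadIdx.x];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    const uint8_t *key = keys + tid * key_len;

    // Key-scheduling algorithm (KSA).
    for (int i = 0; i < 256; i++) S[i] = (uint8_t)i;
    uint8_t j = 0;
    for (int i = 0; i < 256; i++) {
        j = j + S[i] + key[i % key_len];
        uint8_t t = S[i]; S[i] = S[j]; S[j] = t;
    }

    // Pseudo-random generation algorithm (PRGA); uint8_t indices wrap mod 256.
    uint8_t *dst = out + (long)tid * stream_len;
    uint8_t i = 0;
    j = 0;
    for (int n = 0; n < stream_len; n++) {
        i = i + 1;
        j = j + S[i];
        uint8_t t = S[i]; S[i] = S[j]; S[j] = t;
        dst[n] = S[(uint8_t)(S[i] + S[j])];
    }
}
```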

Implementation of GPU System for SDR in WiBro Environment (WiBro 환경에서 SDR을 위한 GPU 시스템 구현)

  • Ahn, Sung-Soo;Lee, Jung-Suk
    • 전자공학회논문지 IE / v.48 no.3 / pp.20-25 / 2011
  • We developed a method of accelerating communication systems for SDR(Software Defined Radio) in the WiBro environment. In this paper, we propose a new scheme that uses a GPU(Graphics Processing Unit) to implement a communication system providing SDR functionality. Communication systems are typically built with a DSP(Digital Signal Processor) or FPGA(Field Programmable Gate Array); however, these approaches suffer from implementation and debugging problems stemming from the characteristics of each processor. The GPU is well suited to vector processing because it consists of many processors, each running a set of threads. We also developed a framework that uses GPU and CPU resources effectively to reduce processing time. Various simulations confirm that the GPU system performs well in the WiBro system.
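
As a rough illustration of why a GPU suits the vector-heavy inner loops of an SDR physical layer, the kernel below performs an element-wise complex multiply of the kind used for per-subcarrier equalization; the function and buffer names are assumptions, not the paper's framework API.

```cuda
#include <cuComplex.h>

// One thread per sample: multiply each received sample by the inverse
// channel coefficient for its subcarrier (names are illustrative).
__global__ void equalize(const cuFloatComplex *rx, const cuFloatComplex *chan_inv,
                         cuFloatComplex *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = cuCmulf(rx[i], chan_inv[i]);
}
```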

Optimal Implementation of Lightweight Block Cipher PIPO on CUDA GPGPU (CUDA GPGPU 상에서 경량 블록 암호 PIPO의 최적 구현)

  • Kim, Hyun-Jun;Eum, Si-Woo;Seo, Hwa-Jeong
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.6 / pp.1035-1043 / 2022
  • With the spread of the Internet of Things (IoT), cloud computing, and big data, the need for high-speed encryption in applications is growing. GPU optimization can also be used to validate, within a reasonable time, cryptanalytic results or reduced-round versions obtained theoretically. In this paper, the PIPO lightweight cipher, which has been implemented in various environments, is implemented on a GPU. The implementation is optimized with a brute-force attack on PIPO in mind. In particular, the bit-slicing technique is applied and the GPU's resources are used as fully as possible. As a result, the proposed implementation achieves a throughput of about 19.5 billion encryptions per second on an RTX 3060, about 122 times higher than that of the previous study.
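
A generic sketch of the bit-slicing layout mentioned above may help: each 32-bit word holds the same bit position from 32 independent blocks, so one sequence of Boolean instructions evaluates an S-box layer for 32 blocks at once. The toy Boolean circuit below only demonstrates the data layout; PIPO's actual 8-bit S-box circuit is different, and this is not the paper's code.

```cuda
#include <cstdint>

// Toy Boolean circuit standing in for a bitsliced S-box layer: every
// logical instruction operates on 32 blocks simultaneously.
__device__ void sbox_layer_bitsliced(uint32_t &x0, uint32_t &x1, uint32_t &x2) {
    uint32_t t = x0 & x1;
    x2 ^= t;
    x0 ^= x1 | x2;
    x1 ^= x0 & ~x2;
}

// Each thread owns one slice group of 3 words, i.e. 3 bit positions of
// 32 independent blocks.
__global__ void bitsliced_round(uint32_t *state, int n_groups) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_groups) {
        uint32_t x0 = state[3 * i], x1 = state[3 * i + 1], x2 = state[3 * i + 2];
        sbox_layer_bitsliced(x0, x1, x2);
        state[3 * i] = x0;
        state[3 * i + 1] = x1;
        state[3 * i + 2] = x2;
    }
}
```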

Implementation of a 3D Graphics Simulator for GP-GPU (GP-GPU 개발을 위한 3차원 그래픽 시뮬레이터 구현)

  • Yeo, Dong-young;Kim, Woo-young;Jung, Hyung-Ki;Lee, Kwang-Yeob
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.10a / pp.337-340 / 2009
  • As a hardware accelerator for 3D graphics processing, the GPU(Graphics Processing Unit) has been improving constantly in performance. It handles complex graphics applications efficiently, but its resources are rarely utilized 100%. The GP-GPU(general-purpose GPU), which supports general-purpose computation on the GPU alongside graphics operations, is attracting attention because its resources can be distributed and controlled effectively. In this paper, we implemented a simulator that provides a virtual GP-GPU environment and supports program design and debugging. This enables a co-design development environment in which fast design and reliable verification proceed simultaneously, making it possible to build a 3D graphics display interface.

GPU for Multi-Layer Perceptron (다층 신경망 구현에서의 GPU 사용)

  • Jung, Kee-Chul;Oh, Kyoung-Su
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.736-738 / 2004
  • Much effort has gone into running the test phase of neural networks in real time. This paper implements a faster neural network using commodity graphics hardware and validates its usefulness by applying the system to image processing. The GPU is more effective than the CPU at parallel computation. To exploit the GPU's parallelism efficiently, many neural-network input vectors and weight vectors are gathered so that numerous inner products are replaced by a single matrix multiplication, and the sigmoid and bias-addition operations are also parallelized with pixel shaders. The neural network system implemented on an ATI RADEON 9800 XT board achieved about a 30-fold speedup over the existing CPU-based system with no loss of accuracy.
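
In modern CUDA terms (the paper itself used pixel shaders on the RADEON 9800 XT), the batched activation step could look like the kernel below: a fused bias-add and sigmoid applied to the output of the matrix-matrix product. This is a hedged re-expression of the idea, not the original shader code.

```cuda
// Fused bias-add + sigmoid over a row-major (batch x out_dim) activation
// matrix Y, applied after the GEMM that replaces per-sample dot products.
__global__ void bias_sigmoid(float *y, const float *bias, int batch, int out_dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < batch * out_dim) {
        float v = y[i] + bias[i % out_dim];  // bias of this output neuron
        y[i] = 1.0f / (1.0f + expf(-v));     // sigmoid activation
    }
}
```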

Parallel Implementation of SPECK, SIMON and SIMECK by Using NVIDIA CUDA PTX (NVIDIA CUDA PTX를 활용한 SPECK, SIMON, SIMECK 병렬 구현)

  • Jang, Kyung-bae;Kim, Hyun-jun;Lim, Se-jin;Seo, Hwa-jeong
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.3 / pp.423-431 / 2021
  • SPECK and SIMON are lightweight block ciphers developed by the NSA(National Security Agency), and SIMECK is a newer lightweight block cipher that combines the advantages of SPECK and SIMON. In this paper, large-scale encryption with SPECK, SIMON, and SIMECK is implemented on a GPU with efficient parallel processing. The CUDA library provided by NVIDIA was used, and performance was maximized by using the CUDA assembly language PTX to eliminate unnecessary operations. Compared with a simple CPU implementation, the GPU implementation performs large-scale encryption at a much higher speed. In addition, comparing the C-language and PTX versions of the GPU implementation confirmed that PTX further increases performance.
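
The PTX optimization named here typically means replacing a two-shift rotate compiled from C with a single funnel-shift instruction. The sketch below shows this for a SPECK-64/128-style round, with rotation constants from the public SPECK specification and sm_35 or later assumed; it is an illustration, not the authors' code.

```cuda
#include <cstdint>

// Rotate right via the PTX funnel-shift instruction (requires sm_35+):
// with both inputs equal, shf.r.wrap performs a 32-bit rotation in one op.
__device__ __forceinline__ uint32_t ror32(uint32_t x, uint32_t n) {
    uint32_t r;
    asm("shf.r.wrap.b32 %0, %1, %1, %2;" : "=r"(r) : "r"(x), "r"(n));
    return r;
}

__device__ __forceinline__ uint32_t rol32(uint32_t x, uint32_t n) {
    return ror32(x, 32u - n);
}

// One SPECK-64/128 round (rotation constants 8 and 3 per the public spec).
__device__ void speck64_round(uint32_t &x, uint32_t &y, uint32_t k) {
    x = (ror32(x, 8) + y) ^ k;
    y = rol32(y, 3) ^ x;
}
```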

Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OpenMP를 이용한 빠르고 효율적인 신경망 구현)

  • Park, An-Jin;Jang, Hong-Hoon;Jung, Kee-Chul
    • Journal of KIISE:Software and Applications / v.36 no.4 / pp.253-260 / 2009
  • Many algorithms for computer vision and pattern recognition have recently been implemented on the GPU (graphics processing unit) for faster computation. However, such implementations have two problems. First, the programmer must master the graphics shading languages, which require prior knowledge of computer graphics. Second, in jobs that need close cooperation between CPU and GPU, which is common in image processing and pattern recognition unlike in pure graphics, the CPU must generate raw feature data for GPU processing fast enough to utilize the GPU's performance effectively. This paper proposes a faster and more efficient implementation of neural networks on both the GPU and a multi-core CPU. To solve the first problem, we use CUDA (compute unified device architecture), which is easy to program thanks to its simple C-like style, instead of a shading language. Moreover, OpenMP (Open Multi-Processing) is used to process multiple data concurrently on the multi-core CPU, which keeps the GPU effectively utilized. In the experiments, we implemented a neural-network-based text extraction system using the proposed architecture, and it ran about 15 times faster than the implementation on the GPU alone without OpenMP.
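
A minimal sketch of the CPU/GPU division of labor described above, with assumed helper names rather than the paper's actual code: OpenMP parallelizes feature extraction across CPU cores, and each prepared batch is then handed to a CUDA kernel.

```cuda
#include <omp.h>
#include <cuda_runtime.h>

// Hypothetical helpers standing in for the paper's pipeline stages.
extern float extract_feature(const unsigned char *frames, long idx);    // CPU side
__global__ void mlp_forward(const float *in, float *out, int n);        // GPU side

void pipeline(const unsigned char *frames, int n_batches, int batch_elems) {
    float *h_buf, *d_in, *d_out;
    cudaMallocHost(&h_buf, batch_elems * sizeof(float));   // pinned host buffer
    cudaMalloc(&d_in, batch_elems * sizeof(float));
    cudaMalloc(&d_out, batch_elems * sizeof(float));

    for (int b = 0; b < n_batches; b++) {
        // Multi-core CPU: extract this batch's raw features in parallel.
        #pragma omp parallel for
        for (int i = 0; i < batch_elems; i++)
            h_buf[i] = extract_feature(frames, (long)b * batch_elems + i);

        // GPU: run the neural network over the prepared batch.
        cudaMemcpy(d_in, h_buf, batch_elems * sizeof(float),
                   cudaMemcpyHostToDevice);
        mlp_forward<<<(batch_elems + 255) / 256, 256>>>(d_in, d_out, batch_elems);
    }
    cudaDeviceSynchronize();
}
```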

Implementation of Neural Networks using GPU (GPU를 이용한 신경망 구현)

  • Oh, Kyoung-Su;Jung, Kee-Chul
    • The KIPS Transactions:PartB / v.11B no.6 / pp.735-742 / 2004
  • We present a new use of common graphics hardware to perform faster artificial neural network computations, and we examine how the GPU improves the time performance of an image processing system using neural networks. In the case of parallel computation over multiple input sets, the vector-matrix products become matrix-matrix multiplications; as a result, we can fully utilize the parallelism of the GPU. The sigmoid operation and bias-term addition are also implemented using pixel shaders on the GPU. Our preliminary results show a performance improvement of about thirty times on an ATI RADEON 9800 XT board.
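
The batching trick can be restated with cuBLAS as a modern stand-in for the pixel-shader implementation: stacking N input vectors as the columns of X turns N vector-matrix products into a single GEMM. The function and parameter names below are illustrative assumptions, not the paper's code.

```cuda
#include <cublas_v2.h>

// Y (out_dim x batch) = W (out_dim x in_dim) * X (in_dim x batch),
// column-major as cuBLAS expects; one GEMM replaces `batch` GEMVs.
void forward_layer(cublasHandle_t h, const float *d_W, const float *d_X,
                   float *d_Y, int out_dim, int in_dim, int batch) {
    const float one = 1.0f, zero = 0.0f;
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
                out_dim, batch, in_dim,
                &one, d_W, out_dim,
                d_X, in_dim,
                &zero, d_Y, out_dim);
}
```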

High-Speed Implementations of Block Ciphers on Graphics Processing Units Using CUDA Library (GPU용 연산 라이브러리 CUDA를 이용한 블록암호 고속 구현)

  • Yeom, Yong-Jin;Cho, Yong-Kuk
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.3 / pp.23-32 / 2008
  • The computing power of graphics processing units(GPU) has already surpassed that of the CPU, and the gap between them keeps widening. Thus, research on GPGPU, which applies the GPU to general-purpose computing, has become popular and shows great success, especially in parallel data processing. Since the implementation of cryptographic algorithms on GPUs was initiated by Cook et al. in 2005, improved results using graphics libraries such as OpenGL and DirectX have been published. In this paper, we present techniques and results for implementing block ciphers using the CUDA library announced by NVIDIA in 2007. We also discuss a general method for converting CPU source code of block ciphers to GPU code. On an NVIDIA 8800GTX GPU, the resulting speeds of the block ciphers AES, ARIA, and DES are 4.5Gbps, 7.0Gbps, and 2.8Gbps, respectively, all faster than on a CPU.
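
The general CPU-to-GPU conversion the paper discusses usually reduces, for parallel modes such as ECB or CTR, to the pattern sketched below: one thread per independent 16-byte block, with round keys in constant memory. Here aes_encrypt_block is a hypothetical stand-in for the actual round function, not the paper's code.

```cuda
#include <cstdint>

__constant__ uint32_t d_round_keys[44];  // AES-128 expanded key (11 x 4 words)

// Hypothetical device function standing in for the actual AES rounds.
__device__ void aes_encrypt_block(uint32_t s[4], const uint32_t *rk);

// ECB-style parallelism: each thread encrypts one independent 16-byte block.
__global__ void aes128_ecb(const uint32_t *in, uint32_t *out, int n_blocks) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= n_blocks) return;
    uint32_t s[4] = { in[4*b], in[4*b+1], in[4*b+2], in[4*b+3] };
    aes_encrypt_block(s, d_round_keys);
    out[4*b] = s[0]; out[4*b+1] = s[1];
    out[4*b+2] = s[2]; out[4*b+3] = s[3];
}
```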

Implementation of Viterbi Decoder on Massively Parallel GPU for DVB-T Receiver (DVB-T 수신기를 위한 대규모 병렬처리 GPU 기반의 비터비 복호기 구현)

  • Lee, KyuHyung;Lee, Ho-Kyoung;Heo, Seo Weon
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.9 / pp.3-11 / 2013
  • Recently, many studies have applied the massively parallel processing of GPUs to the implementation of communication systems. In this paper, we reduce software simulation time by applying a GPU with the sliding-block method to the Viterbi decoder of the DVB-T system, one of the European DTV standards. First, we implement the DVB-T system on a CPU and measure the time it takes to process one OFDM symbol. Second, we implement the Viterbi decoder in software on NVIDIA's massively parallel GPU. In our work, stream processing is applied to reduce the overhead of data transfer between CPU and GPU, coalescing is used to lower the global memory access time, and the data structures are designed to maximize shared memory usage. Consequently, our proposed method is approximately 11 times faster in 2K mode and 60 times faster in 8K mode for the Viterbi decoder processing.
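
A minimal sketch of the stream technique mentioned in the abstract, with assumed names and structure rather than the authors' code: splitting the symbol buffer across two CUDA streams lets host-device copies overlap with decoding kernels (pinned host buffers are assumed so the copies are truly asynchronous).

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel decoding one chunk of soft bits (stand-in name).
__global__ void viterbi_chunk(const float *llr, unsigned char *bits, int n);

void decode(const float *h_llr, unsigned char *h_bits, int total, int chunk) {
    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);
    float *d_llr;
    unsigned char *d_bits;
    cudaMalloc(&d_llr, total * sizeof(float));
    cudaMalloc(&d_bits, total);

    // Alternate chunks between two streams so the copy engine and the SMs
    // work concurrently; h_llr and h_bits must be pinned (cudaMallocHost).
    for (int off = 0, i = 0; off < total; off += chunk, i ^= 1) {
        int n = (total - off < chunk) ? total - off : chunk;
        cudaMemcpyAsync(d_llr + off, h_llr + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, s[i]);
        viterbi_chunk<<<(n + 255) / 256, 256, 0, s[i]>>>(d_llr + off,
                                                         d_bits + off, n);
        cudaMemcpyAsync(h_bits + off, d_bits + off, n,
                        cudaMemcpyDeviceToHost, s[i]);
    }
    cudaDeviceSynchronize();
}
```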