• Title/Summary/Keyword: Graphics processing unit (GPU)

A Study on Efficiency of Cryptography Using GPU (GPU를 이용한 암호화 효율성 연구)

  • Byeon, Jin-Yeong; Lee, Ki-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2011.10a / pp.683-686 / 2011
  • Since computer communication networks were first built over radio frequencies in the 1970s, networking has advanced remarkably, and the Internet is now available not only on personal computers but also on mobile devices and tablet PCs. As the Internet is accessed through such diverse devices, security has become increasingly important. Recent hacking incidents such as those at Hyundai Capital, Nonghyup, and Nate, however, show that the damage was amplified by the use of plaintext data. Using plaintext increases the security threat, and a major reason plaintext is still used is the QoS degradation that encryption incurs. To address this, the QoS loss of encryption could be reduced by using the GPU, a surplus resource in fixed infrastructure. Furthermore, by exploiting many-core parallel processing, the GPU can encrypt relatively more efficiently than the CPU. In this paper, we compare encryption throughput on a CPU with that on a GPU to examine the feasibility of GPU-based encryption (a sketch of such a measurement harness follows below).

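At its core, the comparison this paper performs is a throughput measurement of the same workload on both processors. Below is a minimal sketch of such a harness, assuming NumPy and CuPy are available; the elementwise transform is a placeholder rather than a real cipher, and event-based timing is used on the GPU side because CuPy calls return asynchronously.

```python
import time
import numpy as np
import cupy as cp

data_cpu = np.random.randint(0, 256, size=1 << 26, dtype=np.uint8)  # 64 MB of "plaintext"
data_gpu = cp.asarray(data_cpu)
_ = (data_gpu ^ 0x5A) + 1                    # warm-up: triggers CuPy kernel compilation

t0 = time.perf_counter()
_ = (data_cpu ^ 0x5A) + 1                    # stand-in for a CPU encryption pass
cpu_s = time.perf_counter() - t0

start, end = cp.cuda.Event(), cp.cuda.Event()
start.record()
_ = (data_gpu ^ 0x5A) + 1                    # the same stand-in on the GPU
end.record()
end.synchronize()
gpu_s = cp.cuda.get_elapsed_time(start, end) / 1000.0  # get_elapsed_time returns ms

print(f"CPU {cpu_s:.4f} s vs GPU {gpu_s:.4f} s")
```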

Analyzing delay of Kernel function owing to GPU memory input from multiple VMs in RPC-based GPU virtualization environments (RPC 기반 GPU 가상화 환경에서 다중 가상머신의 GPU 메모리 입력으로 인한 커널 함수의 지연 문제 분석)

  • Kang, Jihun; Kim, Soo Kyun
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.541-542 / 2021
  • In cloud computing environments, users are provided with virtual machines to which a GPU (Graphics Processing Unit) is allocated so that they can run high-performance applications. In a conventional computing environment a single user monopolizes the GPU, so resource contention causes relatively few problems; in a cloud environment where multiple independent users share computing resources, however, contention makes the users degrade each other's performance. This paper analyzes the kernel-function execution delay caused by contention over GPU memory input when multiple virtual machines run GPGPU (General Purpose computing on Graphics Processing Units) workloads in an RPC (Remote Procedure Call)-based GPU virtualization environment in which several virtual machines share a single GPU (the copy-versus-kernel timing involved is sketched below).

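The delay analyzed here arises in the "GPU memory input" stage, i.e. the host-to-device copy that precedes each kernel launch. A single-machine sketch of separating the two costs with CUDA events, assuming CuPy (in the paper's RPC-virtualized setting, copies issued by multiple VMs contend for this same transfer path and push kernel execution back):

```python
import numpy as np
import cupy as cp

host_input = np.random.random((4096, 4096)).astype(np.float32)

def timed_ms(fn):
    """Run fn on the current stream and return (result, elapsed milliseconds)."""
    start, end = cp.cuda.Event(), cp.cuda.Event()
    start.record()
    out = fn()
    end.record()
    end.synchronize()
    return out, cp.cuda.get_elapsed_time(start, end)

dev_input, h2d_ms = timed_ms(lambda: cp.asarray(host_input))  # GPU memory input
_, kernel_ms = timed_ms(lambda: dev_input @ dev_input)        # the kernel itself
print(f"H2D copy {h2d_ms:.2f} ms, kernel {kernel_ms:.2f} ms")
```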

An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units

  • Lee, Sang-Pil; Kim, Deok-Ho; Yi, Jae-Young; Ro, Won-Woo
    • Journal of Information Processing Systems / v.8 no.1 / pp.159-174 / 2012
  • This paper presents a high-performance design for a block cipher algorithm implemented on modern many-core graphics processing units (GPUs). Advances in VLSI technology make it feasible to fabricate multiple processing cores on a single chip, enabling general-purpose computation on a GPU (GPGPU). The GPU strategy offers significant performance improvements for general-purpose computation and can support a broad variety of applications, including cryptography. We propose an efficient implementation of the encryption/decryption operations of the SEED block cipher on off-the-shelf NVIDIA many-core graphics processors. In a thorough experiment, we achieved performance capable of sustaining a network speed of up to 9.5 Gbps on an NVIDIA GTX285 system (which has 240 processing cores). Our implementation delivers up to 4.75 times the encoding and decoding throughput of an Intel 8-core system (a structural sketch follows below).
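
A natural GPU mapping for a block cipher assigns one cipher block to one thread; the sketch below shows that structure using Numba's CUDA target. It is a shape-only stand-in: the round function and key schedule are dummies, not the real SEED cipher.

```python
import numpy as np
from numba import cuda

ROUNDS = 16  # SEED runs 16 Feistel rounds

@cuda.jit
def encrypt_blocks(blocks, round_keys, out):
    # One thread per 128-bit block: four 32-bit words, split into two
    # 64-bit halves (l0, l1) and (r0, r1) for the Feistel network.
    i = cuda.grid(1)
    if i < blocks.shape[0]:
        l0, l1 = blocks[i, 0], blocks[i, 1]
        r0, r1 = blocks[i, 2], blocks[i, 3]
        for r in range(ROUNDS):
            # Dummy F-function; real SEED uses key-dependent G-function
            # lookups and modular additions here.
            f0 = ((r0 ^ round_keys[r, 0]) * 0x9E3779B1) & 0xFFFFFFFF
            f1 = ((r1 ^ round_keys[r, 1]) * 0x85EBCA77) & 0xFFFFFFFF
            t0, t1 = l0 ^ f0, l1 ^ f1
            l0, l1 = r0, r1
            r0, r1 = t0, t1
        out[i, 0], out[i, 1] = l0, l1
        out[i, 2], out[i, 3] = r0, r1

blocks = np.random.randint(0, 2**32, size=(1 << 20, 4), dtype=np.uint32)
keys = np.random.randint(0, 2**32, size=(ROUNDS, 2), dtype=np.uint32)
out = np.empty_like(blocks)
threads = 256
encrypt_blocks[(blocks.shape[0] + threads - 1) // threads, threads](blocks, keys, out)
```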

High Speed SD-OCT System Using GPU Accelerated Mode for in vivo Human Eye Imaging

  • Cho, Nam Hyun; Jung, Unsang; Kim, Suhwan; Jung, Woonggyu; Oh, Junghwan; Kang, Hyun Wook; Kim, Jeehyun
    • Journal of the Optical Society of Korea / v.17 no.1 / pp.68-72 / 2013
  • We developed an SD-OCT (Spectral Domain-Optical Coherence Tomography) system which uses a GPU (Graphics Processing Unit) for processing. The image size from the SD-OCT system is 1024×512 and the speed is 110 frames/sec in real time. K-domain linearization, FFT (Fast Fourier Transform), and log scaling were included in the GPU processing. The signal processing took about 62 ms on a CPU (Central Processing Unit) and 1.6 ms on a GPU, roughly 39 times faster. We performed an in-vivo retinal scan and reconstructed a 3D visualization from the C-scan images. The result showed minimal motion artifacts, and we confirmed that tomograms of blood vessels, the optic nerve, and the optic disc are clearly identified. According to these results, this SD-OCT can be applied to real-time 3D display technology, particularly as an auxiliary instrument for eye surgery in ophthalmology (the three GPU stages are sketched below).
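
The three GPU stages named in the abstract map directly onto array operations. A minimal sketch, assuming CuPy; the wavelength-to-wavenumber mapping is invented for illustration, and only the 1024×512 frame size is taken from the paper.

```python
import cupy as cp

depth, alines = 1024, 512
raw = cp.random.random((alines, depth)).astype(cp.float32)  # one spectrum per A-line

# 1) k-domain linearization: resample each spectrum, acquired evenly in
#    wavelength, onto a uniform wavenumber (k) grid.
lam = cp.linspace(800e-9, 880e-9, depth)           # hypothetical spectrometer axis
k = 2 * cp.pi / lam                                # non-uniform, decreasing
k_uniform = cp.linspace(float(k.min()), float(k.max()), depth)
order = cp.argsort(k)
linearized = cp.stack(
    [cp.interp(k_uniform, k[order], raw[i][order]) for i in range(alines)])

# 2) FFT along depth, then 3) log scaling for display.
bscan = 20 * cp.log10(cp.abs(cp.fft.fft(linearized, axis=1)) + 1e-12)
```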

Analyzing the performance of training tasks based on GPU memory use manner of TensorFlow in Container environments (컨테이너 환경에서 텐서플로의 GPU 메모리 사용방식에 따른 학습 작업의 성능 분석)

  • Jihun Kang; Joon-Min Gil
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.60-62 / 2023
  • Artificial-intelligence training workloads are computationally heavy and therefore require a high-performance GPU (Graphics Processing Unit), whose performance directly affects training time. TensorFlow, widely used for AI workloads, by default manages GPU memory so that a single training job occupies nearly the entire GPU memory when running computations on the GPU. This policy prevents fragmentation of GPU memory, the least scalable computing resource, but once one training job occupies the GPU, no other process can use the device regardless of how much GPU memory is actually in use. In particular, for relatively small workloads such as transfer learning or small-scale training, most of the GPU memory is wasted. This paper confirms that TensorFlow's default GPU memory policy makes it impossible to run multiple training jobs concurrently in a container environment, and compares actual GPU memory usage and training time with and without a GPU memory limit to verify whether preventing GPU memory fragmentation has a meaningful effect on performance (the relevant TensorFlow configuration is sketched below).
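
For reference, TensorFlow 2 exposes both memory policies the paper compares through `tf.config` calls; the sketch below shows them (the 2048 MB cap is an arbitrary example value).

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Option 1: allocate GPU memory on demand instead of reserving
    # nearly all of it up front (TensorFlow's default behaviour).
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # Option 2 (use instead of option 1): give this process a hard cap,
    # e.g. 2048 MB, so several training jobs can share one device.
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])
```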

Synthesis of Ocean Wave Models and Simulation Using GPU (바다물결 모형의 합성 및 GPU를 이용한 시뮬레이션)

  • Lee, Dong-Min; Lee, Sung-Kee
    • The KIPS Transactions: Part A / v.14A no.7 / pp.421-434 / 2007
  • Among the many natural scenes generated by computer graphics, the representation of ocean surfaces is one of the most complicated and time-consuming problems because of their large extent and complex surface movement. We present a hybrid method to represent and animate unbounded deep-water ocean surfaces by utilizing the graphics processor as both simulation and rendering core. Our technique is mainly based on spectral approaches that generate a highly detailed height field using a Fourier transform on a 2D regular grid. Additionally, we incorporate the Gerstner model and generate a low-detail height field on a 2D projected grid to represent large waves and the main structure of the ocean surface. There is no round trip between CPU and GPU and no need to transfer simulation results from system memory to graphics hardware, because the entire simulation and rendering process runs on the graphics processor. As a result we can synthesize and render realistic water surfaces in real time. The proposed techniques are readily adoptable in real-time applications such as computer games, which place a heavy workload on the CPU but still demand plausible natural scenes (the spectral step is sketched below).
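
The spectral step is independent of where it runs; the paper executes it on the GPU, but the math is easiest to see on the CPU. A NumPy sketch of synthesizing one height-field frame from a Phillips spectrum, with illustrative wind and grid constants:

```python
import numpy as np

N, L = 256, 1000.0                  # grid resolution and patch size (m)
g, wind = 9.81, np.array([31.0, 0.0])
A = 4e-4                            # spectrum scale (tuning constant)

k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)   # wavenumbers of the FFT grid
kx, ky = np.meshgrid(k, k)
kk = np.hypot(kx, ky)
kk[0, 0] = 1e-6                     # avoid division by zero at the DC term

Lw = np.dot(wind, wind) / g         # largest wave arising from this wind speed
cos_f = (kx * wind[0] + ky * wind[1]) / (kk * np.linalg.norm(wind))
phillips = A * np.exp(-1.0 / (kk * Lw) ** 2) / kk ** 4 * cos_f ** 2

rng = np.random.default_rng(0)
h0 = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) * np.sqrt(phillips / 2)
height = np.fft.ifft2(h0).real      # one frame; animate via the phase e^{i w(k) t}
```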

Image based Relighting Using HDRI Environment Map & Progressive refinement radiosity on GPU (HDRI 환경맵과 GPU 기반 점진적 세분 래디오시티를 이용한 영상기반 재조명)

  • Kim, Jun-Hwan; Hong, Hyun-Ki
    • Journal of Korea Game Society / v.7 no.4 / pp.53-62 / 2007
  • Although radiosity can represent diffuse reflection between object surfaces by modeling the energy exchange in 3D space, its computational load restricts real-time applications. GPU (Graphics Processing Unit)-based radiosity algorithms have therefore been presented actively to improve rendering performance. We implement Coombe's progressive refinement radiosity on the GPU in a 3D scene constructed with an HDR (High Dynamic Range) radiance map. This radiosity method can generate a photo-realistic rendering of the 3D scene, in which the synthetic objects are illuminated by the environmental light sources. In the simulation results, the rendering performance is analyzed according to the texel resolution of the environment map and mipmapping. In addition, we compare the rendering results of our method with those of incremental radiosity (the underlying shooting iteration is sketched below).

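The shooting iteration that progressive refinement radiosity repeats is compact; a CPU sketch follows, assuming a precomputed form-factor matrix F (in the GPU implementation these values come from hemicube rendering instead).

```python
import numpy as np

def progressive_radiosity(emission, reflectance, F, areas, iters=100):
    """Shooting radiosity: repeatedly pick the patch with the most unshot
    power and distribute its energy to every other patch.
    F[i, j] is the form factor from patch i to patch j."""
    B = emission.copy()        # current radiosity per patch
    unshot = emission.copy()   # energy not yet distributed
    for _ in range(iters):
        i = np.argmax(unshot * areas)                   # brightest shooter
        # Receivers gain rho_j * B_i^unshot * F_ji, with F_ji recovered
        # from F_ij by the reciprocity relation F_ji = F_ij * A_i / A_j.
        dB = reflectance * unshot[i] * F[i] * areas[i] / areas
        B += dB
        unshot += dB
        unshot[i] = 0.0        # patch i has now shot all of its energy
    return B
```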

Assessment of Parallel Computing Performance of Agisoft Metashape for Orthomosaic Generation (정사모자이크 제작을 위한 Agisoft Metashape의 병렬처리 성능 평가)

  • Han, Soohee; Hong, Chang-Ki
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.6 / pp.427-434 / 2019
  • In the present study, we assessed the parallel computing performance of Agisoft Metashape for orthomosaic generation; based on SfM (Structure from Motion) technology, Metashape can perform aerial triangulation, generate a three-dimensional point cloud, and build an orthomosaic. Due to the nature of SfM, most of the time is spent on Align photos, which performs relative orientation, and Build dense cloud, which generates the three-dimensional point cloud. Metashape can parallelize the two processes using multiple CPU (Central Processing Unit) cores and the GPU (Graphics Processing Unit). An orthomosaic was created from large UAV (Unmanned Aerial Vehicle) image sets under six conditions combining three parallel modes (CPU only, GPU only, and CPU + GPU) with two operating systems (Windows and Linux). To assess the consistency of the results across conditions, the RMSE (Root Mean Square Error) of aerial triangulation was measured using ground control points detected automatically on the images without human intervention. The results of orthomosaic generation from 521 UAV images of 42.2 million pixels showed that the combination of CPU and GPU performed best on the present system, and that Linux outperformed Windows under all conditions. However, the RMSE values of aerial triangulation differed slightly, within the error range, among the combinations. Metashape therefore still leaves something to be desired in producing consistent results regardless of parallel method and operating system (the device-selection settings are sketched below).
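
Metashape also exposes the device selection varied here through its Python API; the sketch below uses `gpu_mask` (a bit mask over detected GPUs) and `cpu_enable` as documented for recent versions, and the image path is hypothetical.

```python
import Metashape

# Three conditions from the paper: CPU only (gpu_mask = 0), GPU only
# (cpu_enable = False), and CPU + GPU, the fastest case in their tests.
Metashape.app.gpu_mask = 1        # bit 0 set: use the first GPU
Metashape.app.cpu_enable = True   # also use CPU cores alongside the GPU

doc = Metashape.Document()
chunk = doc.addChunk()
chunk.addPhotos(["/data/uav/IMG_0001.JPG"])  # hypothetical image path
chunk.matchPhotos()     # "Align photos": feature matching / relative orientation
chunk.alignCameras()
chunk.buildDepthMaps()  # start of "Build dense cloud", the other hot spot
```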

Efficient GPU Isosurface Ray-casting of BCC Datasets (효율적인 BCC 볼륨 데이터의 GPU 등가면 광선투사법)

  • Kim, Minho; Kim, Hyunjun; Sarfaraz, Aaliya
    • Journal of the Korea Computer Graphics Society / v.19 no.2 / pp.19-27 / 2013
  • This paper presents a real-time GPU (Graphics Processing Unit) isosurface ray-caster that improves performance by a factor of 4-7 over our previous method while keeping its superior visual quality. The improvement is achieved by incorporating an efficient empty-space-skipping scheme and an analytic normal computation. Empty-space skipping is done by building a min/max octree computed from the BB (Bernstein-Bézier) form of the spline pieces, and the analytic normal formula provides not only nice visual quality but also improved evaluation performance (the range test is sketched below).
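
Empty-space skipping reduces to a range test: a region whose min/max excludes the isovalue cannot contain the isosurface, so no spline evaluation is needed there. A self-contained sketch of the idea on a Cartesian grid with a single min/max level (the paper builds a full octree over BCC spline pieces):

```python
import numpy as np

def build_minmax(volume, block=8):
    """Per-block min/max pyramid level over a Cartesian volume."""
    v = volume.reshape(volume.shape[0] // block, block,
                       volume.shape[1] // block, block,
                       volume.shape[2] // block, block)
    return v.min(axis=(1, 3, 5)), v.max(axis=(1, 3, 5))

def block_may_contain(pos, iso, vmin, vmax, block=8):
    """False means the ray can skip this whole block: the isosurface
    cannot pass through a region whose value range excludes iso."""
    b = tuple(int(p) // block for p in pos)
    return vmin[b] <= iso <= vmax[b]

vol = np.random.random((64, 64, 64)).astype(np.float32)
vmin, vmax = build_minmax(vol)
print(block_may_contain((10.5, 3.0, 40.2), 0.99, vmin, vmax))
```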

A Realization of CNN-based FPGA Chip for AI (Artificial Intelligence) Applications (합성곱 신경망 기반의 인공지능 FPGA 칩 구현)

  • Young Yun
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.11a / pp.388-389 / 2022
  • Recently, AI (Artificial Intelligence) has been applied to various technologies such as automated driving, robotics, and smart communication. Currently, AI systems are developed by software-based methods using TensorFlow, with a GPU (Graphic Processing Unit) employed as the processing unit. However, when a software-based method employing a GPU is used for AI applications, the internal circuitry of the processing unit cannot be changed. With this method, if more demanding jobs are required of the AI system, a higher-performance GPU is needed, so the GPU or graphics card must be replaced to perform the jobs. In this work, we developed a CNN-based FPGA (Field Programmable Gate Array) chip to solve this problem.
