• Title/Summary/Keyword: GPU process


Parallel Design and Implementation of Shot Boundary Detection Algorithm (샷 경계 탐지 알고리즘의 병렬 설계와 구현)

  • Lee, Joon-Goo;Kim, SeungHyun;You, Byoung-Moon;Hwang, DooSung
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.2 / pp.76-84 / 2014
  • As the number of high-density videos increases, parallel processing approaches are needed to handle large-scale video data. When a video processing method requires thousands of simple operations, GPU-based parallel processing is preferred to CPU-based parallel processing because it reduces the time and space complexity of the computation. This paper studies the parallel design and implementation of a shot-boundary detection algorithm. The proposed algorithm uses pixel brightness comparisons and global histogram data among the blocks of frames, and the computation of these data is highly parallel. To maximize parallel execution, the pixel brightness and histogram computations are designed in parallel and implemented on an NVIDIA GPU. The GPU-based shot detection method is tested with 10 videos from the National Archives of Korea. In the experiments, the detection rate is similar to that of the CPU-based algorithm, but the computation is about 10 times faster.
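
The histogram comparison mentioned in this abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' implementation: it uses only a global brightness histogram (no block-wise pixel comparison), and the bin count, the distance measure, and the `threshold` value are assumptions chosen for illustration. Each frame's histogram is computed independently of the others, which is the property that makes this step a natural fit for GPU threads.

```python
def histogram(frame, bins=16, max_value=256):
    """Count the pixel brightness values of one frame into equal-width bins."""
    counts = [0] * bins
    width = max_value // bins
    for row in frame:
        for pixel in row:
            counts[min(pixel // width, bins - 1)] += 1
    return counts


def histogram_distance(h1, h2):
    """Normalized L1 distance between two histograms, in the range [0, 1]."""
    total = sum(h1) + sum(h2)
    return sum(abs(a - b) for a, b in zip(h1, h2)) / total if total else 0.0


def detect_shot_boundaries(frames, threshold=0.5):
    """Return every index i at which frame i differs strongly from frame i - 1."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


# Toy video (hypothetical data): three dark 4x4 frames, then three bright ones.
dark = [[10] * 4 for _ in range(4)]
bright = [[240] * 4 for _ in range(4)]
boundaries = detect_shot_boundaries([dark, dark, dark, bright, bright, bright])
print(boundaries)  # [3]
```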

Benchmark Results of a Radio Spectrometer Based on Graphics Processing Unit

  • Kim, Jongsoo;Wagner, Jan
    • The Bulletin of The Korean Astronomical Society / v.40 no.2 / pp.44.1-44.1 / 2015
  • We set up a project to build spectrometers for single-dish observations of the Korean VLBI Network (KVN), a future multi-beam receiver of the ASTE (Atacama Submillimeter Telescope Experiment), and the total power (TP) antennas of the Atacama Large Millimeter/submillimeter Array (ALMA). Traditionally, spectrometers based on ASICs (Application-Specific Integrated Circuits) and FPGAs (Field-Programmable Gate Arrays) have been used in radio astronomy. However, Graphics Processing Unit (GPU) technology is now viable for spectrometers owing to the rapid improvement of its performance. A high-resolution spectrometer should provide the following functions: polyphase filtering, data-bit conversion, fast Fourier transform, and complex multiplication. We wrote a CUDA (Compute Unified Device Architecture) program for a GPU spectrometer and measured its performance on two NVIDIA GPU cards, the Titan X and the K40m. Non-optimized GPU code can process a data stream of around 2 GHz bandwidth, which is enough for the KVN spectrometer and promising for the ASTE and ALMA TP spectrometers.
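
The core of such a spectrometer, a Fourier transform followed by power accumulation, can be sketched on the CPU as follows. This is only an illustration under stated assumptions: a naive DFT stands in for the GPU FFT, the polyphase filter and data-bit conversion stages are omitted, and the channel count and test tone are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform (stand-in for a GPU FFT)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def power_spectrum(stream, channels):
    """Average |X[k]|^2 over consecutive blocks of `channels` samples.

    The stream must contain at least one full block.
    """
    blocks = len(stream) // channels
    acc = [0.0] * channels
    for b in range(blocks):
        block = stream[b * channels:(b + 1) * channels]
        for k, x in enumerate(dft(block)):
            acc[k] += abs(x) ** 2
    return [p / blocks for p in acc]


# Hypothetical test tone that falls exactly on channel 5 of 32 channels.
channels = 32
stream = [math.cos(2 * math.pi * 5 * t / channels) for t in range(4 * channels)]
spectrum = power_spectrum(stream, channels)
peak = max(range(channels // 2), key=lambda k: spectrum[k])
print(peak)  # 5
```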


GPU-Accelerated Password Cracking of PDF Files

  • Kim, Keon-Woo;Lee, Sang-Su;Hong, Do-Won;Ryou, Jae-Cheol
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.11 / pp.2235-2253 / 2011
  • Digital document files such as Adobe Acrobat or MS-Office files are encrypted with a user password by the format's own ciphering algorithm. When this password is not known to a user or a forensic inspector, it must be recovered to open the encrypted file. Brute-force search is a sure way to discover the password but a time-consuming process. This paper presents a new method of speeding up password recovery on a Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA). PDF files are chosen as the password cracking target, and the Adobe Acrobat password recovery algorithm is examined. Experimental results show that the proposed method gives high performance at low cost, with a cluster of GPU nodes significantly speeding up password recovery by exploiting a number of computing nodes. Password cracking performance increases linearly with the number of computing nodes and GPUs.
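
The brute-force search described here can be sketched in a few lines. This is a generic illustration, not the paper's method: the `check` function is a hypothetical verifier in which a plain SHA-256 comparison stands in for the real PDF password check, and the character set and maximum length are assumptions. Because every candidate is tested independently, the candidate space can be split evenly across GPU threads and cluster nodes, which is consistent with the linear scaling the abstract reports.

```python
import hashlib
import itertools
import string


def brute_force(check, charset, max_length):
    """Try every candidate up to max_length; return the first one that passes."""
    for length in range(1, max_length + 1):
        for chars in itertools.product(charset, repeat=length):
            candidate = "".join(chars)
            if check(candidate):
                return candidate
    return None


# Hypothetical target: a SHA-256 digest stands in for the document's
# real password verification data.
target = hashlib.sha256(b"gpu").hexdigest()


def check(candidate):
    """Hypothetical verifier; a real one would run the format's own algorithm."""
    return hashlib.sha256(candidate.encode()).hexdigest() == target


found = brute_force(check, string.ascii_lowercase, 3)
print(found)  # gpu
```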

Implementation of Massive FDTD Simulation Computing Model Based on MPI Cluster for Semi-conductor Process (반도체 검증을 위한 MPI 기반 클러스터에서의 대용량 FDTD 시뮬레이션 연산환경 구축)

  • Lee, Seung-Il;Kim, Yeon-Il;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.15 no.9 / pp.21-28 / 2015
  • In the semiconductor process, simulation is performed to detect defects by analyzing the behavior of impurities through calculation of the physical quantities of the inner elements. The Finite-Difference Time-Domain (FDTD) algorithm is used to perform this simulation. As semiconductors composed of nanoscale elements advance, the simulation size keeps growing. Problems may arise in which a single processor such as a CPU or GPU cannot perform the simulation because of the massive matrix size, or in which a computer consisting of multiple processors cannot handle a massive FDTD simulation. Parallel and distributed computing has been studied to address these problems. In the past, however, only a single type of processor was used. A GPU is fast but has limited memory, whereas a CPU is slower than a GPU. To solve this problem, we implemented a computing model that can handle an FDTD simulation of any size on a cluster consisting of heterogeneous processors. We tested the simulation using MPI libraries based on point-to-point communication and verified that it operates correctly regardless of the number and type of nodes. We also analyzed the performance by measuring the total execution time and the time for specific parts of the simulation in each test.
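
For readers unfamiliar with FDTD, the following is a minimal single-process sketch of the standard one-dimensional leapfrog update in normalized units. It does not reproduce the paper's MPI cluster model; the grid size, step count, Gaussian source, and Courant number of 0.5 are all assumptions. Each cell is updated from its immediate neighbors only, which is why a large grid can be split across nodes that exchange just their boundary cells.

```python
import math


def fdtd_1d(cells=200, steps=150, source=100, courant=0.5):
    """Advance a 1D electric/magnetic field pair with the leapfrog FDTD update."""
    ez = [0.0] * cells  # electric field at integer grid points
    hy = [0.0] * cells  # magnetic field at half-step grid points
    for step in range(steps):
        # Update E from the spatial difference of H.
        for k in range(1, cells):
            ez[k] += courant * (hy[k - 1] - hy[k])
        # Inject a Gaussian pulse at the source cell (soft source).
        ez[source] += math.exp(-0.5 * ((step - 30) / 10.0) ** 2)
        # Update H from the spatial difference of E.
        for k in range(cells - 1):
            hy[k] += courant * (ez[k] - ez[k + 1])
    return ez


ez = fdtd_1d()
# Locate the strongest field value; the pulse has moved away from the source.
peak = max(range(len(ez)), key=lambda k: abs(ez[k]))
print(peak, round(ez[peak], 3))
```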

Real-time Volume Rendering using Point-Primitive (포인트 프리미티브를 이용한 실시간 볼륨 렌더링 기법)

  • Kang, Dong-Soo;Shin, Byeong-Seok
    • Journal of Korea Multimedia Society / v.14 no.10 / pp.1229-1237 / 2011
  • Volume ray-casting is a direct volume rendering method that produces high-quality images and handles semi-transparent objects. Although it produces high-quality images by sampling in the region of interest, rendering is slow because color acquisition involves repeated memory references and accumulation of sample values. GPU-based acceleration techniques have recently been introduced, but they require preprocessing or additional memory. In this paper, we propose an efficient point-primitive-based method that avoids the complicated computation of GPU ray-casting. It renders semi-transparent objects, yet it requires neither preprocessing nor additional memory. Our method is fast because it generates point-primitives from the volume dataset during the sampling process and projects them onto the image plane. It also copes easily with changes to the opacity transfer function (OTF) because point-primitives can be added or deleted in real time.

Accelerated Implementation of NTRU on GPU for Efficient Key Exchange in Multi-Client Environment (다중 사용자 환경에서 효과적인 키 교환을 위한 GPU 기반의 NTRU 고속구현)

  • Seong, Hyoeun;Kim, Yewon;Yeom, Yongjin;Kang, Ju-Sung
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.3 / pp.481-496 / 2021
  • It is imperative to migrate current public-key cryptosystems to quantum-resistant systems ahead of the realization of large-scale quantum computing. The National Institute of Standards and Technology (NIST) is running a public standardization project for Post-Quantum Cryptography (PQC), and many research efforts have sought to apply PQC to the TLS (Transport Layer Security) protocol, which is used for Internet communication security. In this paper, we propose a scenario in which a server and multiple clients share session keys over TLS, using parallelized NTRU, a PQC scheme, in the key exchange process. In addition, we propose a method of accelerating NTRU on a GPU and analyze its efficiency in an environment where a server needs to process large-scale data simultaneously.

Dynamic Remeshing for Real-Time Representation of Thin-Shell Tearing Simulations on the GPU

  • Jong-Hyun Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.89-96 / 2023
  • In this paper, we propose a GPU-based method for real-time dynamic re-meshing of tearing cloth. Thin-shell materials are used in various fields such as physics-based simulation and animation, games, and virtual reality. Tearing fabric requires dynamically updating the geometry and connectivity, which makes the process complex and computationally intensive. This process needs to be fast, especially for interactive content. Most methods either perform re-meshing on low-resolution simulations to maintain real-time performance or rely on an already segmented pattern, which is not dynamic re-meshing and yields low-quality tear patterns. We propose a new GPU-optimized dynamic re-meshing algorithm that enables real-time processing of high-resolution fabric tears. Because it performs dynamic re-meshing rather than relying on pre-split meshes, the proposed method can be used for virtual surgical simulation and for physics-based modeling in games and virtual environments that require real-time performance.

Real-time Stereo Video Generation using Graphics Processing Unit (GPU를 이용한 실시간 양안식 영상 생성 방법)

  • Shin, In-Yong;Ho, Yo-Sung
    • Journal of Broadcast Engineering / v.16 no.4 / pp.596-601 / 2011
  • In this paper, we propose a fast depth-image-based rendering method that generates a virtual view image in real time using a graphics processing unit (GPU) for a 3D broadcasting system. Before transmission, we encode the input 2D+depth video using the H.264 coding standard. At the receiver, we decode the received bitstream and generate a stereo video on a GPU, which can compute in parallel. We apply a simple and efficient hole filling method to reduce both the decoder complexity and hole filling errors. In addition, we design a vertical parallel structure for the forward mapping process to take advantage of the single-instruction, multiple-thread structure of the GPU. We also use high-speed GPU memories to boost the computation speed. As a result, we can generate virtual view images 15 times faster than with CPU-based processing.
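
Forward mapping with hole filling can be illustrated on a single scanline. The sketch below is a simplified CPU version under stated assumptions, not the authors' algorithm: the depth-to-disparity rule, the `max_disparity` parameter, and the fill rule (copy the nearest valid pixel from the left) are all hypothetical. Scanlines do not depend on one another, so each could be assigned to its own GPU thread.

```python
def render_virtual_row(color, depth, max_disparity=4, max_depth=255):
    """Forward-map one scanline to a horizontally shifted virtual view."""
    width = len(color)
    out = [None] * width
    out_depth = [-1] * width
    for x in range(width):
        # Assumption: a larger depth value means nearer, hence a larger shift.
        shift = depth[x] * max_disparity // max_depth
        target = x - shift
        # Keep the nearest pixel when several map to the same target.
        if 0 <= target < width and depth[x] > out_depth[target]:
            out[target] = color[x]
            out_depth[target] = depth[x]
    # Hypothetical hole filling: propagate the last valid pixel from the left,
    # then sweep right-to-left for holes at the start of the row.
    for x in range(1, width):
        if out[x] is None:
            out[x] = out[x - 1]
    for x in range(width - 2, -1, -1):
        if out[x] is None:
            out[x] = out[x + 1]
    return out


# Toy scanline: a near object (depth 255) occupies pixels 3 and 4.
color = [1, 2, 3, 4, 5, 6, 7, 8]
depth = [0, 0, 0, 255, 255, 0, 0, 0]
row = render_virtual_row(color, depth, max_disparity=2)
print(row)  # [1, 4, 5, 5, 5, 6, 7, 8]
```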

Massive Terrain Rendering Method Using RGBA Channel Indexing of Wavelet Coefficients (웨이블릿 압축 계수의 RGBA채널 인덱싱을 이용한 대용량 지형 렌더링 기법)

  • Kim, Tae-Gwon;Lee, Eun-Seok;Shin, Byeong-Seok
    • Journal of Korea Game Society / v.13 no.5 / pp.55-62 / 2013
  • Since large terrain data cannot be loaded into GPU or CPU memory at once, out-of-core methods that read the necessary parts from secondary storage such as a hard disk are commonly used. However, long delays may occur because of the limited bandwidth when loading data from the hard disk into memory. We propose an efficient rendering method for large terrain data that compresses the data with a wavelet technique, stores the coefficients in the RGBA channels of an image, and then decompresses them in the rendering stage. The entire process is performed on the GPU using DirectCompute. By reducing the amount of data transferred, performing wavelet computations in parallel, and decompressing quickly on the GPU, our method reduces rendering time effectively.
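
The idea of storing wavelet coefficients in image channels can be sketched as follows. This is an illustration only: the abstract does not specify the wavelet or the indexing scheme, so the one-level integer Haar transform, the four-coefficients-per-texel packing, and the sample heights are assumptions. The input length must be even.

```python
def haar_forward(values):
    """One level of the integer Haar transform: averages, then details."""
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) >> 1 for a, b in pairs]
    details = [a - b for a, b in pairs]
    return averages + details


def haar_inverse(coeffs):
    """Exactly invert haar_forward."""
    half = len(coeffs) // 2
    values = []
    for avg, det in zip(coeffs[:half], coeffs[half:]):
        first = avg + ((det + 1) >> 1)
        values.extend([first, first - det])
    return values


def pack_rgba(coeffs):
    """Group coefficients four at a time, one per R, G, B, A channel."""
    padded = coeffs + [0] * (-len(coeffs) % 4)
    return [tuple(padded[i:i + 4]) for i in range(0, len(padded), 4)]


def unpack_rgba(texels, count):
    """Flatten texels back into the first `count` coefficients."""
    return [c for texel in texels for c in texel][:count]


# Hypothetical row of terrain heights.
heights = [100, 102, 101, 105, 110, 110, 90, 95]
texels = pack_rgba(haar_forward(heights))
restored = haar_inverse(unpack_rgba(texels, len(heights)))
print(texels)
print(restored == heights)  # True
```

In a lossy scheme, small detail coefficients would be quantized or dropped before packing; that step is left out here so the round trip stays exact.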

VOCs Permeation Property of Composite Hollow Fiber Membranes (중공사 복합막을 이용한 다성분계 휘발성 유기 화합물 투과 특성)

  • Choi, Whee Moon;Cho, Soon Haing;Kim, Soon Tae;Lee, Chung Seop;Nam, Sang Yong
    • Membrane Journal / v.23 no.2 / pp.176-184 / 2013
  • To investigate VOC separation performance, a composite hollow fiber membrane was prepared, composed of a poly(ether imide) (PEI) support made by the phase separation method and a poly(dimethylsiloxane) (PDMS) active coating layer. The membranes were evaluated for application to the recovery process in terms of their morphology and gas permeance tests for $N_2$ and $O_2$. Durability against benzene, toluene, and xylene was also investigated. Permeation tests for multi-component VOCs through the membrane were also conducted at different feed concentrations and stage-cuts. After PDMS coating, the permeance of the PEI support membrane decreased from 45,000 GPU (gas permeation units) to 63 GPU and from 49,450 GPU to 30 GPU for $N_2$ and $O_2$, respectively. Recovery efficiency and the concentration of VOCs in the permeate increased with decreasing stage-cut. The VOC concentration in the permeate increased in proportion to the feed concentration, but the concentration ratio and recovery efficiency showed no noticeable change with feed concentration.