• Title/Summary/Keyword: CUDA(CUDA)

Search Result 295

Implementation of 3D Object Reconstruction using a Pair of Kinect Cameras (2대의 Kinect 카메라를 이용한 3차원 물체의 복원 구현)

  • Shin, Dong-Won;Ho, Yo-Sung
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2014.06a / pp.135-138 / 2014
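  • In this paper, we propose a method for reconstructing real-world 3D objects using two Kinect cameras. First, the raw depth image obtained from the Kinect is refined with a hierarchical joint bilateral filter that includes an added depth weight. Next, the intrinsic and extrinsic camera parameters are obtained through camera calibration. Using these parameters, 3D warping is performed to reconstruct the data from each viewpoint as a point cloud model in 3D space, and a surface modeling method is applied to generate a smooth surface model of the 3D object. To approach real-time speed, the hierarchical joint bilateral filter and the 3D warping were implemented and accelerated with CUDA, a parallel processing framework. Experiments confirmed that the depth information from the separate viewpoints could be reconstructed in a single unified 3D space and that the system runs at 5 fps.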
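
The 3D warping step mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the standard pinhole camera model and assumes that the rotation and translation map camera coordinates into the shared 3D space; the paper's actual formulation and CUDA kernel are not given in the abstract, and every function name and parameter here is hypothetical.

```python
# Sketch of per-pixel 3D warping under a pinhole camera model (an assumption).
# All names are hypothetical; this is not the paper's implementation.

def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with a depth value to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_shared_space(point, rotation, translation):
    """Apply extrinsics: result = R * point + t (R is a 3x3 list of rows)."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def depth_to_point_cloud(depth_image, fx, fy, cx, cy, rotation, translation):
    """Warp every valid depth pixel into the shared space.

    Each pixel is processed independently of the others, which is why this
    step maps naturally onto one GPU thread per pixel.
    """
    cloud = []
    for v, row in enumerate(depth_image):
        for u, d in enumerate(row):
            if d > 0:  # zero is treated as a missing depth measurement
                camera_point = backproject(u, v, d, fx, fy, cx, cy)
                cloud.append(to_shared_space(camera_point, rotation, translation))
    return cloud
```

Running `depth_to_point_cloud` once per camera with that camera's calibration would place both point clouds in the same coordinate frame, where they can be merged.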

Efficient Implementing of DNA Computing-inspired Pattern Classifier Using GPU (GPU를 이용한 DNA 컴퓨팅 기반 패턴 분류기의 효율적 구현)

  • Choi, Sun-Wook;Lee, Chong-Ho
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.7 / pp.1424-1434 / 2009
  • DNA computing-inspired pattern classification based on the hypernetwork model is a novel approach to pattern classification problems. The hypernetwork model has been shown to be a powerful tool for multi-class data analysis. However, the ordinary hypernetwork model has limitations, such as operating only sequentially. In this paper, we propose an efficient method for implementing a DNA computing-inspired pattern classifier on a GPU. For performance evaluation, we show simulation results of multi-class pattern classification on hand-written digit data, DNA microarray data, and 8-category scene data. We also compare the operation time of the proposed classifier in CPU and GPU operating environments. Experimental results show diagnosis results that are competitive with other conventional machine learning algorithms. We confirm that the proposed DNA computing-inspired pattern classifier, designed on a GPU using the CUDA platform, is suitable for multi-class data classification, and that its operating speed is fast enough for point-of-care diagnostic purposes, real-time scene categorization, and hand-written digit classification.

GPU-Accelerated Password Cracking of PDF Files

  • Kim, Keon-Woo;Lee, Sang-Su;Hong, Do-Won;Ryou, Jae-Cheol
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.11 / pp.2235-2253 / 2011
  • Digital document files such as Adobe Acrobat or MS-Office files are encrypted by their own ciphering algorithms with a user password. When this password is not known to a user or a forensic inspector, it is necessary to recover the password to open the encrypted file. Password cracking by brute-force search is a sure way to discover the password but is a time-consuming process. This paper presents a new method of speeding up password recovery on a Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA). PDF files are chosen as the password cracking target, and the Adobe Acrobat password recovery algorithm is examined. Experimental results show that the proposed method gives high performance at low cost, with a cluster of GPU nodes significantly speeding up password recovery by exploiting a number of computing nodes. Password cracking performance increases linearly in proportion to the number of computing nodes and GPUs.
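
The near-linear scaling reported in the abstract above follows from how easily a brute-force search space can be split. The Python sketch below illustrates one common way to do this, assuming a fixed-length password over a known alphabet: each linear index decodes to one candidate, and each worker takes a contiguous slice of the index space. The function names are hypothetical, and the SHA-256 comparison is only a stand-in; it is not the PDF password verification routine examined in the paper.

```python
import hashlib


def index_to_candidate(index, alphabet, length):
    """Decode a linear index into a fixed-length candidate (base-N digits)."""
    base = len(alphabet)
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


def worker_range(total, workers, worker_id):
    """Return the contiguous [start, stop) slice assigned to one worker."""
    chunk = (total + workers - 1) // workers
    start = min(worker_id * chunk, total)
    return start, min(start + chunk, total)


def search(target_digest, alphabet, length, workers):
    """Scan each worker's slice in turn.

    Here the slices run one after another; on a GPU cluster they would run
    concurrently, since no slice depends on any other.
    """
    total = len(alphabet) ** length
    for worker_id in range(workers):
        start, stop = worker_range(total, workers, worker_id)
        for index in range(start, stop):
            candidate = index_to_candidate(index, alphabet, length)
            # Stand-in check only: SHA-256 is NOT the PDF verification routine.
            if hashlib.sha256(candidate.encode()).hexdigest() == target_digest:
                return candidate
    return None
```

Because the slices share no state, doubling the number of workers halves the size of each slice, which is consistent with the linear speedup the abstract describes.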

A dynamic analysis algorithm for RC frames using parallel GPU strategies

  • Li, Hongyu;Li, Zuohua;Teng, Jun
    • Computers and Concrete / v.18 no.5 / pp.1019-1039 / 2016
  • In this paper, a parallel algorithm for nonlinear dynamic analysis of three-dimensional (3D) reinforced concrete (RC) frame structures on a graphics processing unit (GPU) platform is proposed. Time integration is performed using the Newmark method for nonlinear implicit dynamic analysis, and parallelization strategies are presented. Correspondingly, a parallel Preconditioned Conjugate Gradient (PCG) solver on the GPU is introduced for the repeated solution of the equilibrium equations at each time step. The RC frames were simulated using a fiber beam model to capture the nonlinear behavior of concrete and reinforcing bars. The parallel finite element program is developed using the Compute Unified Device Architecture (CUDA). The accuracy of the GPU-based parallel program, in both single precision and double precision, was verified by comparison with ABAQUS. The numerical results demonstrate that the proposed algorithm can take full advantage of the parallel architecture of the GPU and speeds up the computation compared with the CPU.
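
For readers unfamiliar with the PCG solver named in the abstract above, a minimal serial sketch in Python follows. It assumes a dense symmetric positive definite matrix and a Jacobi (diagonal) preconditioner; the abstract does not say which preconditioner or matrix storage the paper uses, so both are assumptions, and the function names are hypothetical.

```python
def matvec(a, x):
    """Dense matrix-vector product; the dominant cost of each iteration."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def pcg(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for symmetric positive definite a.

    Uses conjugate gradients with a Jacobi preconditioner (an assumption;
    the paper's preconditioner is not stated in the abstract).
    """
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)  # residual b - a x for the initial guess x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Each iteration consists of one matrix-vector product, two dot products, and a few element-wise vector updates, all of which are data-parallel operations of the kind that suit a GPU.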

An Optimization for fast digital hologram generation based on GPU (GPU기반의 디지털 홀로그램 고속 생성을 위한 최적화 기법)

  • Song, Joong-Seok;Park, Jong-Il
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.07a / pp.18-21 / 2011
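  • Digital holograms are generally produced with the computer-generated hologram (CGH) technique. However, because the CGH technique inherently demands a large amount of computation and high complexity, generating digital holograms in real time is very difficult. In this paper, we use CUDA, the parallel processing architecture of the graphics processing unit (GPU), for fast CGH computation, and additionally use OpenMP for multi-GPU processing. Furthermore, to optimize the computation, we propose techniques such as converting values to constants, vectorization, and loop unrolling. As a result, the proposed techniques improve the speed by a factor of about 8,300 compared with CGH computation on a conventional CPU.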
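
To show why CGH is so computation-heavy and why it parallelizes well, the Python sketch below evaluates a commonly used point-source formulation: each hologram pixel sums a cosine term over every object point. This formulation, the function names, and the parameter layout are assumptions for illustration; the abstract does not state the exact CGH equation or kernel structure used in the paper.

```python
import math


def hologram_pixel(px, py, points, wavelength):
    """Value of the hologram pixel at (px, py) on the z = 0 plane.

    Each object point is (x, y, z, amplitude) and contributes
    amplitude * cos(k * r), where r is its distance to the pixel.
    """
    k = 2.0 * math.pi / wavelength  # wave number, computed once per pixel
    total = 0.0
    for x, y, z, amplitude in points:
        r = math.sqrt((px - x) ** 2 + (py - y) ** 2 + z ** 2)
        total += amplitude * math.cos(k * r)
    return total


def hologram(width, height, pitch, points, wavelength):
    """Evaluate every pixel; the cost grows as width * height * len(points)."""
    return [
        [hologram_pixel(u * pitch, v * pitch, points, wavelength)
         for u in range(width)]
        for v in range(height)
    ]
```

Every pixel is independent, so one GPU thread per pixel is a natural mapping, and the inner loop over object points is the kind of loop that unrolling targets.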

Fast Double Random Phase Encoding by Using Graphics Processing Unit (GPU 컴퓨팅에 의한 고속 Double Random Phase Encoding)

  • Saifullah, Saifullah;Moon, In-Kyu
    • Proceedings of the Korea Multimedia Society Conference / 2012.05a / pp.343-344 / 2012
  • With the growth of sensitive data and the need for its secure transmission and storage, the use of encryption techniques has become widespread. The performance of encoding depends largely on computation time, so a system with a shorter computation time is preferable. Double Random Phase Encoding (DRPE) is an algorithm with many sub-functions that is time-consuming when executed serially; the computation time can be significantly reduced by implementing the important functions in parallel on a Graphics Processing Unit (GPU). Computing the convolution using the Fast Fourier Transform is the most important part of DRPE, and the paper shows that performing this portion on the GPU reduces the execution time of the process by a substantial amount; the result is compared with MATLAB for performance analysis. An NVIDIA GeForce 310 graphics card is used with CUDA C as the programming language.
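
The structure of DRPE described in the abstract above can be sketched in a few lines of Python. The sketch works on a 1D signal and uses a naive O(n²) discrete Fourier transform in place of an FFT so that it stays self-contained; the paper works with CUDA C on a GPU, so this is an illustration of the standard DRPE scheme only, with hypothetical function names and seeded masks.

```python
import cmath
import random


def dft(signal, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def random_phase_mask(n, seed):
    """Unit-magnitude mask with pseudo-random phases (the key material)."""
    rng = random.Random(seed)
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n)]


def drpe_encrypt(signal, mask1, mask2):
    """Mask in the input domain, then mask in the Fourier domain."""
    spectrum = dft([s * m for s, m in zip(signal, mask1)])
    return dft([f * m for f, m in zip(spectrum, mask2)], inverse=True)


def drpe_decrypt(cipher, mask1, mask2):
    """Undo both masks using their complex conjugates."""
    spectrum = dft(cipher)
    field = dft([f * m.conjugate() for f, m in zip(spectrum, mask2)],
                inverse=True)
    return [f * m.conjugate() for f, m in zip(field, mask1)]
```

The two transforms per operation dominate the running time, which is consistent with the abstract's focus on moving the Fourier-domain step to the GPU.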

Design of Scratch Detection Algorithm based on GPU (GPU 기반 스크래치 탐지 알고리즘의 설계)

  • Lee, Joon-Goo;Han, Ki-Sun;You, Byoung-Moon;Hwang, Doo-Sung
    • Proceedings of the Korean Society of Computer Information Conference / 2013.07a / pp.9-10 / 2013
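  • Scratch detection in video requires a large amount of processing time to compare pixel data between frames. This paper proposes a parallel design that allows the scratch detection algorithm to run on a GPU, and tests it on digitized video held by the National Archives of Korea. In the experiments, the proposed method reduced processing time by a factor of about five compared with the sequential scratch detection method, and both methods showed a similar detection rate of about 60%.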

The Study of LDPC for Railroad Signal control system by Using GPU (GPU를 이용한 철도신호에서의 LDPC 적용에 관한 연구)

  • Park, Joo-Yul;Kim, Hyo-Sang;Kim, Jae-Moon;Kim, Bong-Taek;Chung, Ki-Seok
    • Proceedings of the KSR Conference / 2010.06a / pp.1075-1080 / 2010
  • There has been much research on enhancing the performance of high-performance digital signal processing on GPUs (Graphics Processing Units). This kind of parallelization enables massive signal processing, so a GPU has the advantage of being able to handle a variety of signal processing standards. In this paper, we introduce the Low Density Parity Check (LDPC) code, which is one of the Forward Error Correction (FEC) codes, and we reduce computation time by using CUDA as the parallelization scheme.

Implementation of Integrated CPU-GPU for Efficient Uniform Memory Access Method and Verification System (CPU-GPU간 긴밀성을 위한 효율적인 공유메모리 접근 방법과 검증 시스템 구현)

  • Park, Hyun-moon;Kwon, Jinsan;Hwang, Tae-ho;Kim, Dong-Sun
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.2 / pp.57-65 / 2016
  • In this paper, we propose a system for efficient use of shared memory between the CPU and GPU. The system, called Fusion Architecture, assures consistency of the shared memory and minimizes the cache misses that frequently occur on Heterogeneous System Architecture or Unified Virtual Memory based systems. It also maximizes performance for memory-intensive jobs through efficient allocation of GPU cores. To compare architectures across various scenarios, we introduce the Fusion Architecture Analyzer, which compares OpenMP, OpenCL, CUDA, and the proposed architecture in terms of memory overhead and processing time. The results show that the proposed Fusion Architecture runs benchmarks 55% faster and reduces memory overhead by 220% on average.

Implementation of 2×2 MIMO LTE Base Station using GPU for SDR System (GPU를 이용한 SDR 시스템 용 LTE MIMO 기지국 기능 구현)

  • Lee, Seung Hak;Kim, Kyung Hoon;Ahn, Chi Young;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management / v.8 no.4 / pp.91-98 / 2012
  • This paper implements a 2×2 MIMO Long Term Evolution (LTE) base station using software-defined radio (SDR) technology. The implemented software-based base station processes baseband signals on a Graphics Processing Unit (GPU) and uses a USRP2 as its RF transceiver. The GPU is a high-speed parallel processor with the important advantage of a powerful C-based programming environment, the Compute Unified Device Architecture (CUDA). To guarantee real-time processing of LTE baseband signals, we run well-known signal processing algorithms, such as frame synchronization and ML detection, in parallel on the GPU.