• Title/Summary/Keyword: CUDA


Acceleration of Phase Measuring Profilometry using GPU (GPU를 이용한 위상 측정법의 가속화)

  • Kim, Ho-Joong;Cho, Tai-Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.12 / pp.2285-2290 / 2017
  • In recent years, automation systems have been evolving in many areas of industry. At the same time, the need for height inspection of objects by 3D measurement is gradually increasing. Among the various 3D measurement methods, this paper discusses phase measuring profilometry (PMP). PMP obtains the height of an object from the phase value of the fringe pattern. Because PMP requires a large amount of computation, an efficient way to carry it out is needed. In this paper, we propose using CUDA from NVIDIA to solve this problem. We also propose using the pinned memory and streams provided by CUDA. This can greatly improve the measurement speed while maintaining accuracy. Finally, we demonstrate the performance of the proposed method through experiments.
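The abstract above does not say which phase-shifting scheme the paper uses. As a minimal sketch, assuming the common four-step variant in which the fringe images are shifted by 0, π/2, π, and 3π/2, the wrapped phase of each pixel can be computed independently, which is the kind of per-pixel work a GPU kernel would parallelize. The function names are hypothetical and the code is a CPU reference, not the authors' implementation.

```python
import math

def wrapped_phase(i0, i1, i2, i3):
    """Wrapped phase of one pixel from four fringe intensities.

    Assumes I_k = A + B*cos(phi + k*pi/2) for k = 0..3, so that
    I3 - I1 = 2B*sin(phi) and I0 - I2 = 2B*cos(phi).
    """
    return math.atan2(i3 - i1, i0 - i2)

def phase_map(images):
    """Apply wrapped_phase to every pixel of four equally sized 2D images."""
    i0, i1, i2, i3 = images
    return [
        [wrapped_phase(a, b, c, d) for a, b, c, d in zip(r0, r1, r2, r3)]
        for r0, r1, r2, r3 in zip(i0, i1, i2, i3)
    ]

# Synthetic check: four 1x1 fringe images generated from a known phase.
true_phase = 0.7
samples = [
    [[100.0 + 50.0 * math.cos(true_phase + k * math.pi / 2)]]
    for k in range(4)
]
recovered = phase_map(samples)[0][0]
```

Converting the wrapped phase to a height still requires phase unwrapping and a calibration step, neither of which is described in the abstract.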

Parallel Computation of FDTD algorithm using CUDA (CUDA를 이용한 FDTD 알고리즘의 병렬처리)

  • Lee, Ho-Young;Park, Jong-Hyun;Kim, Jun-Seong
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.4 / pp.82-87 / 2010
  • Modern GPUs (Graphics Processing Units) provide higher computing capability than general-purpose CPUs (Central Processing Units). With support for a programmable graphics pipeline, GPGPU (General-Purpose computation on GPUs) has gained much attention and expanded its application area. This paper compares sequential and massively parallel implementations of the FDTD (Finite-Difference Time-Domain) algorithm using CUDA (Compute Unified Device Architecture). Experimental results show up to a 45x speedup over conventional CPU execution.
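To illustrate why FDTD maps well onto massively parallel hardware, here is a minimal sequential sketch of a one-dimensional leapfrog update in normalized units. The grid size, the coefficient 0.5, and the Gaussian source are assumptions chosen for illustration; the paper's dimensionality and parameters are not given in the abstract.

```python
import math

def fdtd_1d(n_cells, n_steps, source_pos):
    """1D FDTD leapfrog update in normalized units (coefficient 0.5 assumed)."""
    ez = [0.0] * n_cells  # electric field samples
    hy = [0.0] * n_cells  # magnetic field samples, staggered by half a cell
    for t in range(n_steps):
        # Magnetic half step: every hy[k] reads only the current ez values,
        # so all cells could be updated in parallel.
        for k in range(n_cells - 1):
            hy[k] += 0.5 * (ez[k + 1] - ez[k])
        # Electric half step: every ez[k] reads only the freshly updated hy.
        for k in range(1, n_cells):
            ez[k] += 0.5 * (hy[k] - hy[k - 1])
        # Soft Gaussian source added at one cell (illustrative choice).
        ez[source_pos] += math.exp(-0.5 * ((t - 30) / 10.0) ** 2)
    return ez

field = fdtd_1d(n_cells=200, n_steps=80, source_pos=100)
```

Within each half step no cell depends on another cell's new value, so a GPU version would typically assign one thread per cell and synchronize between the two half steps.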

Min-Max Octree Generation Using CUDA (CUDA를 이용한 최대-최소 8진트리 생성 기법)

  • Lim, Jong-Hyeon;Shin, Byeong-Seok
    • Journal of Korea Game Society / v.9 no.6 / pp.191-196 / 2009
  • Volume rendering is a method that extracts meaningful information from volume data and visualizes it. As volume data sets grow larger, it is very important to devise acceleration methods that keep rendering interactive. The min-max octree is a data structure for high-speed volume rendering; however, its creation time becomes long as the data size increases. In this paper, we propose a method that accelerates min-max octree generation using CUDA. First, we convert the volume data into a one-dimensional array using a space-filling curve. Then we build min-max octree structures from the sequential array and apply them to accelerate volume ray casting.

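The abstract for the min-max octree paper does not name the space-filling curve it uses. The sketch below assumes a Morton (Z-order) curve, under which every run of eight consecutive array elements forms the children of one octree node, so each level is a chunked min/max reduction of the level below. Function names and the cubic power-of-two volume are illustrative assumptions.

```python
def morton3(x, y, z, bits):
    """Interleave the low `bits` bits of x, y, z into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def build_min_max_octree(volume, bits):
    """Return a list of levels, leaves first, each a list of (min, max) pairs.

    `volume[z][y][x]` is assumed to be a cube with side 2**bits.
    """
    n = 1 << bits
    leaves = [None] * (n ** 3)
    for z in range(n):
        for y in range(n):
            for x in range(n):
                v = volume[z][y][x]
                leaves[morton3(x, y, z, bits)] = (v, v)
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent covers 8 consecutive children in Morton order.
        levels.append([
            (min(lo for lo, _ in prev[i:i + 8]),
             max(hi for _, hi in prev[i:i + 8]))
            for i in range(0, len(prev), 8)
        ])
    return levels

volume = [[[x + 4 * y + 16 * z for x in range(4)] for y in range(4)] for z in range(4)]
levels = build_min_max_octree(volume, bits=2)
```

Because each parent reads a contiguous block of eight entries, the reduction at every level is independent per node, which suits a one-thread-per-node GPU kernel.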

Enhancement of H.264/AVC Encoding Speed and Reduction of CPU Load through Parallel Programming Based on CUDA (CUDA 기반의 병렬 프로그래밍을 통한 H.264/AVC 부호화 속도 향상 및 CPU 부하 경감)

  • Jang, Eun-Been;Ha, Yun-Su
    • Journal of Advanced Marine Engineering and Technology / v.34 no.6 / pp.858-863 / 2010
  • To enhance encoding speed in video encoding with H.264/AVC, it is very important to reduce the time spent on motion estimation, which takes a large portion of the processing time. Using a graphics processing unit (GPU) as a coprocessor to assist the central processing unit (CPU) in processing massive data is one way to reduce the processing time. In this paper, we present an efficient block-level parallel algorithm for motion estimation (ME) on the Compute Unified Device Architecture (CUDA) platform, which was developed for general-purpose computation on GPUs. Experiments are carried out to verify the effectiveness of the proposed algorithm.
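Block-level motion estimation can be sketched as a full search that minimizes the sum of absolute differences (SAD) between a block of the current frame and candidate blocks of the reference frame. The abstract does not state the cost function or search strategy, so SAD and exhaustive search are assumptions here, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a shifted reference block."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )

def full_search(cur, ref, bx, by, size, search):
    """Return (cost, dx, dy) of the best match within +/- `search` pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue  # skip candidates that leave the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic frames: the current frame is the reference shifted by (2, 1).
ref = [[(7 * x + 13 * y + x * y) % 256 for x in range(16)] for y in range(16)]
cur = [[ref[(y + 1) % 16][(x + 2) % 16] for x in range(16)] for y in range(16)]
cost, dx, dy = full_search(cur, ref, bx=4, by=4, size=4, search=3)
```

Every block and every candidate offset is evaluated independently, so a GPU implementation can distribute blocks across thread blocks and candidates across threads.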

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri;Al-Laila, Marwah Haitham;Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems / v.12 no.4 / pp.724-740 / 2016
  • The performance issues of screening large compound databases with multiple query compounds in virtual screening are a common concern in chemoinformatics applications. This study investigates these problems by choosing group fusion as a pilot model and presents efficient parallel solutions on parallel platforms, specifically the multi-core architecture of the CPU and the many-core architecture of the graphics processing unit (GPU). A study of sequential group fusion and a proposed design of parallel CUDA group fusion are presented in this paper. The design involves solving two important stages of group fusion, namely similarity search and fusion (MAX rule), while addressing embarrassingly parallel and parallel reduction models. Sequential, optimized sequential, and parallel OpenMP versions of group fusion were implemented and evaluated. The outcome of the analysis of these three design approaches influenced the design of the parallel CUDA version in order to optimize it and achieve high computation intensity. The proposed parallel CUDA version performed better than the sequential and parallel OpenMP versions in terms of both execution time and speedup. The parallel CUDA version was 5-10x faster than the sequential and parallel OpenMP versions, as both the similarity search and fusion (MAX) stages had been CUDA-optimized.
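The two stages named in the group fusion abstract, similarity search and MAX-rule fusion, can be sketched sequentially as follows. The Tanimoto coefficient on bit-set fingerprints is an assumption (the abstract does not name the similarity measure), and the compound names and fingerprints are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def group_fusion_max(queries, database):
    """Score each database compound by its best similarity to any query (MAX rule)."""
    fused = []
    for name, fingerprint in database.items():
        # Similarity search: one score per query (embarrassingly parallel).
        scores = [tanimoto(q, fingerprint) for q in queries]
        # Fusion: reduce the per-query scores with max (a parallel reduction).
        fused.append((max(scores), name))
    fused.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in fused]

# Hypothetical toy data.
queries = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7})]
database = {
    "cpd_a": frozenset({1, 2, 3}),
    "cpd_b": frozenset({5, 6, 7}),
    "cpd_c": frozenset({9, 10}),
}
ranking = group_fusion_max(queries, database)
```

The inner list of scores corresponds to the embarrassingly parallel stage, and the `max` call corresponds to the reduction stage that the abstract mentions.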

Pedestrians Action Interpretation based on CUDA for Traffic Signal Control (교통신호제어를 위한 CUDA기반 보행자 행동판단)

  • Lee, Hong-Chang;Rhee, Sang-Yong;Kim, Young-Baek
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.5 / pp.631-637 / 2010
  • In this paper, we propose a method for interpreting pedestrian motion for active traffic signal control. We detect pedestrian objects in video of a crosswalk area using the codebook method and acquire contour information. To perform this stage quickly, we use parallel processing based on CUDA (Compute Unified Device Architecture). We then remove shadows, which distort the shapes of objects. Each shadow-removed object is classified as human or noise using the Hilbert scan distance. If an object is judged to be human, we analyze the pedestrian's motion, face-area features, and waiting time to decide whether the pedestrian intends to cross the crosswalk. The traffic signal can then be controlled based on this judgment.

Real-Time Object Segmentation in Image Sequences (연속 영상 기반 실시간 객체 분할)

  • Kang, Eui-Seon;Yoo, Seung-Hun
    • The KIPS Transactions:PartB / v.18B no.4 / pp.173-180 / 2011
  • This paper presents an approach to real-time object segmentation on the GPU (Graphics Processing Unit) using CUDA (Compute Unified Device Architecture). Recently, many applications, such as monitoring systems, motion analysis, and object tracking, require real-time processing. The CPU is not well suited to performing object segmentation in real time. NVIDIA provides the CUDA platform for general-purpose parallel processing to overcome the limits of graphics hardware. In this paper, we use adaptive Gaussian mixture background modeling in the object extraction step and CCL (Connected Component Labeling) for classification. The speeds of the GPU and CPU implementations are compared and evaluated on a 2.4 GHz Core2 Quad processor. The GPU version achieved a speedup of 3x-4x over the CPU version.
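As a point of reference for the connected component labeling (CCL) step, here is a minimal sequential sketch that labels 4-connected foreground regions of a binary mask with a breadth-first flood fill. This is a plain CPU baseline under assumed 4-connectivity; GPU implementations generally use different schemes such as label propagation or union-find, and the abstract does not say which one the paper adopts.

```python
from collections import deque

def label_components(mask):
    """Label 4-connected foreground regions; returns (labels, component_count)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or labels[sy][sx]:
                continue  # background, or already labeled
            count += 1
            labels[sy][sx] = count
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((nx, ny))
    return labels, count

# Toy foreground mask, e.g. the output of a background-subtraction step.
mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
labels, count = label_components(mask)
```

In a segmentation pipeline like the one described, the mask would come from the background model, and each resulting label would correspond to one candidate object.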

Weather Radar Image Generation Method Using Interpolation based on CUDA

  • Yang, Liu;Jang, Bong-Joo;Lim, Sanghun;Kwon, Ki-Chang;Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society / v.18 no.4 / pp.473-482 / 2015
  • Doppler weather radar is an important tool for meteorological research. Through several decades of development, Doppler weather radar has made enormous progress in the understanding, detection, and warning of meso- and micro-scale weather systems. It makes a significant contribution to weather forecasting and weather disaster warning. However, the large amount of data to be processed limits the application of Doppler weather radar. This paper proposes a method for fast weather radar data processing based on CUDA. CUDA is a powerful platform for highly parallel programming developed by NVIDIA. By running many threads, radar data can be processed concurrently. In experiments, the CUDA parallel program significantly improved weather data processing time.

OpenMP application to implement CUDA for FDTD algorithm and performance measurement (CUDA로 구현한 FDTD알고리즘의 OpenMP기술 적용 및 성능 측정)

  • Jung, Bok-Jae;Oh, Seung-Take;Lee, Cheol-Hoon
    • Proceedings of the Korean Society of Computer Information Conference / 2013.01a / pp.3-6 / 2013
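  • In semiconductor manufacturing, simulations are performed to verify the fabrication process and thereby reduce device manufacturing costs. These simulations analyze the behavior of impurities inside a semiconductor device by computing physical quantities inside the device. The algorithms used for this purpose solve physical differential equations that describe three-dimensional geometry, and numerical analysis techniques such as the finite-difference time-domain method (FDTD) are used for accurate computation. In practice, FDTD computation accounts for more than 90% of the execution time of a semiconductor process simulation. To obtain faster performance for this computation, this paper reduces the execution time of an existing FDTD algorithm implemented in CUDA (Compute Unified Device Architecture) by controlling multiple GPUs through OpenMP, and measures the resulting performance improvement.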


CUDA-based Fast DRR Generation for Analysis of Medical Images (의료영상 분석을 위한 CUDA 기반의 고속 DRR 생성 기법)

  • Yang, Sang-Wook;Choi, Young;Koo, Seung-Bum
    • Korean Journal of Computational Design and Engineering / v.16 no.4 / pp.285-291 / 2011
  • Pose estimation from medical images calculates the locations and orientations of objects obtained from Computed Tomography (CT) volume data using X-ray images from two directions. In this process, digitally reconstructed radiograph (DRR) images of spatially transformed objects are generated and compared with the X-ray images repeatedly until reasonable transformation matrices of the objects are found. DRR generation and image comparison take the majority of the total time for this pose estimation. In this paper, a fast DRR generation technique based on GPU parallel computing is introduced. A volume ray-casting algorithm is explained with brief vector operations, and a technique for parallelizing the algorithm using Compute Unified Device Architecture (CUDA) is discussed. This paper also presents implementation results and time measurements compared with those from a pure-CPU implementation and an open-source toolkit.