• Title/Summary/Keyword: CUDA(CUDA)

Search Result 295, Processing Time 0.03 seconds

Improved Tracking System and Realistic Drawing for Real-Time Water-Based Sign Pen (향상된 트래킹 시스템과 실시간 수성 사인펜을 위한 사실적 드로잉)

  • Hur, Hyejung;Lee, Ju-Young
    • Journal of the Korea Society of Computer and Information / v.19 no.2 / pp.125-132 / 2014
  • In this paper, we present a marker-less fingertip and brush tracking system that uses an inexpensive web camera. Parallel computation using CUDA is applied to the tracking system. The system runs on inexpensive hardware such as a laptop or a desktop and supports real-time applications. We also present a realistic water-based sign pen drawing model and its implementation. The realistic drawing application, combined with our inexpensive real-time fingertip and brush tracking system, offers a glimpse of the art class of the future and could serve as a test bed for future high-technology education environments.

An Improved Hybrid Approach to Parallel Connected Component Labeling using CUDA

  • Soh, Young-Sung;Ashraf, Hadi;Kim, In-Taek
    • Journal of the Institute of Convergence Signal Processing / v.16 no.1 / pp.1-8 / 2015
  • In many image processing tasks, connected component labeling (CCL) is performed to extract regions of interest. CCL was usually done sequentially when image resolution was relatively low and there were few input channels. As image resolution rises to HD or Full HD and the number of input channels increases, sequential CCL becomes too time-consuming for real-time applications. To cope with this situation, parallel CCL frameworks were introduced in which multiple cores are used simultaneously. Several parallel CCL methods have been proposed in the literature. Among them are the NSZ label equivalence (NSZ-LE) method [1], the modified 8-directional label selection (M8DLS) method [2], and the HYBRID1 method [3]. Soh [3] showed that HYBRID1 outperforms NSZ-LE and M8DLS and argued that HYBRID1 is by far the best. In this paper, we propose an improved hybrid parallel CCL algorithm, termed HYBRID2, that hybridizes M8DLS with label backtracking (LB), and show that it runs around 20% faster than HYBRID1 for various kinds of images.
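
To make the labeling task concrete, here is a minimal sequential sketch of CCL by iterated minimum-label propagation, the basic idea that label-equivalence methods build on. It is not NSZ-LE, M8DLS, HYBRID1, or HYBRID2; the function name `label_components` and the nested-list image format are assumptions made for illustration.

```python
def label_components(image):
    """Label 8-connected foreground pixels by iterated minimum-label propagation."""
    rows, cols = len(image), len(image[0])
    # Each foreground pixel starts with a unique label (1-based linear index); 0 is background.
    labels = [[r * cols + c + 1 if image[r][c] else 0 for c in range(cols)]
              for r in range(rows)]
    changed = True
    while changed:
        changed = False
        # A GPU version would assign one thread per pixel; this sweep is sequential.
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] == 0:
                    continue
                best = labels[r][c]
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc]:
                            best = min(best, labels[nr][nc])
                if best < labels[r][c]:
                    labels[r][c] = best
                    changed = True
    return labels
```

Every pixel in a component ends up with the smallest initial label in that component. The repeated full sweeps are what the faster methods listed above aim to reduce.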

Voronoi Diagram Computation for a Molecule Using Graphics Hardware (그래픽 하드웨어를 이용한 분자용 보로노이 다이어그램 계산)

  • Lee, Jung-Eun;Baek, Nak-Hoon;Kim, Ku-Jin
    • The KIPS Transactions:PartA / v.19A no.4 / pp.169-174 / 2012
  • In this paper, we present an algorithm that computes a three-dimensional Voronoi diagram for a protein molecule. The molecule is represented as a set of spheres with van der Waals radii. The Voronoi diagram is constructed in 3D space by finding the voxels that contain it. To make the computation feasible, we represent the molecule as a BVH (bounding volume hierarchy) and accelerate our system with modern graphics hardware that supports CUDA programming. Experimental results show computation 323 times faster than single-core CPU implementations when the space is partitioned into $2^{24}$ voxels.
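
As a rough illustration of the voxel view of a Voronoi diagram of spheres, the sketch below assigns each voxel center to the sphere whose surface is nearest, by brute force. It omits the BVH and the GPU entirely; the function names, the `(x, y, z, radius)` tuple layout, and the cubic domain are assumptions, not details from the paper.

```python
import math

def nearest_sphere(point, spheres):
    """Index of the sphere whose surface is closest to point; spheres are (x, y, z, radius)."""
    def surface_distance(sphere):
        return math.dist(point, sphere[:3]) - sphere[3]
    return min(range(len(spheres)), key=lambda i: surface_distance(spheres[i]))

def voronoi_owner_grid(spheres, n, size):
    """Assign each of the n^3 voxel centers in the cube [0, size]^3 to its nearest sphere."""
    h = size / n  # voxel edge length
    return [[[nearest_sphere(((i + 0.5) * h, (j + 0.5) * h, (k + 0.5) * h), spheres)
              for k in range(n)] for j in range(n)] for i in range(n)]
```

Voxels whose neighbors have a different owner lie on the boundary between Voronoi regions. Each voxel is independent of the others, which is why this kind of computation suits a GPU.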

A 2D GPU-Accelerated High Resolution Numerical Scheme for Solving Diffusive Wave Equation (고해상도 수치기법을 이용한 GPU 기반 2D 확산파 모형)

  • Park, Seonryang;Kim, Dae-Hong
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.109-109 / 2019
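  • In this study, we developed a GPU-based diffusive wave model for simulating rainfall-runoff processes. The finite volume method was used to solve the diffusive wave equation, and the physical quantities at each cell interface were reconstructed using the MUSCL scheme with the van Leer TVD limiter. The Horton infiltration model was used to account for infiltration. Using the developed model, rainfall-runoff processes were simulated on a 1D single overland plane and a 2D V-shaped overland, and the accuracy of the model was verified by comparing the results with the analytical solution and with numerical results computed by a dynamic wave model, respectively. The model was also applied to 1D and 2D terrain with highly irregular topography, confirming that it yields physically reasonable analyses of the rainfall-runoff process. Finally, the model was applied to complex real terrain, and its suitability for a real watershed was verified by comparison with measurements. The GPU-based diffusive wave model was developed using an NVIDIA GeForce GTX 1050 GPU and NVIDIA's CUDA Fortran, which exploits the parallel processing capability of the GPU. In a comparison of computation speed against a CPU-based model (Intel i7, 4.70 GHz) on a Windows PC, the computational efficiency of the GPU-based model relative to the CPU-based model increased with grid size, and for a grid of $3200{\times}3200$ it increased by up to about 150 times.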
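
For readers unfamiliar with the building blocks named in this abstract, the sketch below shows the standard van Leer limiter, a limited MUSCL reconstruction at a cell face, and the Horton infiltration rate. It is a textbook-style illustration rather than the authors' CUDA Fortran code; the function and parameter names are assumptions.

```python
import math

def van_leer(r):
    """van Leer flux limiter: phi(r) = (r + |r|) / (1 + |r|)."""
    return (r + abs(r)) / (1.0 + abs(r))

def muscl_left_state(u_prev, u_cur, u_next):
    """Limited linear reconstruction of the value at the right face of the current cell."""
    du_back = u_cur - u_prev
    du_fwd = u_next - u_cur
    if du_fwd == 0.0:
        return u_cur  # flat ahead: the limited slope term vanishes
    r = du_back / du_fwd  # ratio of consecutive differences
    return u_cur + 0.5 * van_leer(r) * du_fwd

def horton_rate(t, f0, fc, k):
    """Horton infiltration capacity: f(t) = fc + (f0 - fc) * exp(-k * t)."""
    return fc + (f0 - fc) * math.exp(-k * t)
```

For smooth data (`r` near 1) the reconstruction is second-order accurate, while at a local extremum (`r <= 0`) the limiter returns 0 and the scheme falls back to the cell average, which is what keeps the scheme TVD.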


A study on application of GPU-accelerated kinematic wave rainfall-runoff model (GPU 가속 운동파 강우유출모형의 적용 연구)

  • Kim, Boram;Yun, Gwan Seon;Kim, Hyeong-Jun;Yoon, Kwang Seok
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.323-323 / 2020
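  • A graphics processing unit (GPU) consists of many arithmetic logic units (ALUs) specialized for graphics processing, so it can perform more operations at once than a central processing unit (CPU). This study applied a GPU-accelerated kinematic wave model to a real watershed and examined the accuracy of its rainfall-runoff results and the efficiency of its computation time. The GPU-accelerated kinematic wave model was developed in CUDA Fortran to shorten the numerical simulation time of a distributed rainfall-runoff model. The governing equations of the distributed model consist of the kinematic wave model and the Green-Ampt model, and the kinematic wave model was discretized using the finite volume method. The GPU-accelerated model was used to simulate rainfall-runoff in the Mihocheon watershed of the Geum River and was compared with a CPU-based rainfall-runoff model that uses the same finite volume method. The results of the GPU-accelerated model were similar to observations at the downstream end of the Mihocheon watershed. The computation time was also shorter than that of the CPU-based model, by up to about 100 times on the equipment used in this study.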
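
The following is a minimal sketch of one explicit upwind finite-volume step for a 1D kinematic wave, assuming Manning's equation for the unit-width discharge. The abstract does not give the authors' discretization details, so the function name, the parameters, and the treatment of infiltration as a precomputed rainfall excess are all assumptions.

```python
def kinematic_wave_step(h, dt, dx, slope, manning_n, rain_excess):
    """Advance water depths h (m) by one explicit upwind finite-volume step on a 1D plane."""
    # Unit-width discharge from Manning's equation: q = sqrt(S) / n * h^(5/3)
    q = [(slope ** 0.5 / manning_n) * depth ** (5.0 / 3.0) for depth in h]
    new_h = []
    for i, depth in enumerate(h):
        inflow = q[i - 1] if i > 0 else 0.0  # no inflow at the upstream edge
        # rain_excess is rainfall minus infiltration (m/s), supplied by the caller
        updated = depth + dt * (rain_excess - (q[i] - inflow) / dx)
        new_h.append(max(updated, 0.0))  # depths cannot go negative
    return new_h
```

Each cell update depends only on values from the previous time step, so in a GPU version every cell could be updated by its own thread.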


Design of Omok AI using Genetic Algorithm and Game Trees and Their Parallel Processing on the GPU (유전 알고리즘과 게임 트리를 병합한 오목 인공지능 설계 및 GPU 기반 병렬 처리 기법)

  • Ahn, Il-Jun;Park, In-Kyu
    • Journal of KIISE:Computer Systems and Theory / v.37 no.2 / pp.66-75 / 2010
  • This paper proposes an efficient method for designing and implementing the artificial intelligence (AI) of the game 'omok' on the GPU. The proposed AI uses a cooperative structure that combines a min-max game tree and a genetic algorithm. Since the evaluation function is computationally intensive but is evaluated independently on many candidates in the solution space, it is computed on the GPU in a massively parallel way. The implementation on NVIDIA CUDA and the experimental results show that it significantly outperforms the CPU: the parallel game tree and the genetic algorithm run more than 400 times and 300 times faster on the GPU than on the CPU, respectively. In the proposed cooperative AI, a selective search using the genetic algorithm is performed after the full search using the game tree, both to search the solution space more efficiently and to avoid thread overflow. Experimental results show that the proposed algorithm enhances the AI significantly and lets it run within the time limit set by the rules of the game.
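
As a reminder of what a min-max game tree search computes, here is a minimal recursive sketch over a tree given as nested lists with numeric leaf scores. It is a generic illustration, not the paper's omok AI; the function name and tree encoding are assumptions.

```python
def minimax(node, maximizing):
    """Return the min-max value of a game tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node  # leaf: score from the evaluation function
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)
```

The leaf scores are independent of one another, which matches the abstract's point that the evaluation function can be computed for many candidates in parallel.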

A Study on GPU Computing of Bi-conjugate Gradient Method for Finite Element Analysis of the Incompressible Navier-Stokes Equations (유한요소 비압축성 유동장 해석을 위한 이중공액구배법의 GPU 기반 연산에 대한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Jung, Hye Dong;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B / v.40 no.9 / pp.597-604 / 2016
  • A parallel bi-conjugate gradient algorithm was developed in CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using a splitting P2P1 finite element method. An asymmetric stenotic flow problem was solved to validate the proposed algorithm, and the parallel performance of the GPU was then examined by measuring elapsed times. The GPU performance for sparse matrix-vector multiplication was also investigated using a matrix from a fluid-structure interaction problem. A kernel was written to compute the inner products of each row of the sparse matrix with a vector simultaneously. The kernel was then optimized for performance using both parallel reduction and memory coalescing. In constructing the kernel, the effect of the warp on the parallel performance of the CUDA implementation was also examined. In double precision, the GPU computation was more than 7 times faster than a single CPU.
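
The row-wise inner products described above can be sketched sequentially as a sparse matrix-vector product. The abstract does not name a storage format, so the compressed sparse row (CSR) layout and the function name used here are assumptions.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR sparse matrix by a dense vector x, one row inner product at a time."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

On a GPU the outer loop would be spread across threads or warps, and the inner sum is where a parallel reduction would apply.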

Analysis of the CPU/GPU Temperature and Energy Efficiency depending on Executed Applications (응용프로그램 실행에 따른 CPU/GPU의 온도 및 컴퓨터 시스템의 에너지 효율성 분석)

  • Choi, Hong-Jun;Kang, Seung-Gu;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information / v.17 no.5 / pp.9-19 / 2012
  • As the clock frequency increases, CPU performance improves continuously. However, power and thermal problems in the CPU become more serious as the clock frequency increases. For this reason, using the GPU to reduce the workload of the CPU has become one of the most popular methods in recent high-performance computer systems. The GPU is a specialized processor originally designed for graphics processing. Recently, technologies such as CUDA, which make GPU resources easier to use, have become popular, improving computer system performance by using the CPU and GPU simultaneously to execute various kinds of applications. In this work, we analyze the temperature and the energy efficiency of a computer system in which the CPU and the GPU are used simultaneously, to identify possible problems in upcoming high-performance computer systems. According to our experimental results, the temperatures of both the CPU and the GPU increase when the application is executed on the GPU. When the application is executed on the CPU, the CPU temperature increases whereas the GPU temperature remains unchanged. The computer system shows better energy efficiency when using the GPU than when using the CPU, because the throughput of the GPU is much higher than that of the CPU. However, the temperature of the system tends to rise more easily when the application is executed on the GPU, because the GPU consumes more power than the CPU.

Multiple Camera Based Imaging System with Wide-view and High Resolution and Real-time Image Registration Algorithm (다중 카메라 기반 대영역 고해상도 영상획득 시스템과 실시간 영상 정합 알고리즘)

  • Lee, Seung-Hyun;Kim, Min-Young
    • Journal of the Institute of Electronics Engineers of Korea SC / v.49 no.4 / pp.10-16 / 2012
  • For high-speed visual inspection in the semiconductor industry, it is essential to acquire two-dimensional images of regions of interest with a large field of view (FOV) and a high resolution simultaneously. In this paper, we propose an imaging system that achieves high image quality in terms of precision and FOV; it is composed of a single lens, a beam splitter, two camera sensors, and a stereo image grabbing board. For object images acquired simultaneously from the two camera sensors, Zhang's camera calibration method is first applied to calibrate each camera. Second, to find a mathematical mapping function between the two images acquired from the different camera views, the matching matrix from multiview camera geometry is calculated based on their image homography. Through the image homography, the two images are finally registered to secure a large inspection FOV. An inspection system that uses multiple images from multiple cameras needs a very fast processing unit for real-time image matching. For this purpose, parallel processing hardware and software such as Compute Unified Device Architecture (CUDA) are used. As a result, we can obtain a matched image from two separate images in real time. Finally, the acquired homography is evaluated in terms of accuracy through a series of experiments, and the results show the effectiveness of the proposed system and method.
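
To show what registering two images through a homography involves at the pixel level, the sketch below maps one image point through a 3x3 homography matrix in homogeneous coordinates. The function name and the nested-list matrix format are assumptions; estimating the matrix itself (for example from calibrated correspondences) is not shown.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (nested lists) with perspective division."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # divide by the homogeneous coordinate
```

Registering a full image repeats this mapping independently for every pixel, which is the part that benefits from parallel hardware.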

Fast Generation of Intermediate View Image Using GPGPU-Based Disparity Increment Method (GPGPU 기반의 변위증분 방법을 이용한 중간시점 고속 생성)

  • Koo, Ja-Myung;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.8 / pp.1908-1918 / 2013
  • Free-view, auto-stereoscopic video service is a next-generation broadcasting system that offers three-dimensional video, for which images from various viewpoints are needed. This paper proposes a method that parallelizes the algorithm for generating an arbitrary intermediate view-point image and accelerates it using a general-purpose graphics processing unit (GPGPU) with the help of the Compute Unified Device Architecture (CUDA). It uses a parallelized stereo-matching method between the leftmost and the rightmost depth images to obtain disparity information, and it uses the disparity increment calculated per depth value. The disparity increment is used to find the location in the intermediate view-point image for each depth in the given images. Disocclusions are then eliminated by letting the two images complement each other, and the remaining holes are filled using a hole-filling method to obtain the final intermediate view-point image. The proposed method was implemented and applied to several test sequences. The results show that the generated intermediate view-point image has an average PSNR of 30.47 dB and that a Full HD intermediate view-point image is generated at about 38 frames per second.
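
The PSNR figure quoted above follows the standard definition, PSNR = 10 log10(MAX^2 / MSE). Here is a minimal sketch for two images given as flat lists of pixel values; the function name and the default peak value of 255 (8-bit images) are assumptions.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the generated view is closer to the reference image.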