• Title/Summary/Keyword: 병렬 GPU

Search Result 312, Processing Time 0.052 seconds

Assessment of Parallel Computing Performance of Agisoft Metashape for Orthomosaic Generation (정사모자이크 제작을 위한 Agisoft Metashape의 병렬처리 성능 평가)

  • Han, Soohee;Hong, Chang-Ki
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.37 no.6
    • /
    • pp.427-434
    • /
    • 2019
  • In the present study, we assessed the parallel computing performance of Agisoft Metashape for orthomosaic generation, which can implement aerial triangulation, generate a three-dimensional point cloud, and make an orthomosaic based on SfM (Structure from Motion) technology. Due to the nature of SfM, most of the time is spent on Align photos, which runs as a relative orientation, and Build dense cloud, which generates a three-dimensional point cloud. Metashape can parallelize the two processes by using multi-cores of CPU (Central Processing Unit) and GPU (Graphics Processing Unit). An orthomosaic was created from large UAV (Unmanned Aerial Vehicle) images by six conditions combined by three parallel methods (CPU only, GPU only, and CPU + GPU) and two operating systems (Windows and Linux). To assess the consistency of the results of the conditions, RMSE (Root Mean Square Error) of aerial triangulation was measured using ground control points which were automatically detected on the images without human intervention. The results of orthomosaic generation from 521 UAV images of 42.2 million pixels showed that the combination of CPU and GPU showed the best performance using the present system, and Linux showed better performance than Windows in all conditions. However, the RMSE values of aerial triangulation revealed a slight difference within an error range among the combinations. Therefore, Metashape seems to leave things to be desired so that the consistency is obtained regardless of parallel methods and operating systems.

Parallel Range Query processing on R-tree with Graphics Processing Units (GPU를 이용한 R-tree에서의 범위 질의의 병렬 처리)

  • Yu, Bo-Seon;Kim, Hyun-Duk;Choi, Won-Ik;Kwon, Dong-Seop
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.5
    • /
    • pp.669-680
    • /
    • 2011
  • R-trees are widely used in various areas such as geographical information systems, CAD systems and spatial databases in order to efficiently index multi-dimensional data. As data sets used in these areas grow in size and complexity, however, range query operations on R-tree are needed to be further faster to meet the area-specific constraints. To address this problem, there have been various research efforts to develop strategies for acceleration query processing on R-tree by using the buffer mechanism or parallelizing the query processing on R-tree through multiple disks and processors. As a part of the strategies, approaches which parallelize query processing on R-tree through Graphics Processor Units(GPUs) have been explored. The use of GPUs may guarantee improved performances resulting from faster calculations and reduced disk accesses but may cause additional overhead costs caused by high memory access latencies and low data exchange rate between GPUs and the CPU. In this paper, to address the overhead problems and to adapt GPUs efficiently, we propose a novel approach which uses a GPU as a buffer to parallelize query processing on R-tree. The use of buffer algorithm can give improved performance by reducing the number of disk access and maximizing coalesced memory access resulting in minimizing GPU memory access latencies. Through the extensive performance studies, we observed that the proposed approach achieved up to 5 times higher query performance than the original CPU-based R-trees.

스마트폰에서의 영상처리를 위한 GPU 활용

  • Park, In-Gyu;Choe, Ho-Yeol
    • Information and Communications Magazine
    • /
    • v.29 no.4
    • /
    • pp.46-51
    • /
    • 2012
  • 본 기고에서는 최근 스마트폰에서 요구되는 다양한 멀티미디어 어플리케이션을 embedded GPU(Graphics Processing Unit)를 이용하여 고속 병렬처리하기 위한 GPGPU (General-Purpose Computing on GPU) 기술 및 영상처리 분야의 응용 사례를 소개한다. 일반적인 데스크탑 컴퓨팅 환경과 달리 제약사항이 많은 embedded 환경에서의 GPGPU 응용 기술은 아직 초기단계이다. 그러나 급격히 발전하는 embedded GPU IP와 OpenCL과 같은 API의 등장으로 embedded GPU를 이용한 고속 병렬처리 환경이 수 년 이내에 일반화 될 것이다. 본 기고에서는 그 가능성을 점검하기 위하여 embedded GPU에서의 영상처리를 위한 최신 하드웨어와 소프트웨어 환경의 발전 동향을 소개한다. 더불어 최신 스마트폰에서의 GPGPU기술을 사용한 영상처리 사례와 영상처리 알고리즘의 GPGPU 알고리즘 구현시 고려해야 할 주요 사항을 정리한다.

Design of Line Scratch Detection and Restoration Algorithm using GPU (GPU를 이용한 선형 스크래치 탐지와 복원 알고리즘의 설계)

  • Lee, Joon-Goo;Shim, She-Yong;You, Byoung-Moon;Hwang, Doo-Sung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.4
    • /
    • pp.9-16
    • /
    • 2014
  • This paper proposes a linear scratch detection and restoration algorithm using pixel data comparison in a single frame or consecutive frames. There exists a high parallelism in that a scratch detection and restoration algorithm needs a large amount of comparison operations. The proposed scratch detection and restoration algorithm is designed with a GPU for fast computation. We test the proposed algorithm in sequential and parallel processing with the set of digital videos in National Archive of Korea. In the experiments, the scratch detection rate of consecutive frames is as fast as about 20% for that of a single frame. The detection and restoration rates of a GPU-based algorithm are similar to those of a CPU-based algorithm, but the parallel implementation speeds up to about 50 times.

Efficient Collaboration Method Between CPU and GPU for Generating All Possible Cases in Combination (조합에서 모든 경우의 수를 만들기 위한 CPU와 GPU의 효율적 협업 방법)

  • Son, Ki-Bong;Son, Min-Young;Kim, Young-Hak
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.7 no.9
    • /
    • pp.219-226
    • /
    • 2018
  • One of the systematic ways to generate the number of all cases is a combination to construct a combination tree, and its time complexity is O($2^n$). A combination tree is used for various purposes such as the graph homogeneity problem, the initial model for calculating frequent item sets, and so on. However, algorithms that must search the number of all cases of a combination are difficult to use realistically due to high time complexity. Nevertheless, as the amount of data becomes large and various studies are being carried out to utilize the data, the number of cases of searching all cases is increasing. Recently, as the GPU environment becomes popular and can be easily accessed, various attempts have been made to reduce time by parallelizing algorithms having high time complexity in a serial environment. Because the method of generating the number of all cases in combination is sequential and the size of sub-task is biased, it is not suitable for parallel implementation. The efficiency of parallel algorithms can be maximized when all threads have tasks with similar size. In this paper, we propose a method to efficiently collaborate between CPU and GPU to parallelize the problem of finding the number of all cases. In order to evaluate the performance of the proposed algorithm, we analyze the time complexity in the theoretical aspect, and compare the experimental time of the proposed algorithm with other algorithms in CPU and GPU environment. Experimental results show that the proposed CPU and GPU collaboration algorithm maintains a balance between the execution time of the CPU and GPU compared to the previous algorithms, and the execution time is improved remarkable as the number of elements increases.

Multi GPU Based Image Registration for Cerebrovascular Extraction and Interactive Visualization (뇌혈관 추출과 대화형 가시화를 위한 다중 GPU기반 영상정합)

  • Park, Seong-Jin;Shin, Yeong-Gil
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.6
    • /
    • pp.445-449
    • /
    • 2009
  • In this paper, we propose a computationally efficient multi GPU accelerated image registration technique to correct the motion difference between the pre-contrast CT image and post-contrast CTA image. Our method consists of two steps: multi GPU based image registration and a cerebrovascular visualization. At first, it computes a similarity measure considering the parallelism between both GPUs as well as the parallelism inside GPU for performing the voxel-based registration. Then, it subtracts a CT image transformed by optimal transformation matrix from CTA image, and visualizes the subtracted volume using GPU based volume rendering technique. In this paper, we compare our proposed method with existing methods using 5 pairs of pre-contrast brain CT image and post-contrast brain CTA image in order to prove the superiority of our method in regard to visual quality and computational time. Experimental results show that our method well visualizes a brain vessel, so it well diagnose a vessel disease. Our multi GPU based approach is 11.6 times faster than CPU based approach and 1.4 times faster than single GPU based approach for total processing.

Parallel Algorithm of Conjugate Gradient Solver using OpenGL Compute Shader

  • Va, Hongly;Lee, Do-keyong;Hong, Min
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.1
    • /
    • pp.1-9
    • /
    • 2021
  • OpenGL compute shader is a shader stage that operate differently from other shader stage and it can be used for the calculating purpose of any data in parallel. This paper proposes a GPU-based parallel algorithm for computing sparse linear systems through conjugate gradient using an iterative method, which perform calculation on OpenGL compute shader. Basically, this sparse linear solver is used to solve large linear systems such as symmetric positive definite matrix. Four well-known matrix formats (Dense, COO, ELL and CSR) have been used for matrix storage. The performance comparison from our experimental tests using eight sparse matrices shows that GPU-based linear solving system much faster than CPU-based linear solving system with the best average computing time 0.64ms in GPU-based and 15.37ms in CPU-based.

Parallel Computation of FDTD algorithm using CUDA (CUDA를 이용한 FDTD 알고리즘의 병렬처리)

  • Lee, Ho-Young;Park, Jong-Hyun;Kim, Jun-Seong
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.4
    • /
    • pp.82-87
    • /
    • 2010
  • Modern GPUs(Graphic Processing Units) provide computing capability higher than that of the general CPUs(Central Processor Units). With supports of programmability of graphics pipeline GP-GPU(General Purpose computation on GPU) has gained much attention expanding its application area. This paper compares sequential and massively parallel implementations of FDTD(Finite Difference Time Domain) algorithm using CUDA(Compute Unified Device Architecture). Experimental results show upto 45X speedup over conventional CPU execution.

A Study on High Performance GPU based Container Cloud System supporting TensorFlow Serving Deployment Service (TensorFlow Serving 서비스를 지원하는 고성능 GPU 기반 컨테이너 클라우드 시스템)

  • Jang, Kyung-Soo;Kim, Jung-Hwan
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.11a
    • /
    • pp.386-388
    • /
    • 2017
  • TensorFlow와 알파고의 등장으로 인공지능의 높은 성능과 다양한 활용 가능성을 보이면서, 폭 넓은 산업 분야에서 머신러닝 기술에 대한 수요가 증가하고 있다. 반면, 머신러닝 기술은 GPU 기반 고속 병렬처리 기술과 인프라 기술을 기반으로 하고 있기 때문에, 머신러닝 기반 서비스 개발 및 제공에 어려움을 겪고 있다. 본 논문에서는 이와 같은 문제를 개선하기 위해서 개발한 고성능 GPU 기반 컨테이너 클라우드 시스템을 소개한다. 해당 시스템은 GPU 기반 고속 병렬처리를 지원하고, Kubernetes 클러스터에서 컨테이너를 기반으로 TensorFlow Serving을 손쉽게 배포할 수 있는 기능을 제공한다.

A Parallel Processing Technique for Large Spatial Data (대용량 공간 데이터를 위한 병렬 처리 기법)

  • Park, Seunghyun;Oh, Byoung-Woo
    • Spatial Information Research
    • /
    • v.23 no.2
    • /
    • pp.1-9
    • /
    • 2015
  • Graphical processing unit (GPU) contains many arithmetic logic units (ALUs). Because many ALUs can be exploited to process parallel processing, GPU provides efficient data processing. The spatial data require many geographic coordinates to represent the shape of them in a map. The coordinates are usually stored as geodetic longitude and latitude. To display a map in 2-dimensional Cartesian coordinate system, the geodetic longitude and latitude should be converted to the Universal Transverse Mercator (UTM) coordinate system. The conversion to the other coordinate system and the rendering process to represent the converted coordinates to screen use complex floating-point computations. In this paper, we propose a parallel processing technique that processes the conversion and the rendering using the GPU to improve the performance. Large spatial data is stored in the disk on files. To process the large amount of spatial data efficiently, we propose a technique that merges the spatial data files to a large file and access the file with the method of memory mapped file. We implement the proposed technique and perform the experiment with the 747,302,971 points of the TIGER/Line spatial data. The result of the experiment is that the conversion time for the coordinate systems with the GPU is 30.16 times faster than the CPU only method and the rendering time is 80.40 times faster than the CPU.