• Title/Summary/Keyword: Parallel GPU

Search Result 284, Processing Time 0.036 seconds

Stereo-To-Multiview Conversion System Using FPGA and GPU Device (FPGA와 GPU를 이용한 스테레오/다시점 변환 시스템)

  • Shin, Hong-Chang;Lee, Jinwhan;Lee, Gwangsoon;Hur, Namho
    • Journal of Broadcast Engineering
    • /
    • v.19 no.5
    • /
    • pp.616-626
    • /
    • 2014
  • In this paper, we introduce a real-time stereo-to-multiview conversion system using FPGA and GPU. The system is based on two different devices so that it consists of two major blocks. The first block is a disparity estimation block that is implemented on FPGA. In this block, each disparity map of stereoscopic video is estimated by DP(dynamic programming)-based stereo matching. And then the estimated disparity maps are refined by post-processing. The refined disparity map is transferred to the GPU device through USB 3.0 and PCI-express interfaces. Stereoscopic video is also transferred to the GPU device. These data are used to render arbitrary number of virtual views in next block. In the second block, disparity-based view interpolation is performed to generate virtual multi-view video. As a final step, all generated views have to be re-arranged into a single image at full resolution for presenting on the target autostereoscopic 3D display. All these steps of the second block are performed in parallel on the GPU device.

Implementation of a GPU Cluster System using Inexpensive Graphics Devices (저가의 그래픽스 장치를 이용한 GPU 클러스터 시스템 구현)

  • Lee, Jong-Min;Lee, Jung-Hwa;Kim, Seong-Woo
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.11
    • /
    • pp.1458-1466
    • /
    • 2011
  • Recently the research on GPGPU has been carried out actively as the performance of GPUs has been increased rapidly. In this paper, we propose the system architecture by benchmarking the existing supercomputer architecture for a cost-effective system using GPUs in low-cost graphics devices and implement a GPU cluster system with eight GPUs. We also make the software development environment that is suitable for the GPU cluster system and use it for the performance evaluation by implementing the n-body problem. According to its result, we found that it is efficient to use multiple GPUs when the problem size is large due to its communication cost. In addition, we could calculate up to eight million celestial bodies by applying the method of calculating block by block to mitigate the problem size constraint due to the limited resource in GPUs.

GPU-Optimized BVH and R-Triangle Methods for Rapid Self-Intersection Handling in Fabrics

  • Jong-Hyun Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.8
    • /
    • pp.59-65
    • /
    • 2024
  • In this paper, we present a GPU-based acceleration of computationally intensive self-collision processing in triangular mesh-based cloth simulation. For Compute Unified Device Architecture (CUDA)-based parallel optimization, we propose 1) an efficient way to build, update, and traverse the Bounding Volume Hierarchy (BVH) tree on the GPU, and 2) optimize the Representative-Triangle (R-Triangle) technique on the GPU to minimize primitive collision checking in triangular mesh-based cloth simulations. As a result, the proposed method can handle self-collisions and object collisions of cloth simulation in GPU environment faster and more efficiently than CPU-based algorithms, and experiments on various scenes show that it can achieve simulation results that are 5x to 10x faster. Since the proposed method is optimized for BVH on GPU, it can be easily integrated into various algorithms and fields that utilize BVH.

Analysis of Morton Code Conversion for 32 Bit IEEE 754 Floating Point Variables (IEEE 754 부동 소수점 32비트 float 변수의 Morton Code 변환 분석)

  • Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.17 no.3
    • /
    • pp.165-172
    • /
    • 2016
  • Morton codes play important roles in many parallel GPU applications for the nearest neighbor (NN) search in huge data and queries with its applications growing. This paper discusses and analyzes the meaning of Tero Karras's 32-bit 'unsigned int' Morton code algorithm for three-dimensional spatial information in $[0,1]^3$ and its geometric implications. Based on this, this paper proposes 64-bit 'unsigned long long' version of Morton code and compares the results in both CPU vs. GPU and 32-bit vs. 64-bit versions. The proposed GPU algorithm runs around 1000 times faster than the CPU version.

Evaluation of GPU Computing Capacity for All-in-view GNSS SDR Implementation

  • Yun Sub, Choi;Hung Seok, Seo;Young Baek, Kim
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.12 no.1
    • /
    • pp.75-81
    • /
    • 2023
  • In this study, we design an optimized Graphics Processing Unit (GPU)-based GNSS signal processing technique with the goal of designing and implementing a GNSS Software Defined Receiver (SDR) that can operate in real time all-in-view mode under multi-constellation and multi-frequency signal environment. In the proposed structure the correlators of the existing GNSS SDR are processed by the GPU. We designed a memory structure and processing method that can minimize memory access bottlenecks and optimize the GPU memory resource distribution. The designed GNSS SDR can select and operate only the desired GNSS or desired satellite signals by user input. Also, parameters such as the number of quantization bits, sampling rate, and number of signal tracking arms can be selected. The computing capability of the designed GPU-based GNSS SDR was evaluated and it was confirmed that up to 2400 channels can be processed in real time. As a result, the GPU-based GNSS SDR has sufficient performance to operate in real-time all-in-view mode. In future studies, it will be used for more diverse GNSS signal processing and will be applied to multipath effect analysis using more tracking arms.

High-Performance Korean Morphological Analyzer Using the MapReduce Framework on the GPU

  • Cho, Shi-Won;Lee, Dong-Wook
    • Journal of Electrical Engineering and Technology
    • /
    • v.6 no.4
    • /
    • pp.573-579
    • /
    • 2011
  • To meet the scalability and performance requirements of data analyses, which often involve voluminous data, efficient parallel or concurrent algorithms and frameworks are essential. We present a high-performance Korean morphological analyzer which employs the MapReduce framework on the graphics processing unit (GPU). MapReduce is a programming framework introduced by Google to aid the development of web search applications on a large number of central processing units (CPUs). GPUs are designed as a special-purpose co-processor. Their programming interfaces are typically formulated for graphics applications. Compared to CPUs, GPUs have greater computation power and memory bandwidth; however, GPUs are more difficult to program because of the design of their architectures. The performance of the Korean morphological analyzer using the MapReduce framework on the GPU is evaluated in comparison with the CPU-based model. The proposed Korean Morphological analyzer shows promising scalable performance on distributed computing with the GPU.

Fast Depth Map Estimation using Parallel Processing based on GPU (GPU기반 Depth Map 회득을 위한 고속 병렬처리 기법)

  • Jin, Moon-Sub;Choi, Ji-Yoon;Choo, Hyon-Gon;Kim, Jin-Woong;Park, Jong-Il
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2011.07a
    • /
    • pp.396-398
    • /
    • 2011
  • 본 논문은 두 대의 카메라와 한 대의 프로젝터로 구성된 Pro-cam시스템을 이용하여, 출력된 패턴 영상을 카메라로 촬영하고 이를 기반으로 Depth Map을 계산하는 모듈의 실시간 처리를 위한 GPU기반 병렬처리 기법을 제안한다. 입력받은 영상으로부터 구조광의 패턴을 해석하고, Depth Map을 계산하기 위해서, Dynamic pattern decoding하는 과정은 프로젝터의 패턴영상과 촬영된 카메라 패턴영상 간의 관계를 반복적으로 비교하므로, 이를 GPU 프로그래밍을 이용하여 병렬 처리를 통해 고속화하였다. 결과적으로 본 논문에서는 기존 CPU에서 수행했던 속도에 비해 약 18배정도 속도를 개선 할 수 있었다.

  • PDF

Design of Scratch Detection Algorithm based on GPU (GPU 기반 스크래치 탐지 알고리즘의 설계)

  • Lee, Joon-Goo;Han, Ki-Sun;You, Byoung-Moon;Hwang, Doo-Sung
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2013.07a
    • /
    • pp.9-10
    • /
    • 2013
  • 영상의 스크래치 탐지는 프레임 간 화소 데이터의 비교에 있어서 많은 처리 시간을 필요로 한다. 본 논문은 스크래치 탐지 알고리즘이 GPU에서 수행할 수 있도록 병렬 설계를 제안하고, 국가 기록원 소장 디지털화 영상에 대해 실험하였다. 실험에서 제안하는 방법은 순차적 스크래치 탐지 방법과 비교하여 약 5배의 처리 시간을 단축했으며, 탐지율은 각 방법 모두 60% 정도로 유사함을 보였다.

  • PDF

Acceleration of Range Query in R-tree Using GPU Parallel Processing (GPU를 이용한 R-tree의 질의처리 병렬화)

  • Kim, Min-Cheol;Choi, Won-Ik
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2011.06c
    • /
    • pp.37-40
    • /
    • 2011
  • 계층적 색인 구조는 대용량의 다차원 데이터에 대한 범위질의를 가장 효율적으로 처리하는 색인 구조이다. 계층적 색인 구조에서 범위질의의 속도를 향상시키기 위해서 색인 구조의 구성 시 발생하는 인접노드간의 겹치는 영역을 줄이는 기법들과 다량의 데이터를 한 번에 읽어 상향식 방식으로 색인 구조의 공간 활용도를 증가시키는 벌크 로딩 기법들이 제안되었다. 하지만 CPU기반에서 개별의 노드들을 순차적으로 질의처리 하는 계층적 색인 구조는 공간 활용도의 증가와 노드 간의 중첩 영역을 줄이는 것만으로는 질의 처리 성능 향상에 한계가 있다. 따라서 본 논문에서는 기존의 CPU기반 계층적 색인 구조 중의 대표적인 예인 R-tree의 저장 구조를 GPU 메모리에 적합하도록 변경을 하였다. 또한 기존 CPU기반 계층적 색인 구조의 순차적인 노드 검색을 GPU를 이용해 병렬적으로 노드를 검사하여 성능을 향상시켰다. 이와 같은 방식으로 질의 영역의 크기에 따라서 성능 향상정도가 다르지만 최대 100배 이상의 성능을 향상시켰다.

Accelerating Gaussian Hole-Filling Algorithm using GPU (GPU를 이용한 Gaussian Hole-Filling Algorithm 가속)

  • Park, Jun-Ho;Han, Tack-Don
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2012.07a
    • /
    • pp.79-82
    • /
    • 2012
  • 3차원 멀티미디어 서비스에 대한 관심이 높아짐에 따라 관련 연구들이 현재 다양하게 논의되고 있다. Stereoscopy영상을 생성하기 위한 기존의 방법으로는 두 대의 촬영용 카메라를 일정한 간격으로 띄워놓고 피사체를 촬영한 후 해당 좌시점과 우시점을 생성하는 방법을 이용하였다. 하지만 이는 영상 대역폭의 부담을 가져오게 된다. 이를 해결하기 위하여 Depth정보와 한 장의 영상을 이용한 DIBR(Depth Image Based Rendering) Algorithm에 대한 연구가 많이 이루어지고 있다. 그중 Gaussian Depth Map을 이용한 Hole-Filling 방법은 DIBR에서 가장 자연스러운 결과를 보여주지만 다른 DIBR Algorithm들에 비해 속도가 현저히 느리다는 단점이 있다. 본 논문에서는 영상 생성의 고속화를 위해 GPU를 이용한 Gaussian Hole-Filling Algorithm의 병렬처리 구조를 제안하고 이를 이용한 DIBR Algorithm 생성과정을 제시한다.

  • PDF