• Title/Summary/Keyword: GPU Acceleration

Acceleration of GPU-based Volume Rendering Using Vertex Splitting (정점분할을 이용한 GPU 기반 볼륨 렌더링의 가속 기법)

  • Yoo, Seong-Yeol;Lee, Eun-Seok;Shin, Byeong-Seok
    • Journal of Korea Game Society / v.12 no.2 / pp.53-62 / 2012
  • Visualizing a volume dataset with ray-casting, one of several visualization methods, produces high-quality images. However, rendering takes too long because volume data is very large. Recently, various methods have been proposed to accelerate GPU-based volume rendering and address this problem. In this paper, we propose an efficient GPU-based empty-space skipping method that accelerates volume ray-casting using octree traversal. The method builds a min-max octree and searches for empty space using vertex splitting. It minimizes the bounding polyhedron by eliminating the empty space found in the octree traversal step. The rendering results of our method are identical to those of previous GPU-based volume ray-casting, with the advantage of faster run time owing to the minimized bounding polyhedron.
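
The min-max octree idea in this abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' implementation: it omits vertex splitting and the GPU entirely, and the names `build_minmax_octree`, `non_empty_boxes`, `leaf_size`, and `threshold` are hypothetical. It assumes a cubic volume whose edge length is a power of two and treats a node as empty when its maximum value is below the threshold.

```python
def build_minmax_octree(volume, x0, y0, z0, size, leaf_size=2):
    """Build a node covering the cube [x0, x0+size) on each axis."""
    if size <= leaf_size:
        values = [volume[z][y][x]
                  for z in range(z0, z0 + size)
                  for y in range(y0, y0 + size)
                  for x in range(x0, x0 + size)]
        return {"origin": (x0, y0, z0), "size": size,
                "min": min(values), "max": max(values), "children": []}
    half = size // 2
    children = [
        build_minmax_octree(volume, x0 + dx, y0 + dy, z0 + dz, half, leaf_size)
        for dz in (0, half) for dy in (0, half) for dx in (0, half)
    ]
    # A parent's range is the union of its children's ranges.
    return {"origin": (x0, y0, z0), "size": size,
            "min": min(c["min"] for c in children),
            "max": max(c["max"] for c in children),
            "children": children}


def non_empty_boxes(node, threshold):
    """Collect leaf boxes that may hold visible voxels (max >= threshold)."""
    if node["max"] < threshold:
        return []  # the whole subtree is empty, so skip it
    if not node["children"]:
        return [(node["origin"], node["size"])]
    boxes = []
    for child in node["children"]:
        boxes.extend(non_empty_boxes(child, threshold))
    return boxes


# A 4x4x4 volume that is empty except for one voxel at x=3, y=3, z=3.
volume = [[[0] * 4 for _ in range(4)] for _ in range(4)]
volume[3][3][3] = 200
octree_root = build_minmax_octree(volume, 0, 0, 0, 4)
boxes = non_empty_boxes(octree_root, threshold=100)
print(boxes)
```

Only the surviving boxes would need to be covered by the bounding geometry that rays are cast through, which is where the saving comes from.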

Research of accelerating method of video quality measurement program using GPGPU (GPGPU를 이용한 영상 품질 측정 프로그램의 가속화 연구)

  • Lee, Seonguk;Byeon, Gibeom;Kim, Kisu;Hong, Jiman
    • Smart Media Journal / v.5 no.4 / pp.69-74 / 2016
  • Recently, parallel computing using GPGPU (General-Purpose computing on Graphics Processing Units) has been expanding with the development of graphics processing units. It enables faster processing than traditional computing environments across many fields, including science, medicine, engineering, and analysis. However, implementing a parallel program with GPU technology involves many constraints. In this paper, we port a CPU-based program (a video quality measurement program) to use GPU technology. The GPU-based port runs about 1.83 times as fast as the CPU-based program. We study the acceleration of the GPU-based program and discuss the technical constraints and problems that arise when converting a CPU-based program to a GPU-based one.

The Implementation of Fast Object Recognition Using Parallel Processing on CPU and GPU (CPU와 GPU의 병렬 처리를 이용한 고속 물체 인식 알고리즘 구현)

  • Kim, Jun-Chul;Jung, Young-Han;Park, Eun-Soo;Cui, Xue-Nan;Kim, Hak-Il;Huh, Uk-Youl
    • Journal of Institute of Control, Robotics and Systems / v.15 no.5 / pp.488-495 / 2009
  • This paper presents a fast feature extraction method for autonomous mobile robots that uses parallel processing based on OpenMP, SSE (Streaming SIMD Extensions), and CUDA programming. In the first step, for the CPU version, the algorithms and code are optimized and then parallelized. The parallel algorithms are debugged to maintain the same level of performance, and the process of extracting key points and obtaining the dominant orientation of each key point is parallelized. After extraction, a parallel descriptor is constructed with SSE instructions. The GPU version, based on SIFT, is also parallelized using CUDA. The GPU-parallel descriptor achieves up to a fivefold acceleration compared with the CPU-parallel descriptor, but it shows lower performance than the CPU version. The CPU version also achieves a speedup of four and a half times over the original SIFT while maintaining robust performance.

Acceleration of Phase Measuring Profilometry using GPU (GPU를 이용한 위상 측정법의 가속화)

  • Kim, Ho-Joong;Cho, Tai-Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.12 / pp.2285-2290 / 2017
  • Automation systems have been evolving in many areas of industry in recent years. At the same time, the need for height inspection of objects by 3D measurement is gradually increasing. Among the various 3D measurement methods, this paper discusses phase measuring profilometry (PMP). PMP obtains the height of an object from the phase value of a fringe pattern. Since PMP requires a large amount of computation, an efficient way to perform it is needed. In this paper, we propose using CUDA from NVIDIA to solve this problem. We also propose using the pinned memory and streams provided by CUDA. This can greatly improve the measurement speed while maintaining accuracy. Finally, we demonstrate the performance of the proposed method through experiments.
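
As a rough illustration of the per-pixel phase computation behind PMP, the sketch below uses the standard four-step phase-shifting formula. The abstract does not say how many phase steps the authors use, so the four-step choice is an assumption, and the function names `fringe_samples` and `wrapped_phase` are hypothetical. With intensities $I_k = A + B\cos(\varphi + k\pi/2)$ for $k = 0, 1, 2, 3$, the wrapped phase is $\varphi = \operatorname{atan2}(I_3 - I_1,\ I_0 - I_2)$.

```python
import math


def fringe_samples(a, b, phi):
    """Simulate four fringe intensities shifted by 0, 90, 180, 270 degrees."""
    return [a + b * math.cos(phi + k * math.pi / 2) for k in range(4)]


def wrapped_phase(i0, i1, i2, i3):
    """Four-step phase shifting: recover the phase in (-pi, pi]."""
    # i3 - i1 = 2*b*sin(phi) and i0 - i2 = 2*b*cos(phi), so the
    # background a and the modulation b cancel out of the ratio.
    return math.atan2(i3 - i1, i0 - i2)


true_phase = 0.7
samples = fringe_samples(a=120.0, b=50.0, phi=true_phase)
recovered = wrapped_phase(*samples)
print(recovered)
```

Each pixel's phase depends only on that pixel's own intensities, which is why this kind of computation maps naturally onto one GPU thread per pixel.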

Real-time Color Recognition Based on Graphic Hardware Acceleration (그래픽 하드웨어 가속을 이용한 실시간 색상 인식)

  • Kim, Ku-Jin;Yoon, Ji-Young;Choi, Yoo-Joo
    • Journal of KIISE:Computing Practices and Letters / v.14 no.1 / pp.1-12 / 2008
  • In this paper, we present a real-time algorithm for recognizing vehicle color from indoor and outdoor vehicle images based on GPU (Graphics Processing Unit) acceleration. In the preprocessing step, we construct feature vectors from sample vehicle images of different colors. Then, we combine the feature vectors for each color and store them as a reference texture to be used on the GPU. Given an input vehicle image, the CPU constructs its feature vector, and the GPU then compares it with the sample feature vectors in the reference texture. The similarities between the input feature vector and the sample feature vectors for each color are measured, and the result is transferred to the CPU to recognize the vehicle color. The output colors fall into seven categories: three achromatic colors (black, silver, and white) and four chromatic colors (red, yellow, blue, and green). We construct feature vectors from histograms of hue-saturation pairs and hue-intensity pairs. A weight factor is applied to the saturation values. Our algorithm achieves a successful color recognition rate of 94.67% by using a large number of sample images captured in various environments, by generating feature vectors that distinguish different colors, and by using an appropriate likelihood function. We also accelerate color recognition by using the parallel computation capability of the GPU. In the experiments, we constructed a reference texture from 7,168 sample images, with 1,024 images for each color. The average time to generate a feature vector is 0.509 ms for a $150 \times 113$ resolution image. After the feature vector is constructed, GPU-based color recognition takes 2.316 ms on average, which is 5.47 times faster than executing the algorithm on the CPU. Our experiments were limited to vehicle images, but the algorithm can be extended to input images of general objects.
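
The histogram-based feature vector and the comparison against per-color references can be sketched as follows. This is a simplified stand-in, not the paper's method: it uses only hue-saturation pairs (no hue-intensity pairs and no saturation weighting), replaces the unspecified likelihood function with plain histogram intersection, and runs on the CPU. The bin counts, the two reference colors, and the names `hs_histogram`, `intersection`, and `classify` are all assumptions.

```python
import colorsys


def hs_histogram(pixels, hue_bins=8, sat_bins=4):
    """Normalized histogram over (hue, saturation) pairs of RGB pixels."""
    hist = [0.0] * (hue_bins * sat_bins)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * hue_bins), hue_bins - 1)
        si = min(int(s * sat_bins), sat_bins - 1)
        hist[hi * sat_bins + si] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def intersection(a, b):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def classify(pixels, references):
    """Return the reference color whose histogram is most similar."""
    feature = hs_histogram(pixels)
    return max(references, key=lambda name: intersection(feature, references[name]))


# Tiny made-up reference sets; the paper uses 1,024 images per color.
references = {
    "red": hs_histogram([(200, 10, 10), (220, 30, 20), (180, 0, 0)]),
    "blue": hs_histogram([(10, 10, 200), (20, 40, 220), (0, 0, 180)]),
}
label = classify([(210, 20, 15), (190, 5, 5)], references)
print(label)
```

The comparison of one input vector against many reference vectors is the independent, repetitive part that the abstract offloads to the GPU.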

Real-time Volume Rendering using Point-Primitive (포인트 프리미티브를 이용한 실시간 볼륨 렌더링 기법)

  • Kang, Dong-Soo;Shin, Byeong-Seok
    • Journal of Korea Multimedia Society / v.14 no.10 / pp.1229-1237 / 2011
  • Volume ray-casting is a direct volume rendering method that produces high-quality images and can handle semi-transparent objects. Although it produces high-quality images by sampling in the region of interest, its rendering speed is slow because color acquisition requires repeated memory references and accumulation of sample values. Recently, GPU-based acceleration techniques have been introduced. However, they require preprocessing or additional memory. In this paper, we propose an efficient point-primitive-based method that avoids the complicated computation of GPU ray-casting. It renders semi-transparent objects, yet it requires neither preprocessing nor additional memory. Our method is fast because it generates point primitives from the volume dataset during the sampling process and projects them onto the image plane. It can also easily cope with changes to the opacity transfer function (OTF), because point primitives can be added or deleted in real time.

Accelerated Large-Scale Simulation on DEVS based Hybrid System using Collaborative Computation on Multi-Cores and GPUs (멀티 코어와 GPU 결합 구조를 이용한 DEVS 기반 대규모 하이브리드 시스템 모델링 시뮬레이션의 가속화)

  • Kim, Seongseop;Cho, Jeonghun;Park, Daejin
    • Journal of the Korea Society for Simulation / v.27 no.3 / pp.1-11 / 2018
  • Discrete event system specification (DEVS) has been used in many simulations, including hybrid systems that feature both discrete and continuous behavior and take a long time to produce results. Therefore, in this study, we propose accelerating DEVS-based hybrid system simulation using tightly coupled multi-core and GPU computing. We analyzed the proposed heterogeneous computing approach in terms of the configuration of the target device, changing simulation parameters, and power consumption for efficient simulation. The results revealed that the proposed architecture offers an advantage for high-performance simulation in terms of execution time, although it consumes more power. With these results, we found that our approach is applicable to hybrid system simulation, and we demonstrated through experiments on the proposed architecture the possibility of optimizing hardware distribution in terms of power consumption versus execution time.

Parallel Range Query processing on R-tree with Graphics Processing Units (GPU를 이용한 R-tree에서의 범위 질의의 병렬 처리)

  • Yu, Bo-Seon;Kim, Hyun-Duk;Choi, Won-Ik;Kwon, Dong-Seop
    • Journal of Korea Multimedia Society / v.14 no.5 / pp.669-680 / 2011
  • R-trees are widely used in areas such as geographical information systems, CAD systems, and spatial databases to index multi-dimensional data efficiently. As the data sets used in these areas grow in size and complexity, however, range query operations on R-trees need to be even faster to meet area-specific constraints. To address this problem, various research efforts have developed strategies for accelerating query processing on R-trees by using a buffer mechanism or by parallelizing query processing across multiple disks and processors. As part of these strategies, approaches that parallelize query processing on R-trees using Graphics Processing Units (GPUs) have been explored. The use of GPUs may improve performance through faster calculations and fewer disk accesses, but it may incur additional overhead from high memory access latencies and the low data exchange rate between GPUs and the CPU. In this paper, to address these overhead problems and use GPUs efficiently, we propose a novel approach that uses a GPU as a buffer to parallelize query processing on R-trees. The buffer algorithm can improve performance by reducing the number of disk accesses and by maximizing coalesced memory access, which minimizes GPU memory access latencies. Through extensive performance studies, we observed that the proposed approach achieved up to 5 times higher query performance than the original CPU-based R-trees.
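
For readers unfamiliar with the operation being accelerated, the following is a minimal sequential sketch of a range query on an R-tree-like structure. It does not reproduce the paper's GPU buffer scheme; the dictionary-based node layout and the names `intersects`, `range_query`, and `rtree_root` are hypothetical, and the tree is built by hand rather than by an insertion algorithm.

```python
def intersects(a, b):
    """Overlap test for rectangles given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(node, query):
    """Return the ids of all stored rectangles that overlap the query."""
    if not intersects(node["mbr"], query):
        return []  # prune: nothing below this node can match
    if "entries" in node:  # leaf node
        return [oid for oid, rect in node["entries"] if intersects(rect, query)]
    results = []
    for child in node["children"]:
        results.extend(range_query(child, query))
    return results


# Two hand-built leaves under one root; "mbr" is the minimum bounding rectangle.
leaf_a = {"mbr": (0, 0, 4, 4),
          "entries": [("a1", (0, 0, 1, 1)), ("a2", (3, 3, 4, 4))]}
leaf_b = {"mbr": (6, 6, 9, 9),
          "entries": [("b1", (6, 6, 7, 7)), ("b2", (8, 8, 9, 9))]}
rtree_root = {"mbr": (0, 0, 9, 9), "children": [leaf_a, leaf_b]}

hits = range_query(rtree_root, (2, 2, 6.5, 6.5))
print(hits)
```

The overlap tests inside a node are independent of one another, which is the part a GPU can evaluate in parallel.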

FPGA-Based Acceleration of Range Doppler Algorithm for Real-Time Synthetic Aperture Radar Imaging (실시간 SAR 영상 생성을 위한 Range Doppler 알고리즘의 FPGA 기반 가속화)

  • Jeong, Dongmin;Lee, Wookyung;Jung, Yunho
    • Journal of IKEEE / v.25 no.4 / pp.634-643 / 2021
  • In this paper, an FPGA-based acceleration scheme for the range Doppler algorithm (RDA) is proposed for real-time synthetic aperture radar (SAR) imaging. Hardware architectures are presented for a matched filter based on a systolic array and for a high-speed sinc interpolator that compensates for range cell migration (RCM). In addition, the proposed hardware was implemented and accelerated on a Xilinx Alveo FPGA. Experimental results for 4096×4096 SAR imaging showed that the FPGA-based implementation achieves a twofold acceleration compared to a GPU-based design. It was also confirmed that the proposed design can be implemented with 60,247 CLB LUTs, 103,728 CLB registers, 20 block RAM tiles, and 592 DSPs at an operating frequency of 312 MHz.
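
The matched filtering step mentioned in this abstract can be illustrated with a small time-domain sketch: a received signal is correlated with the complex conjugate of the transmitted chirp, and the output peaks at the echo delay. This is only a conceptual model under assumed parameters (a 16-sample chirp, a delay of 5 samples, no noise); it says nothing about the systolic array architecture or the sinc interpolator, and the names `chirp` and `matched_filter` are hypothetical.

```python
import cmath


def chirp(n, rate):
    """Unit-amplitude linear FM chirp with quadratic phase pi*rate*k^2."""
    return [cmath.exp(1j * cmath.pi * rate * k * k) for k in range(n)]


def matched_filter(signal, reference):
    """Magnitude of the correlation of signal with the conjugated reference."""
    m = len(reference)
    out = []
    for shift in range(len(signal) - m + 1):
        acc = 0j
        for k in range(m):
            acc += signal[shift + k] * reference[k].conjugate()
        out.append(abs(acc))
    return out


reference = chirp(16, 0.01)
delay = 5
# Noise-free echo: the chirp delayed by 5 samples, zero elsewhere.
echo = [0j] * delay + reference + [0j] * 10
response = matched_filter(echo, reference)
peak = response.index(max(response))
print(peak)
```

At the correct delay every product equals 1, so the output reaches the full pulse length of 16; at any other shift fewer samples overlap, so the magnitude stays below that.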

Fast Medical Volume Decompression Using GPGPU (GPGPU를 이용한 고속 의료 볼륨 영상의 압축 복원)

  • Kye, Hee-Won
    • Journal of Korea Multimedia Society / v.15 no.5 / pp.624-631 / 2012
  • In many medical imaging systems, volume datasets are stored in compressed form, so a dataset has to be decompressed before it is visualized. Since the decompression process takes quite a long time, we present an acceleration method for medical volume decompression using the GPU. Our method supports both lossy and lossless compression, and progressive refinement is possible to satisfy varying user requirements. Moreover, our decompression method is well parallelized for the GPU, so decompression takes a very short time. Finally, we designed the decompression and volume rendering to work in one framework so that selective decompression is available. As a result, we gained additional improvement in volume decompression.