• Title/Summary/Keyword: GPU programming

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases / v.37 no.5 / pp.275-284 / 2010
  • GPUs are multi-core stream processors that can process large volumes of data at high speed with high memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, the use of GPUs in general-purpose computing has become widespread. The CUDA architecture from Nvidia is one of the efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm that uses a simple nested-loop structure. To employ the CUDA programming model, we apply optimization techniques that fit our skyline algorithm to the performance restrictions of the CUDA architecture. According to our experimental results, our optimization techniques improve the original skyline algorithm by 80%.
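
For readers unfamiliar with the nested-loop skyline computation mentioned in this abstract, the following is a minimal sequential sketch in Python. It is not the paper's CUDA implementation. The function names `dominates` and `skyline_nested_loop` are hypothetical, and the convention that smaller values are better in every dimension is an assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q (assumption: smaller is better).

    p dominates q when p is no worse in every dimension and
    strictly better in at least one.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline_nested_loop(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    The outer loop picks a candidate and the inner loop scans all
    other points for a dominator, giving O(n^2) comparisons.
    """
    result = []
    for i, candidate in enumerate(points):
        dominated = any(
            dominates(other, candidate)
            for j, other in enumerate(points)
            if j != i
        )
        if not dominated:
            result.append(candidate)
    return result


if __name__ == "__main__":
    data = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
    print(skyline_nested_loop(data))
```

Each outer-loop iteration reads the input but writes only its own result, so the iterations are independent of one another. That independence is what makes this structure a natural candidate for assigning one candidate point per GPU thread, although the abstract does not describe the actual kernel layout.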

Acceleration of Feature-Based Image Morphing Using GPU (GPU를 이용한 특징 기반 영상모핑의 가속화)

  • Kim, Eun-Ji;Yoon, Seung-Hyun;Lee, Jieun
    • Journal of the Korea Computer Graphics Society / v.20 no.2 / pp.13-24 / 2014
  • In this study, a graphics-processing-unit (GPU)-based acceleration technique is proposed for feature-based image morphing. This technique uses the depth buffer of the graphics hardware to efficiently calculate the shortest distance between a pixel and the control lines. The pairs of control lines between the source image and the destination image are determined by user input, and the distance function of each control line is rendered using two rectangles and two cones. The distance between each pixel and its nearest control line is stored in the depth buffer through the graphics pipeline and is then used to conduct the morphing operation efficiently. The per-pixel morphing operation is parallelized using the compute unified device architecture (CUDA) to reduce the morphing time. We demonstrate the efficiency of the proposed technique using several experimental results.
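
The quantity this abstract stores in the depth buffer, the distance from a pixel to its nearest control line, can also be computed directly. The sketch below shows that direct computation in Python for comparison. It is not the paper's depth-buffer method, and the function names and the representation of a control line as a pair of 2D endpoints are assumptions.

```python
import math
from typing import Iterable, Tuple

Vec2 = Tuple[float, float]


def point_segment_distance(p: Vec2, a: Vec2, b: Vec2) -> float:
    """Shortest Euclidean distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the infinite line,
    # clamped to [0, 1] so the closest point stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_line_distance(p: Vec2, lines: Iterable[Tuple[Vec2, Vec2]]) -> float:
    """Distance from p to the closest of several control lines."""
    return min(point_segment_distance(p, a, b) for a, b in lines)


if __name__ == "__main__":
    control_lines = [((-1.0, 0.0), (1.0, 0.0)), ((0.0, 5.0), (0.0, 9.0))]
    print(nearest_line_distance((0.0, 1.0), control_lines))
```

Evaluated for every pixel against every control line, this brute-force form costs time proportional to the number of pixels times the number of lines, which is the work the abstract's rasterization approach hands to the graphics pipeline instead.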

All Phase Discrete Sine Biorthogonal Transform and Its Application in JPEG-like Image Coding Using GPU

  • Shan, Rongyang;Zhou, Xiao;Wang, Chengyou;Jiang, Baochen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.9 / pp.4467-4486 / 2016
  • The JPEG standard based on the discrete cosine transform (DCT) significantly improves the coding efficiency of image compression, but it suffers from serious blocking artifacts at low bit rates and from low efficiency on high-definition images. In light of all phase digital filtering theory, this paper proposes a novel transform based on the discrete sine transform (DST), called the all phase discrete sine biorthogonal transform (APDSBT). Applying APDSBT to the JPEG scheme reduces the blocking artifacts significantly. The image reconstructed by APDSBT-JPEG is better than that of DCT-JPEG in terms of both objective quality and subjective effect. To improve the efficiency of JPEG coding, the structure of JPEG is analyzed. We analyze key factors in the design and evaluation of JPEG compression on massively parallel graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model. Experimental results show that the parallel APDSBT-JPEG algorithm can reach a maximum speedup of more than 100 times, even on a low-end GPU. Some new parallel strategies for improving the performance of the parallel algorithm are illustrated in this paper. With the optimal strategy, the efficiency can be improved by over 10%.

Approximating the Convex Hull for a Set of Spheres (구 집합에 대한 컨벡스헐 근사)

  • Kim, Byungjoo;Kim, Ku-Jin;Kim, Young J.
    • KIPS Transactions on Computer and Communication Systems / v.3 no.1 / pp.1-6 / 2014
  • Most previous algorithms focus on computing the convex hull for a set of points. In this paper, we present a method for approximating the convex hull for a set of spheres with various radii in discrete space. Computing the convex hull for a set of spheres is a base technology for many applications that study the structural properties of molecules. We present a voxel map data structure, in which the molecule is represented as a set of spheres, and the corresponding algorithms. Using CUDA programming to exploit the parallel architecture of the GPU, our algorithm takes less than 40 ms on average to compute the convex hull of 6,400 spheres.

Enhancement of H.264/AVC Encoding Speed and Reduction of CPU Load through Parallel Programming Based on CUDA (CUDA 기반의 병렬 프로그래밍을 통한 H.264/AVC 부호화 속도 향상 및 CPU 부하 경감)

  • Jang, Eun-Been;Ha, Yun-Su
    • Journal of Advanced Marine Engineering and Technology / v.34 no.6 / pp.858-863 / 2010
  • To enhance encoding speed in dynamic image encoding using H.264/AVC, it is very important to reduce the time for motion estimation, which takes a large portion of the processing time. Using a graphics processing unit (GPU) as a coprocessor to assist the central processing unit (CPU) in computing massive data is one way to reduce the processing time. In this paper, we present an efficient block-level parallel algorithm for motion estimation (ME) on the compute unified device architecture (CUDA) platform, which was developed for general-purpose computation on GPUs. Experiments are carried out to verify the effectiveness of the proposed algorithm.

Surface Detailed Painterly Rendering Using Heightfield Map (하이트필드 맵을 이용한 회화적 질감 표현)

  • Ryoo, Seung-Taek
    • Journal of the Korea Computer Graphics Society / v.12 no.4 / pp.1-5 / 2006
  • This paper introduces surface-detailed painterly rendering using a heightfield map. To do this, we implement painterly rendering with normal mapping and displacement mapping driven by the heightfield map. The suggested method can be applied to 3D visualization programs and game engines to render surface detail in real time using GPU programming.

Design and Implementation of High-Resolution Integral Imaging Display System using Expanded Depth Image

  • Song, Min-Ho;Lim, Byung-Muk;Ryu, Ga-A;Ha, Jong-Sung;Yoo, Kwan-Hee
    • International Journal of Contents / v.14 no.3 / pp.1-6 / 2018
  • For 3D display applications, auto-stereoscopic display methods that can provide 3D images without glasses have been actively developed. This paper is concerned with developing a display system for elemental images of real space using integral imaging. Unlike the conventional method, which reduces the color image to the resolution of the generated depth image, we minimize the loss of original color image data by generating an enlarged depth image with interpolation methods. Our method was implemented efficiently by applying a GPU parallel processing technique with OpenCL to rapidly generate a large amount of elemental image data. Our experimental results also show higher-quality integral imaging than that generated by previous methods.

Low-power Scheduling Framework for Heterogeneous Architecture under Performance Constraint

  • Li, Junke;Guo, Bing;Shen, Yan;Li, Deguang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.2003-2021 / 2020
  • Today's computer systems widely integrate CPUs and GPUs to achieve considerable performance, but the energy consumption of such systems directly affects operational cost, maintainability, and the environment, which has aroused wide concern among researchers, computer architects, and developers. To cope with the energy problem, we propose a task-scheduling framework that reduces energy under a performance constraint by rationally allocating tasks across the CPU and GPU. The framework first collects the estimated energy consumption and performance information of programs. Next, we use this information to formalize the scheduling problem as a 0-1 knapsack problem. Then, we conduct experiments on a typical platform to verify the proposed scheduling framework. The experimental results show that, on average, our proposed algorithm saves 14.97% energy compared with the time-oriented policy and yields a 37.23% performance improvement over the energy-oriented scheme.
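
The abstract states that scheduling is formalized as a 0-1 knapsack problem but does not give the formulation. The sketch below is the textbook dynamic-programming solver in Python, shown only to illustrate the problem class. The mapping used in the example (value = energy saved by moving a task to the GPU, weight = extra time that move costs, capacity = time slack allowed by the performance constraint) is an assumption for illustration, not the paper's model, and `knapsack_01` is a hypothetical name.

```python
from typing import List, Tuple


def knapsack_01(values: List[int], weights: List[int], capacity: int) -> Tuple[int, List[int]]:
    """Solve the 0-1 knapsack problem by dynamic programming.

    Returns the best total value and the indices of the chosen items.
    Weights and capacity must be non-negative integers.
    """
    n = len(values)
    # best[i][c] = best value using the first i items within capacity c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # take item i-1
    # Walk the table backwards to recover which items were taken.
    chosen = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


if __name__ == "__main__":
    # Hypothetical numbers: energy saved per task and extra time cost.
    energy_saved = [6, 10, 12]
    extra_time = [1, 2, 3]
    time_slack = 5
    print(knapsack_01(energy_saved, extra_time, time_slack))
```

Under the assumed mapping, the returned indices would be the tasks placed on the GPU and the remaining tasks would stay on the CPU. The table needs memory proportional to the number of tasks times the capacity, so real time budgets would have to be discretized to integers first.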

GPU-based Acceleration of Image-based Rendering (GPU를 이용한 영상기반 렌더링의 가속)

  • Lee, Man-Hee;Park, In-Kyu
    • Proceedings of the Korean Information Science Society Conference / 2005.07a / pp.685-687 / 2005
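  • This paper proposes a fast rendering technique for depth image-based 3-D objects. The proposed algorithm is designed to exploit hardware acceleration directly through the shader programming supported by graphics accelerators. It also overcomes a limitation of existing image-based rendering by expressing lighting effects, and it achieves fast rendering by adjusting the splat size of each pixel directly in hardware at render time. In simulation experiments, rendering speed improved remarkably compared with software rendering or OpenGL-based rendering.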

Comparison of GPU-Based Numerous Particles Simulation and Experiment (GPU 기반 대량입자 거동 시뮬레이션과 실험비교)

  • Park, Sang Wook;Jun, Chul Woong;Sohn, Jeong Hyun;Lee, Jae Wook
    • Transactions of the Korean Society of Mechanical Engineers A / v.38 no.7 / pp.751-756 / 2014
  • The dynamic behavior of numerous grains interacting with each other can be easily observed. In this study, this dynamic behavior was analyzed based on the contact between numerous grains. The discrete element method was used for analyzing the dynamic behavior of each particle and the neighboring-cell algorithm was employed for detecting their contact. The Hertzian and tangential sliding friction contact models were used for calculating the contact force acting between the particles. A GPU-based parallel program was developed for conducting the computer simulation and calculating the numerous contacts. The dam break experiment was performed to verify the simulation results. The reliability of the program was verified by comparing the results of the simulation with those of the experiment.