• Title/Summary/Keyword: GPU 가속기법 (GPU acceleration techniques)

Search results: 34

GPU Accelerating Methods for Pease FFT Processing (Pease FFT 처리를 위한 GPU 가속 기법)

  • Oh, Se-Chang;Joo, Young-Bok;Kwon, Oh-Young;Huh, Kyung-Moo
    • Journal of Institute of Control, Robotics and Systems / v.20 no.1 / pp.37-41 / 2014
  • The FFT (Fast Fourier Transform) is widely used in fields such as image processing, voice processing, physics, astronomy, and applied mathematics. Because of its importance, the FFT has been studied extensively, and new FFT algorithms that use a GPU (Graphics Processing Unit) have recently been developed for much faster processing. This paper proposes a new FFT algorithm, based on the Pease FFT algorithm, that is optimized for the hardware configuration of a GPGPU (General-Purpose computing on GPUs). In the experiments, the proposed algorithm outperformed the CUFFT algorithm by 3% to 43%.
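
The abstract does not give the GPU implementation, so the following is only a CPU sketch of a constant-geometry radix-2 FFT in the spirit of the Pease formulation. Every stage reads the pair `(i, i + n/2)` and writes to `(2i, 2i + 1)`, which is the regular access pattern that makes this family attractive for parallel hardware. The function name `pease_fft` and the decimation-in-frequency twiddle indexing are assumptions made for illustration, not details taken from the paper.

```python
import cmath


def pease_fft(x):
    """Constant-geometry radix-2 FFT sketch; len(x) must be a power of two."""
    n = len(x)
    stages = n.bit_length() - 1
    if n == 0 or (1 << stages) != n:
        raise ValueError("length must be a power of two")
    half = n // 2
    # Twiddle factors W_n^k for k = 0 .. n/2 - 1.
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(half)]
    cur = [complex(v) for v in x]
    for s in range(stages):
        nxt = [0j] * n
        for i in range(half):
            # The same read/write pattern is used in every stage.
            a, b = cur[i], cur[i + half]
            nxt[2 * i] = a + b
            # Only the twiddle index depends on the stage number.
            nxt[2 * i + 1] = (a - b) * w[(i >> s) << s]
        cur = nxt
    # The stages leave the spectrum in bit-reversed order; undo that here.
    out = [0j] * n
    for k in range(n):
        rev = int(format(k, "0{}b".format(stages))[::-1], 2) if stages else 0
        out[k] = cur[rev]
    return out
```

Because the inner loop over `i` has no dependencies between iterations, each butterfly could in principle be assigned to its own GPU thread, with one kernel launch or one synchronization per stage.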

Development of nearshore sediment transport numerical model based on GPU engine (GPU 엔진 기반 연안의 실시간 유사이송 수치모형 개발)

  • Noh, Junsu;Son, Sangyoung
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.177-177 / 2022
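  • Several factors, including climate change and the growing number of coastal structures, are accelerating coastal morphological change such as coastal erosion and shoreline change. Predicting this rapidly changing morphology and devising countermeasures requires fast prediction of nearshore sediment transport. In this study, we developed a numerical model that can simulate nearshore sediment transport in real time using Celeris Advent, a GPU-engine-based wave model. Celeris Advent uses the parallel cores of the GPU to provide real-time computation and real-time interaction with the user through a GUI. The governing equations are the extended Boussinesq equations two-way coupled with a sediment transport equation. They are solved with a hybrid finite-volume/finite-difference scheme in which the advective terms are discretized with the finite volume method (Kurganov & Petrova, 2007) and the source terms with the finite difference method. The sediment transport equation combines a depth-integrated advection-diffusion equation with a source term that represents erosion and deposition fluxes, so that the advection and diffusion terms account for the transport and diffusion of sediment while the source term accounts for interaction with the bed.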
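
As a rough illustration of the structure described in the abstract (advection, diffusion, and an erosion/deposition source term), the sketch below advances a 1D sediment concentration by one explicit time step. It makes several simplifying assumptions that are not from the paper: constant water depth, a constant non-negative velocity `u`, periodic boundaries, and a first-order upwind flux in place of the Kurganov-Petrova central-upwind scheme. The function and parameter names are hypothetical.

```python
def step_concentration(c, u, kappa, erosion, deposition, dx, dt):
    """One explicit step of dc/dt + d(u c)/dx = kappa d2c/dx2 + (E - D).

    Assumes u >= 0, constant depth, and periodic boundaries.
    """
    n = len(c)
    new = [0.0] * n
    for i in range(n):
        left = c[i - 1]            # c[-1] wraps around (periodic boundary)
        right = c[(i + 1) % n]
        # Finite-volume advection: upwind fluxes through the cell faces.
        flux_in = u * left
        flux_out = u * c[i]
        advect = -(flux_out - flux_in) / dx
        # Finite-difference diffusion and bed-exchange source term.
        diffuse = kappa * (right - 2.0 * c[i] + left) / dx ** 2
        source = erosion[i] - deposition[i]
        new[i] = c[i] + dt * (advect + diffuse + source)
    return new
```

With zero erosion and deposition, the flux form conserves the total sediment in the domain, which is a quick sanity check for this kind of scheme. Each cell update is independent of the others within a step, which is what allows a per-cell GPU mapping.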

GPU-accelerated Global Illumination for Point Set Rendering (GPU 가속을 이용한 점집합 렌더링을 위한 전역 조명기법)

  • Min, Heajung;Kim, Young J.
    • Journal of the Korea Computer Graphics Society / v.26 no.1 / pp.7-15 / 2020
  • When visualizing a point set that represents a smooth manifold surface, global illumination techniques can be used to render a realistic scene with various lighting effects. Thanks to the continuous demand for ray tracing and the development of graphics hardware, dedicated GPUs and a programmable pipeline for ray tracing have been introduced in recent years. In this paper, real-time global illumination rendering is studied for a point-set model using ray-tracing GPUs. We apply the moving least-squares (MLS) method to approximate the point set with a smooth implicit surface and render it using global illumination by performing massive ray-intersection tests with the surface and generating shading effects at the intersection points. As a result, a complicated point-set scene consisting of more than 0.5M points can be rendered in real time.

Non-Photorealistic Rendering using GPU Programming Technique (GPU 프로그래밍 기법을 이용한 비사실적 랜더링)

  • Bat-Ochir, Bolormaa;Sung, Kyung;Kim, Soo-Kyun
    • Journal of Advanced Navigation Technology / v.15 no.6 / pp.1228-1233 / 2011
  • Non-photorealistic rendering (NPR) techniques are advancing every year. NPR is inspired by artistic styles such as painting, drawing, technical illustration, animation, and cartoons. There are many application programs for NPR, which is popular and useful in animation and even in the game industry. Traditional non-photorealistic approaches in computer graphics, however, require considerable memory and time. In recent years, many NPR methods have offered advanced rendering and real-time techniques that use graphics accelerators. This paper explains NPR with GPU programming.

A Study on GPGPU Performance Improvement Technique on GCN Architecture Using OpenCL API (GCN 아키텍쳐 상에서의 OpenCL을 이용한 GPGPU 성능향상 기법 연구)

  • Woo, DongHee;Kim, YoonHo
    • The Journal of Society for e-Business Studies / v.23 no.1 / pp.37-45 / 2018
  • The systems on which programs run have continuously expanded from conventional single-core and multi-core systems to many-core and heterogeneous systems. However, existing research has focused mostly on parallelizing programs with the CUDA framework and rarely on optimization for AMD GCN-based GPUs. To address this gap, our study focuses on optimization techniques for the GCN architecture in a GPGPU environment and achieves a performance improvement. Specifically, with the techniques we propose, we reduced the computation time of matrix multiplication and convolution algorithms on the GPGPU by more than 30%. We also increased kernel throughput by more than 40%.

Performance Analysis and Enhancing Techniques of Kd-Tree Traversal Methods on GPU (GPU용 Kd-트리 탐색 방법의 성능 분석 및 향상 기법)

  • Chang, Byung-Joon;Ihm, In-Sung
    • Journal of KIISE: Computing Practices and Letters / v.16 no.2 / pp.177-185 / 2010
  • Ray-object intersection is an important element of ray tracing that takes up a substantial amount of computing time. In general, spatial data structures such as the kd-tree are frequently used for static scenes to accelerate the intersection computation. Recently, a few variants of kd-tree traversal have been proposed that suit the GPU, which has a relatively restricted computing architecture compared to the CPU. In this article, we propose two further implementation techniques that improve on those previous ones. First, we present a cached stack method that aims to reduce the costly global memory access time incurred when the stack is allocated in global memory. Second, we present a rope-with-short-stack method that eases the substantial memory requirement often imposed by the previous rope method. To show the effectiveness of our techniques, we compare their performance with that of the previous GPU traversal methods. The experimental results provide prospective GPU ray tracer developers with valuable information for choosing a proper kd-tree traversal method.

Fast Face Recognition on GPUs (GPU 를 통한 얼굴인식 가속화)

  • Yi, Cheong-Yong;Yi, Young-Min
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.10-12 / 2012
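  • Face recognition is important in many application areas such as security, but training for face recognition takes a great deal of computation time, so acceleration is needed when rapid training is required. Meanwhile, graphics processing units (GPUs) can process large volumes of data quickly and have recently come into wide use across many fields. In this paper, we propose a technique that accelerates a principal-component-based face recognition algorithm by running it in parallel on the GPU. We analyzed the parallelism of each step of principal-component-based face recognition to maximize the acceleration gain and, compared with a sequential version [3] implemented in C/OpenCV [2], obtained a speedup of up to about 40 times for the entire training system.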

Accelerating Scanline Block Gibbs Sampling Method using GPU (GPU 를 활용한 스캔라인 블록 Gibbs 샘플링 기법의 가속)

  • Zeng, Dongmeng;Kim, Wonsik;Yang, Yong;Park, In Kyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2014.06a / pp.77-78 / 2014
  • This paper presents a new MCMC method for optimization, called the scanline block Gibbs sampler. Because of its slow convergence, traditional Markov chain Monte Carlo (MCMC) is not widely used. In contrast to the conventional MCMC method, the scanline block Gibbs sampler is more convenient to parallelize. Since the main part of the scanline block Gibbs sampler is computing the messages between each edge, this message-passing computation is parallelized on the GPU to accelerate it. Experiments on the OpenGM2 benchmark show that the GPU implementation is faster than the CPU implementation.

Min-Max Octree Generation Using CUDA (CUDA를 이용한 최대-최소 8진트리 생성 기법)

  • Lim, Jong-Hyeon;Shin, Byeong-Seok
    • Journal of Korea Game Society / v.9 no.6 / pp.191-196 / 2009
  • Volume rendering is a method that extracts meaningful information from volume data and visualizes it. Because volume data sets keep growing in size, it is very important to devise acceleration methods that reach interactive rendering speeds. A min-max octree is a data structure for high-speed volume rendering; however, its creation time grows as the data size increases. In this paper, we propose a method for accelerating min-max octree generation using CUDA. First, we convert the volume data into a one-dimensional array using a space-filling curve. Then we build the min-max octree from the sequential array and apply it to accelerate volume ray casting.
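
The two steps in the abstract can be sketched on the CPU as follows. The abstract says only "space filling curve"; this sketch assumes a Morton (Z-order) curve, under which every run of eight consecutive array elements forms one 2x2x2 octant, so each coarser octree level is a reduction over groups of eight. The function names and the cubic, power-of-two volume are assumptions made for illustration.

```python
def morton3(x, y, z, bits):
    """Interleave the bits of (x, y, z) into a Morton (Z-order) index."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code


def build_min_max_octree(volume, bits):
    """Build min-max octree levels for a cube volume[z][y][x] of side 2**bits.

    Returns a list of levels, finest first; each level is a list of
    (min, max) pairs stored in Morton order.
    """
    n = 1 << bits
    # Step 1: flatten the volume into a 1D array along the Morton curve.
    linear = [None] * (n ** 3)
    for z in range(n):
        for y in range(n):
            for x in range(n):
                v = volume[z][y][x]
                linear[morton3(x, y, z, bits)] = (v, v)
    # Step 2: reduce each group of 8 children into one parent node.
    levels = [linear]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        parents = []
        for i in range(0, len(prev), 8):
            group = prev[i:i + 8]
            parents.append((min(g[0] for g in group),
                            max(g[1] for g in group)))
        levels.append(parents)
    return levels
```

Since each parent depends only on its own eight contiguous children, every parent in a level can be computed independently, which is the property a CUDA kernel would exploit (one thread per parent node, one launch per level).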

Precise Sweep Volume Computation Accelerated by GPU (GPU 가속을 이용한 정밀한 스웹 볼륨 경계 계산)

  • Lee, Hyunho;Kyung, Minho
    • Journal of the Korea Computer Graphics Society / v.21 no.1 / pp.13-21 / 2015
  • We present a robust GPU algorithm that constructs the sweep volume boundary of a triangular mesh model. The swept geometric entities of a triangular mesh object are first approximated by a set of triangles, the envelope of which becomes the outer boundary of the sweep volume. We find the envelope by computing the arrangement of the triangle set and extracting its outermost boundary. To ensure the robustness of the algorithm, we adopt random perturbation of the sweep vertices and interval arithmetic with multi-level precision. The algorithm is implemented to perform most of its computation on the GPU, and as a result it runs two orders of magnitude faster than other algorithms.
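
The abstract does not describe how the multi-level precision is organized, so the following is only a generic two-level sketch of the idea: evaluate a geometric predicate with floating-point interval arithmetic first, and fall back to exact rational arithmetic only when the interval cannot decide the sign. The 2D orientation predicate, the `Interval` class, and the use of `fractions.Fraction` as the higher precision level are all assumptions chosen for illustration. `math.nextafter` requires Python 3.9 or later.

```python
import math
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi], widened outward after every operation."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    @staticmethod
    def _widen(lo, hi):
        # Step one float outward so rounding error stays inside the bounds.
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __sub__(self, other):
        return Interval._widen(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval._widen(min(p), max(p))


def _det(ax, ay, bx, by, cx, cy):
    # Works for floats, Intervals, and Fractions alike.
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)


def orientation(a, b, c):
    """Return +1 (left turn), -1 (right turn), or 0 (collinear) for a -> b -> c."""
    coords = [*a, *b, *c]
    # Level 1: fast interval evaluation; decides whenever 0 is excluded.
    d = _det(*(Interval(v) for v in coords))
    if d.lo > 0:
        return 1
    if d.hi < 0:
        return -1
    # Level 2: the interval straddles zero, so recompute exactly.
    exact = _det(*(Fraction(v) for v in coords))
    return (exact > 0) - (exact < 0)
```

For example, `orientation((0, 0), (1, 0), (0, 1))` is decided at the interval level, while the collinear input `orientation((0, 0), (1, 1), (2, 2))` falls through to the exact level.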