• Title/Summary/Keyword: GPU Acceleration

The Recent Trends of Rendering Acceleration Technologies (렌더링 가속화 기술 동향)

  • Nam, Seung-U;Kim, Hae-Dong;Kim, Seong-Su;Choe, Jin-Seong
    • Electronics and Telecommunications Trends / v.22 no.4 s.106 / pp.12-23 / 2007
  • Rendering is a critical part of producing digital content with computer graphics, because every piece of content must pass through a rendering stage at the end of the pipeline. Some content, such as games, demands real-time rendering, while other content, such as film, demands very high image quality. This article covers rendering technologies for high-quality content such as film. Rendering a complex, high-resolution image such as a single frame of a film takes a very long time with a conventional single-CPU software renderer. This article introduces three approaches to reducing rendering time while maintaining high quality: first, clustering tens to thousands of CPUs or PCs; second, exploiting GPUs, whose performance has advanced so rapidly that they now exceed CPUs; and third, building dedicated rendering hardware. We survey the technology trends for each of these approaches.

Trends in AI Processor Technology (인공지능프로세서 기술 동향)

  • Lee, M.Y.;Chung, J.;Lee, J.H.;Han, J.H.;Kwon, Y.S.
    • Electronics and Telecommunications Trends / v.35 no.3 / pp.66-75 / 2020
  • As rising expectations for practical AI (Artificial Intelligence) services make AI algorithms more complicated, efficient processors for executing them are required. To meet this requirement, processors optimized for parallel processing, such as GPUs (Graphics Processing Units), have been widely employed. However, the GPU has a generalized structure intended for a broad range of applications, so it is not optimized for AI algorithms. Research on the development of AI processors optimized for AI algorithm processing has therefore been actively conducted. This paper briefly introduces an AI processor for inference acceleration developed by the Electronics and Telecommunications Research Institute, South Korea, along with processors from other global vendors for mobile and server platforms.

Search for broadband extended gravitational-wave emission bursts in LIGO S6 in 350-2000 Hz by GPU acceleration

  • van Putten, Maurice H.P.M.
    • The Bulletin of The Korean Astronomical Society / v.42 no.1 / pp.37.3-37.3 / 2017
  • We present a novel GPU-accelerated search algorithm for broadband extended gravitational-wave emission (BEGE) with better than real-time analysis of H1-L1 LIGO S6 data. It performs matched filtering with over 8 million one-second-duration chirps. Parseval's theorem is used to predict the standard deviation σ of the filter output, taking advantage of the near-Gaussian LIGO (H1, L1) data in the high-frequency range of 350-2000 Hz. A multiple of σ serves as a threshold for passing filter output back to the central processing unit. The algorithm attains 80% efficiency, normalized to the Fast Fourier Transform (FFT). We apply it to a blind, all-sky search for BEGE in LIGO data, such as may be produced by long gamma-ray bursts and superluminous supernovae. We report on mysterious features, which are excluded by the requirement of exact simultaneous occurrence. Our results are consistent with no events within a radius of about 20 Mpc.
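
The thresholding scheme this abstract describes can be sketched compactly. The following NumPy illustration is not the paper's GPU pipeline: it matched-filters a data segment against a few one-second chirp templates via the FFT, predicts the filter-output standard deviation σ from Parseval's theorem rather than measuring it, and keeps only samples exceeding a multiple of σ. The sample rate, template parameters, and simulated data are stand-in assumptions.

```python
import numpy as np

fs = 4096                      # sample rate (Hz), assumed
t = np.arange(fs) / fs         # one-second templates, as in the abstract

def chirp(f0, f1):
    """One-second linear chirp, unit-normalized (stand-in for the 8M-template bank)."""
    h = np.sin(2 * np.pi * (f0 * t + 0.5 * (f1 - f0) * t**2))
    return h / np.linalg.norm(h)

templates = [chirp(f0, f0 + df) for f0 in (350, 700, 1400) for df in (50, 100)]

rng = np.random.default_rng(0)
data = rng.normal(size=8 * fs)           # near-Gaussian strain segment (simulated)

n = len(data)
D = np.fft.rfft(data, n)
for h in templates:
    H = np.fft.rfft(h, n)
    y = np.fft.irfft(D * np.conj(H), n)  # matched-filter output via FFT

    # Parseval: for white noise and a unit-norm template, var(y) = var(data) * ||h||^2,
    # so sigma is predicted without scanning the full filter output.
    sigma = np.std(data) * np.linalg.norm(h)

    hits = np.flatnonzero(np.abs(y) > 5 * sigma)   # only these go back to the CPU
    if hits.size:
        print(f"{hits.size} samples above 5 sigma")
```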

GPU-based Acceleration of Image-based Rendering (GPU를 이용한 영상기반 렌더링의 가속)

  • Lee, Man-Hee;Park, In-Kyu
    • Proceedings of the Korean Information Science Society Conference / 2005.07a / pp.685-687 / 2005
  • This paper proposes a fast rendering technique for depth image-based 3-D objects. The proposed algorithm is designed to exploit hardware acceleration directly through the shader programming supported by graphics accelerators. It also overcomes a limitation of conventional image-based rendering by supporting lighting effects, and achieves fast rendering by adjusting the splat size for each pixel directly in hardware. Simulation results show a remarkable improvement in rendering speed over software rendering and OpenGL-based rendering.
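
The per-pixel splat sizing mentioned in the abstract can be illustrated with a simple projection argument: a sample at greater depth projects to a smaller screen-space footprint. The sketch below computes an approximate splat radius from depth under an assumed pinhole camera model; the formula and parameters are illustrative, not the paper's shader implementation, which performs this computation on the GPU.

```python
import numpy as np

def splat_radius(depth, focal_px=800.0, sample_spacing=0.01, min_px=1.0):
    """Approximate screen-space splat radius (pixels) for depth-image samples.

    A sample covering `sample_spacing` world units projects to roughly
    focal_px * sample_spacing / depth pixels under a pinhole camera (assumed model).
    """
    r = focal_px * sample_spacing / np.maximum(depth, 1e-6)
    return np.maximum(r, min_px)    # clamp so distant samples still cover a pixel

depths = np.array([0.5, 1.0, 2.0, 8.0])
print(splat_radius(depths))         # nearer samples get larger splats
```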

YOLOv7 Model Inference Time Complexity Analysis in Different Computing Environments (다양한 컴퓨팅 환경에서 YOLOv7 모델의 추론 시간 복잡도 분석)

  • Park, Chun-Su
    • Journal of the Semiconductor & Display Technology / v.21 no.3 / pp.7-11 / 2022
  • Object detection is one of the main research topics in computer vision and has established itself as an essential base technology for implementing various vision systems. Recent DNN (Deep Neural Network)-based algorithms achieve much higher recognition accuracy than traditional algorithms, but DNN model inference is well known to require relatively high computational power. In this paper, we analyze the inference time complexity of the state-of-the-art object detection architecture YOLOv7 in various environments. Specifically, we compare and analyze the time complexity of four variants of the YOLOv7 model, YOLOv7-tiny, YOLOv7, YOLOv7-X, and YOLOv7-E6, when performing inference on CPU and GPU. Furthermore, we analyze how the time complexity varies when inferring the same models using the PyTorch framework and the ONNX Runtime engine.
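
A minimal timing harness of the kind such a comparison implies might look like the following: it times a PyTorch model on CPU and, if available, on GPU, with warm-up runs and torch.cuda.synchronize() so queued GPU kernels are fully counted. A small stand-in convolutional model replaces YOLOv7, and the iteration counts are arbitrary assumptions.

```python
import time
import torch

def time_inference(model, x, iters=50, warmup=10):
    """Average forward-pass latency in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm-up: allocator, cuDNN autotuning, caches
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()     # finish queued kernels before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters * 1e3

# Stand-in for YOLOv7; any torch.nn.Module works here.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 64, 3, padding=1), torch.nn.ReLU(),
)
x = torch.randn(1, 3, 640, 640)          # YOLOv7's default input resolution

print(f"CPU: {time_inference(model, x):.2f} ms")
if torch.cuda.is_available():
    print(f"GPU: {time_inference(model.cuda(), x.cuda()):.2f} ms")
```

An analogous harness for the ONNX Runtime comparison would export the model with torch.onnx.export and time onnxruntime.InferenceSession.run over the same input.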

Acceleration of Radial Gradient Paint Processor for Mobile Device (모바일 기기에서의 방사형 그라디언트 페인트 가속)

  • Kim, Jin-Woo;Park, Jin-Hong;Han, Tack-Don
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.530-533 / 2011
  • Radial gradient paint is a vector graphics technique that can produce a wide variety of effects from a small amount of data. Because it fundamentally requires complex operations such as multiplication, division, and square roots, it has been unsuitable for low-performance environments such as mobile devices. Recent mobile devices, however, now support SIMD instructions and ship with high-performance GPUs, making these costs manageable. In this paper, we accelerate radial gradient paint by up to 2.6 times using NEON, ARM's SIMD instruction set, and by 4.9 times using GPU shaders.
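
To make the per-pixel cost concrete: in the simplest (centered) case, each pixel's gradient parameter is t = |p - c| / r, which already costs multiplies, a square root, and a divide per pixel. The NumPy sketch below vectorizes this computation, the same restructuring that makes NEON lanes and shader threads profitable; the color ramp and parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def radial_gradient(w, h, cx, cy, radius, inner, outer):
    """Per-pixel radial gradient: t = |p - c| / r, clamped to [0, 1].

    The per-pixel sqrt and divide are the operations the paper offloads to
    NEON SIMD lanes or GPU shaders; NumPy vectorization plays the same role here.
    """
    y, x = np.mgrid[0:h, 0:w].astype(np.float32)
    t = np.sqrt((x - cx) ** 2 + (y - cy) ** 2) / radius
    t = np.clip(t, 0.0, 1.0)[..., None]
    return ((1 - t) * inner + t * outer).astype(np.uint8)   # linear color ramp

img = radial_gradient(256, 256, 128, 128, 128,
                      inner=np.array([255, 200, 0]), outer=np.array([30, 30, 120]))
print(img.shape, img.dtype)    # (256, 256, 3) uint8
```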

Acceleration of computation speed for elastic wave simulation using a Graphic Processing Unit (그래픽 프로세서를 이용한 탄성파 수치모사의 계산속도 향상)

  • Nakata, Norimitsu;Tsuji, Takeshi;Matsuoka, Toshifumi
    • Geophysics and Geophysical Exploration / v.14 no.1 / pp.98-104 / 2011
  • Numerical simulation in exploration geophysics provides important insights into subsurface wave propagation phenomena. Although elastic wave simulations take longer to compute than acoustic simulations, an elastic simulator can construct more realistic wavefields, including shear components, and is therefore suitable for exploring the responses of elastic bodies. To overcome the long calculation times, we use a Graphic Processing Unit (GPU) to accelerate the elastic wave simulation. Because a GPU has many processors and a wide memory bandwidth, we can use it in a parallelised computing architecture. The GPU board used in this study is an NVIDIA Tesla C1060, which has 240 processors and a 102 GB/s memory bandwidth. Even with NVIDIA's parallel computing architecture (CUDA) available, we must optimise the usage of the different types of memory on the GPU device, and the sequence of calculations, to obtain a significant speedup. In this study, we simulate two- (2D) and three-dimensional (3D) elastic wave propagation using the Finite-Difference Time-Domain (FDTD) method on GPUs. In the wave propagation simulation, we adopt the staggered-grid method, one of the conventional FD schemes, since it achieves sufficient accuracy for numerical modelling in geophysics. Our simulator optimises memory usage on the GPU device to reduce data access times, using faster memory as much as possible; this is a key factor in GPU computing. By using one GPU device and optimising its memory usage, we achieved a speedup of more than 14 times in the 2D simulation and over six times in the 3D simulation, compared with one CPU. Furthermore, by using three GPUs, we accelerated the 3D simulation by a factor of 10.
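
The core of a staggered-grid FDTD elastic solver is a pair of stencil sweeps: velocities updated from stress derivatives, then stresses updated from velocity derivatives. The sketch below is a simplified 2-D isotropic version in NumPy; on a GPU each array expression maps to one thread per grid point, and the memory-layout concerns the abstract mentions correspond to keeping these neighbour accesses in fast (shared) memory. Grid size, material constants, staggering offsets, and the source term are illustrative assumptions.

```python
import numpy as np

nx = nz = 200
dx = dz = 5.0                    # grid spacing (m), assumed
dt = 5e-4                        # time step satisfying the CFL condition for vp below
vp, vs, rho = 3000.0, 1800.0, 2200.0
mu = rho * vs**2                 # Lame parameters from assumed velocities/density
lam = rho * vp**2 - 2 * mu

vx = np.zeros((nz, nx)); vz = np.zeros((nz, nx))
txx = np.zeros((nz, nx)); tzz = np.zeros((nz, nx)); txz = np.zeros((nz, nx))

def dxf(a): return (a[:, 1:] - a[:, :-1]) / dx     # forward x-difference
def dzf(a): return (a[1:, :] - a[:-1, :]) / dz     # forward z-difference

for it in range(300):
    # velocity update from the stress divergence
    vx[1:, :-1] += dt / rho * (dxf(txx)[1:, :] + dzf(txz)[:, :-1])
    vz[:-1, 1:] += dt / rho * (dxf(txz)[:-1, :] + dzf(tzz)[:, 1:])
    # stress update from velocity gradients (isotropic Hooke's law)
    txx[1:, 1:] += dt * ((lam + 2 * mu) * dxf(vx)[1:, :] + lam * dzf(vz)[:, 1:])
    tzz[1:, 1:] += dt * (lam * dxf(vx)[1:, :] + (lam + 2 * mu) * dzf(vz)[:, 1:])
    txz[:-1, :-1] += dt * mu * (dzf(vx)[:, :-1] + dxf(vz)[:-1, :])
    txx[nz // 2, nx // 2] += np.exp(-((it - 50) / 20.0) ** 2)  # Gaussian source pulse

print("peak |vx|:", np.abs(vx).max())
```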

Bit Operation Optimization and DNN Application using GPU Acceleration (GPU 가속기를 통한 비트 연산 최적화 및 DNN 응용)

  • Kim, Sang Hyeok;Lee, Jae Heung
    • Journal of IKEEE / v.23 no.4 / pp.1314-1320 / 2019
  • In this paper, we propose a new method for optimizing bit operations and applying it to DNNs (Deep Neural Networks) in a software environment. To this end, we propose a packing function for bitwise optimization and a masking matrix multiplication operation for application to DNNs. The packing function converts 32-bit real values to 2-bit quantized values through threshold comparison; after packing, four 32-bit real values are represented by a single 8-bit value. The masking matrix multiplication is a special operation for multiplying the packed weight values with ordinary input values, and each operation is processed in parallel on a GPU accelerator. In our experiments, memory usage was reduced by a factor of about 16 compared with the 32-bit DNN model, while accuracy remained within 1% of the 32-bit model.
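
The packing step described above can be sketched directly: each 32-bit value is mapped to one of four 2-bit codes by threshold comparison, and four codes are packed into one byte, giving the 16x memory reduction the abstract reports. The thresholds and the unpack-then-multiply below are illustrative assumptions; the paper's masking matrix multiplication runs as a parallel GPU kernel rather than in NumPy.

```python
import numpy as np

def pack_2bit(w, threshold=0.5):
    """Quantize 32-bit floats to 2-bit codes {0,1,2,3} by threshold comparison,
    then pack four codes per byte (16x smaller than float32)."""
    assert w.size % 4 == 0
    codes = np.digitize(w, [-threshold, 0.0, threshold]).astype(np.uint8)  # 0..3
    c = codes.reshape(-1, 4)
    return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

def unpack_2bit(packed, levels=(-1.0, -0.25, 0.25, 1.0)):
    """Expand packed bytes back to float levels for the masked multiply."""
    c = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).ravel()
    return np.asarray(levels, dtype=np.float32)[c]

w = np.random.randn(8, 16).astype(np.float32)
packed = pack_2bit(w.ravel())
print(w.nbytes, "bytes ->", packed.nbytes, "bytes")    # 512 -> 32: 16x reduction

x = np.random.randn(16, 4).astype(np.float32)
y = unpack_2bit(packed).reshape(8, 16) @ x             # quantized-weight matmul
```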

Design of a Dispatch Unit & Operand Selection Unit for Improving the SIMT Based GP-GPU Instruction Performance (SIMT구조 GP-GPU의 명령어 처리 성능 향상을 위한 Dispatch Unit과 Operand Selection Unit설계)

  • Kwak, Jae Chang
    • Journal of IKEEE / v.19 no.3 / pp.455-459 / 2015
  • This paper proposes a dispatch unit for a GP-GPU with a SIMT architecture that supports the acceleration of general-purpose computation as well as graphics processing. If all operand information of the instructions issued from the warp scheduler is decoded, unnecessary operand loads occur, increasing the load on the register file. To resolve this problem, this paper proposes a pre-decoding method that decodes only the operand information actually needed, reducing both operand loads and register-file traffic. The operand information from the dispatch unit is passed to the operand selection unit in a way that prevents register bank conflicts, so overall performance is improved. In simulation, the total clock cycles required to process 10,000 arbitrary instructions issued from the warp scheduler were measured using ModelSim SE 10.0b. Applying the dispatch unit with the pre-decoding function proposed in this paper improves processing performance by about 12% compared with the conventional method.
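
In software terms, the pre-decoding idea is: inspect only enough of the instruction to learn which source operands it actually reads, then fetch just those registers, tracking which banks the reads touch. The sketch below is a behavioural illustration with an invented two-operand instruction format, not the paper's RTL design.

```python
# Behavioural sketch of operand pre-decoding (invented instruction format;
# real hardware does this in the dispatch unit's decode stage).
OPERAND_USE = {             # which source operands each opcode actually reads
    "add": (True, True),
    "neg": (True, False),   # one-operand op: src2 must not be fetched
    "nop": (False, False),
}

NUM_BANKS = 4
regfile = {f"r{i}": i * 10 for i in range(16)}

def fetch_operands(opcode, src1, src2):
    """Fetch only the operands the opcode uses; returns (values, banks touched)."""
    uses = OPERAND_USE[opcode]
    regs = [r for r, used in zip((src1, src2), uses) if used]
    banks = {int(r[1:]) % NUM_BANKS for r in regs}   # bank = reg index mod #banks
    return [regfile[r] for r in regs], banks

vals, banks = fetch_operands("neg", "r3", "r7")
print(vals, banks)   # only r3 is read: one register-file access, one bank touched
```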

Acceleration of Terrain Rendering Using Bounding Box Subdivision (바운딩 박스 세분화를 통한 지형 렌더링의 가속화)

  • Lee, Eun-Seok;Lee, Jin-Hee;Jo, In-Woo;Shin, Byeong-Seok
    • Journal of Korea Game Society / v.11 no.6 / pp.71-80 / 2011
  • Recent terrain rendering applications, such as 3D games and virtual reality, use GPU-based ray casting to render high-quality scenes in real time. As terrain datasets grow, rendering slows down because the number of texture samplings increases. To accelerate conventional ray casting, we propose an efficient ray-casting method that uses subdivided bounding boxes based on GPU quadtree traversal. Subdividing the terrain's bounding box effectively reduces the empty space it encloses, and performing the ray casting against this compact set of boxes reduces computation through empty-space skipping. Unlike recent quadtree-based empty-space-skipping techniques, which traverse the tree for each ray, our method traverses the tree only once per frame, saving considerable computation time.
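
The effect of subdividing the terrain's bounding box can be illustrated with a small quadtree built once per frame over the heightfield: each node stores the maximum height of its region, so a ray can skip any node whose box lies entirely above the terrain. The sketch below only builds the max-height quadtree and shows the empty-space test; the paper's per-pixel ray marching runs in a GPU shader, and the heightfield here is a stand-in.

```python
import numpy as np

def build_max_quadtree(height, levels):
    """Per-frame quadtree of maximum heights: pyramid[k][i, j] is the max height
    of a (2^k x 2^k)-cell block, giving each node a tight bounding box in z."""
    pyramid = [height]
    for _ in range(levels):
        h = pyramid[-1]
        pyramid.append(h.reshape(h.shape[0] // 2, 2, h.shape[1] // 2, 2).max(axis=(1, 3)))
    return pyramid

def can_skip(pyramid, level, i, j, ray_min_z):
    """Empty-space test: if the ray segment stays above the node's max height,
    the whole block is empty space and sampling inside it can be skipped."""
    return ray_min_z > pyramid[level][i, j]

height = np.random.rand(8, 8) * 100.0           # stand-in heightfield
pyr = build_max_quadtree(height, levels=3)
print(can_skip(pyr, 3, 0, 0, ray_min_z=150.0))  # True: ray passes above the terrain
```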