• Title/Summary/Keyword: GPU acceleration

FPGA-Based Acceleration of Range Doppler Algorithm for Real-Time Synthetic Aperture Radar Imaging (실시간 SAR 영상 생성을 위한 Range Doppler 알고리즘의 FPGA 기반 가속화)

  • Jeong, Dongmin;Lee, Wookyung;Jung, Yunho
    • Journal of IKEEE / v.25 no.4 / pp.634-643 / 2021
  • In this paper, an FPGA-based acceleration scheme for the range Doppler algorithm (RDA) is proposed for real-time synthetic aperture radar (SAR) imaging. Hardware architectures of a matched filter based on a systolic array and of a high-speed sinc interpolator for range cell migration (RCM) compensation are presented. In addition, the proposed hardware was implemented and accelerated on a Xilinx Alveo FPGA. Experimental results for 4096×4096-size SAR imaging showed that the FPGA-based implementation achieves a 2x speedup over a GPU-based design. It was also confirmed that the proposed design can be implemented with 60,247 CLB LUTs, 103,728 CLB registers, 20 block RAM tiles, and 592 DSPs at an operating frequency of 312 MHz.
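The two kernels this design accelerates, matched filtering for range compression and sinc interpolation for RCM compensation, can be sketched in NumPy as below; the chirp parameters, kernel width, and FFT-based formulation are illustrative assumptions rather than the paper's systolic-array implementation.

```python
import numpy as np

def matched_filter(signal, replica):
    """Frequency-domain matched filtering (range compression).
    The FPGA design realizes this with a systolic array; FFTs are used
    here only for clarity."""
    n = len(signal) + len(replica) - 1
    return np.fft.ifft(np.fft.fft(signal, n) * np.conj(np.fft.fft(replica, n)))

def sinc_interpolate(samples, t, half_width=4):
    """Evaluate a sampled signal at fractional index t with a truncated
    sinc kernel, as used for range cell migration compensation."""
    k0 = int(np.floor(t))
    idx = np.clip(np.arange(k0 - half_width + 1, k0 + half_width + 1), 0, len(samples) - 1)
    return np.dot(samples[idx], np.sinc(t - idx))

# Illustrative use: compress a delayed linear-FM chirp buried in noise.
fs, T, bw = 1e6, 1e-3, 2e5                      # sample rate, pulse length, bandwidth (assumed)
t = np.arange(0, T, 1 / fs)
chirp = np.exp(1j * np.pi * (bw / T) * t**2)    # transmitted replica
echo = np.roll(chirp, 200) + 0.1 * np.random.randn(len(t))
compressed = np.abs(matched_filter(echo, chirp))
print(int(np.argmax(compressed)))                # peak near the echo delay of 200 samples
print(sinc_interpolate(compressed, 200.3))       # sub-sample read-out along the migration curve
```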

Fast Medical Volume Decompression Using GPGPU (GPGPU를 이용한 고속 의료 볼륨 영상의 압축 복원)

  • Kye, Hee-Won
    • Journal of Korea Multimedia Society / v.15 no.5 / pp.624-631 / 2012
  • For many medical imaging systems, volume datasets are stored in compressed form, so the dataset has to be decompressed before it is visualized. Since the decompression process takes quite a long time, we present an acceleration method for medical volume decompression using the GPU. Our method supports both lossy and lossless compression, and progressive refinement is possible to satisfy varying user requirements. Moreover, our decompression method is well parallelized for the GPU, so decompression takes very little time. Finally, we designed the decompression and volume rendering to work in one framework, so that selective decompression is available. As a result, we gained additional improvement in decompression performance.
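The selective decompression the abstract couples to volume rendering can be illustrated with a generic brick-based scheme in which only the bricks a renderer needs are decompressed; the brick size, the zlib codec, and the list of visible bricks below are assumptions, not the paper's actual compression method.

```python
import numpy as np
import zlib

BLOCK = 16  # bricks of 16^3 voxels (illustrative size)

def compress_volume(volume):
    """Split a volume into bricks and compress each one independently,
    so bricks can later be decompressed selectively."""
    bricks = {}
    for z in range(0, volume.shape[0], BLOCK):
        for y in range(0, volume.shape[1], BLOCK):
            for x in range(0, volume.shape[2], BLOCK):
                brick = volume[z:z+BLOCK, y:y+BLOCK, x:x+BLOCK]
                bricks[(z, y, x)] = (brick.shape, zlib.compress(brick.tobytes()))
    return bricks

def decompress_bricks(bricks, wanted, dtype=np.uint8):
    """Decompress only the bricks the renderer actually needs.
    On a GPU, each brick would map to one thread block."""
    out = {}
    for key in wanted:
        shape, payload = bricks[key]
        out[key] = np.frombuffer(zlib.decompress(payload), dtype=dtype).reshape(shape)
    return out

volume = (np.random.rand(64, 64, 64) * 255).astype(np.uint8)
bricks = compress_volume(volume)
visible = [(0, 0, 0), (0, 0, 16)]               # bricks intersecting the view (assumed)
partial = decompress_bricks(bricks, visible)
assert np.array_equal(partial[(0, 0, 0)], volume[:16, :16, :16])
```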

The Recent Trends of Rendering Acceleration Technologies (렌더링 가속화 기술 동향)

  • Nam, Seung-U;Kim, Hae-Dong;Kim, Seong-Su;Choe, Jin-Seong
    • Electronics and Telecommunications Trends / v.22 no.4 s.106 / pp.12-23 / 2007
  • Because digital content produced with computer graphics must pass through a rendering step at the end of the production pipeline, rendering is a critically important stage. Some content, such as games, requires real-time rendering, while other content, such as film, demands very high image quality. This article focuses on rendering technology for high-quality content such as film. Rendering a complex, high-resolution image such as a film frame takes a very long time with a single CPU and a software renderer. We introduce three approaches for reducing rendering time while maintaining high quality: first, using tens to thousands of CPUs or clustered PCs; second, exploiting GPUs, whose performance has advanced rapidly and now exceeds that of CPUs; and third, building dedicated hardware to accelerate rendering. We survey the technology trends for each of these approaches.

Trends in AI Processor Technology (인공지능프로세서 기술 동향)

  • Lee, M.Y.;Chung, J.;Lee, J.H.;Han, J.H.;Kwon, Y.S.
    • Electronics and Telecommunications Trends / v.35 no.3 / pp.66-75 / 2020
  • As increasing expectations for practical AI (Artificial Intelligence) services make AI algorithms more complicated, an efficient processor to process AI algorithms is required. To meet this requirement, processors optimized for parallel processing, such as GPUs (Graphics Processing Units), have been widely employed. However, the GPU has a generalized structure for various applications, so it is not optimized for AI algorithms. Therefore, research on the development of AI processors optimized for AI algorithm processing has been actively conducted. This paper briefly introduces AI processors for inference acceleration developed by the Electronics and Telecommunications Research Institute, South Korea, and by other global vendors for mobile and server platforms.

Search for broadband extended gravitational-wave emission bursts in LIGO S6 in 350-2000 Hz by GPU acceleration

  • van Putten, Maurice H.P.M.
    • The Bulletin of The Korean Astronomical Society / v.42 no.1 / pp.37.3-37.3 / 2017
  • We present a novel GPU-accelerated search algorithm for broadband extended gravitational-wave emission (BEGE) with better-than-real-time analysis of H1-L1 LIGO S6 data. It performs matched filtering with over 8 million one-second-duration chirps. Parseval's theorem is used to predict the standard deviation σ of the filter output, taking advantage of the near-Gaussian LIGO (H1, L1) data in the high-frequency range of 350-2000 Hz. A multiple of σ serves as a threshold for returning filter output to the central processing unit. This algorithm attains 80% efficiency, normalized to the Fast Fourier Transform (FFT). We apply it to a blind, all-sky search for BEGE in LIGO data, such as may be produced by long gamma-ray bursts and superluminous supernovae. We report on mysterious features that are excluded by exact simultaneous occurrence. Our results are consistent with no events within a radius of about 20 Mpc.
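The filtering step described here can be sketched in NumPy as FFT-based correlation against a chirp template, with the detection threshold set to a multiple of σ predicted via Parseval's theorem; the sample rate, chirp sweep, injected amplitude, and threshold below are illustrative, and the actual search runs millions of such templates on the GPU.

```python
import numpy as np

fs, dur = 2048, 1.0                        # sample rate (Hz) and template duration (illustrative)
n = int(fs * dur)
t = np.arange(n) / fs

def chirp_template(f0, f1):
    """One-second chirp sweeping f0 -> f1 Hz, a stand-in for one of the
    ~8 million templates in the bank described above."""
    return np.sin(2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / dur * t**2))

def matched_filter_hits(data, template, k=5.0):
    """FFT-based correlation of data against a template.
    Parseval: sum(h^2) = (1/N) * sum(|H|^2), so the output standard deviation
    for near-Gaussian noise can be predicted without running the filter on noise.
    Only outputs above k*sigma are kept, mirroring the threshold used to return
    results to the CPU."""
    out = np.fft.irfft(np.fft.rfft(data) * np.conj(np.fft.rfft(template)), n)
    sigma = np.sqrt(np.var(data) * np.sum(template**2))
    return out, sigma, np.flatnonzero(np.abs(out) > k * sigma)

rng = np.random.default_rng(0)
data = rng.standard_normal(n)              # whitened, near-Gaussian noise stand-in
data += 0.3 * chirp_template(400, 800)     # inject a weak signal at zero lag
out, sigma, hits = matched_filter_hits(data, chirp_template(400, 800))
print(sigma, hits)                         # the injection appears as a hit near lag 0
```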

GPU-based Acceleration of Image-based Rendering (GPU를 이용한 영상기반 렌더링의 가속)

  • Lee, Man-Hee;Park, In-Kyu
    • Proceedings of the Korean Information Science Society Conference / 2005.07a / pp.685-687 / 2005
  • This paper proposes a fast rendering method for depth image-based 3-D objects. The proposed algorithm is designed to directly exploit hardware acceleration through the shader programming supported by graphics accelerators. It also overcomes a limitation of conventional image-based rendering by supporting lighting effects, and it achieves fast rendering by adjusting the splat size of each pixel directly in hardware. Simulation results show a substantial improvement in rendering speed over software rendering and OpenGL-based rendering.
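A minimal CPU sketch of the splatting idea, drawing each sample of a depth image as a square splat whose size shrinks with depth under a z-buffer test, may help; the per-pixel splat sizing is what the paper moves into the GPU shader, and the size model and point data here are assumptions.

```python
import numpy as np

def splat(points, depths, colors, shape):
    """Draw each 3-D sample as a square splat whose half-size shrinks with
    depth, keeping the nearest sample per covered pixel (z-buffer test)."""
    canvas = np.zeros(shape + (3,), dtype=np.float32)
    zbuf = np.full(shape, np.inf)
    for (row, col), z, color in zip(points, depths, colors):
        s = max(1, int(round(3.0 / z)))                     # splat half-size (assumed model)
        for r in range(max(0, row - s), min(shape[0], row + s + 1)):
            for c in range(max(0, col - s), min(shape[1], col + s + 1)):
                if z < zbuf[r, c]:                          # keep the nearest surface
                    zbuf[r, c] = z
                    canvas[r, c] = color
    return canvas

# Illustrative use: two samples at different depths; the nearer (red) one wins overlaps.
img = splat([(10, 10), (11, 12)], [1.0, 2.0], [(1, 0, 0), (0, 1, 0)], (32, 32))
print(img[10, 10], img[11, 14])
```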

YOLOv7 Model Inference Time Complexity Analysis in Different Computing Environments (다양한 컴퓨팅 환경에서 YOLOv7 모델의 추론 시간 복잡도 분석)

  • Park, Chun-Su
    • Journal of the Semiconductor & Display Technology / v.21 no.3 / pp.7-11 / 2022
  • Object detection technology is one of the main research topics in the field of computer vision and has established itself as an essential base technology for implementing various vision systems. Recent DNN (Deep Neural Network)-based algorithms achieve much higher recognition accuracy than traditional algorithms. However, it is well known that DNN model inference requires relatively high computational power. In this paper, we analyze the inference time complexity of the state-of-the-art object detection architecture YOLOv7 in various environments. Specifically, we compare and analyze the time complexity of four variants of the YOLOv7 model, YOLOv7-tiny, YOLOv7, YOLOv7-X, and YOLOv7-E6, when performing inference on a CPU and on a GPU. Furthermore, we analyze how the time complexity varies when the same models are run through the PyTorch framework and the ONNX Runtime engine.
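A timing harness in the spirit of this comparison might look like the sketch below; the stand-in network, input size, iteration counts, and exported ONNX file name are assumptions (the paper instead times the four YOLOv7 variants), but the PyTorch and ONNX Runtime calls are standard.

```python
import time
import torch
import onnxruntime as ort

def time_torch(model, x, device, runs=50, warmup=10):
    """Average per-inference latency of a PyTorch model on 'cpu' or 'cuda'."""
    model, x = model.to(device).eval(), x.to(device)
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()           # GPU kernels launch asynchronously
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

def time_onnx(path, x, providers, runs=50, warmup=10):
    """Average per-inference latency of the exported model under ONNX Runtime."""
    sess = ort.InferenceSession(path, providers=providers)
    name = sess.get_inputs()[0].name
    for _ in range(warmup):
        sess.run(None, {name: x})
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {name: x})
    return (time.perf_counter() - start) / runs

# Stand-in model; the paper times YOLOv7-tiny / YOLOv7 / YOLOv7-X / YOLOv7-E6 instead.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU())
dummy = torch.randn(1, 3, 640, 640)
print("PyTorch CPU:", time_torch(model, dummy, "cpu"))
torch.onnx.export(model, dummy, "model.onnx")
print("ONNX Runtime CPU:", time_onnx("model.onnx", dummy.numpy(), ["CPUExecutionProvider"]))
# With a GPU available: time_torch(model, dummy, "cuda") and ["CUDAExecutionProvider"].
```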

Acceleration of Radial Gradient Paint Processor for Mobile Device (모바일 기기에서의 방사형 그라디언트 페인트 가속)

  • Kim, Jin-Woo;Park, Jin-Hong;Han, Tack-Don
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.530-533 / 2011
  • Radial gradient paint is a vector graphics technique that can produce a variety of effects from a small amount of information. Because it fundamentally requires complex operations such as multiplication, division, and square roots, it has not been well suited to low-performance environments such as mobile devices. Recently, however, mobile devices have gained SIMD instruction support and high-performance GPUs, making it possible to overcome this limitation. In this paper, we accelerate the computation by up to 2.6x using ARM's NEON SIMD instructions and by 4.9x using GPU shaders.
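The per-pixel workload that the NEON lanes and shader threads each carry is the solution of a small quadratic for the gradient parameter followed by a color-stop lookup; a vectorized NumPy version, the CPU analogue of processing several pixels per SIMD lane, might look like the sketch below, with the focal-point geometry and color-stop handling as assumptions.

```python
import numpy as np

def radial_gradient(width, height, cx, cy, fx, fy, r, stops):
    """Vectorized radial-gradient fill: for every pixel p, solve
    |p - (f + g*(c - f))| = g*r for the gradient parameter g in [0, 1],
    then map g through the color stops. This is the per-pixel multiply/
    divide/sqrt workload that NEON lanes or shader threads each take on."""
    y, x = np.mgrid[0:height, 0:width].astype(np.float32)
    dx, dy = x - fx, y - fy                        # pixel relative to the focal point f
    ex, ey = cx - fx, cy - fy                      # focus-to-center offset
    a = r * r - (ex * ex + ey * ey)
    b = dx * ex + dy * ey
    c2 = dx * dx + dy * dy
    g = c2 / np.maximum(b + np.sqrt(b * b + a * c2), 1e-6)
    g = np.clip(g, 0.0, 1.0)
    positions, colors = stops
    out = np.empty((height, width, 3), dtype=np.float32)
    for ch in range(3):                            # interpolate each channel across the stops
        out[..., ch] = np.interp(g, positions, [col[ch] for col in colors])
    return out

# White-to-blue gradient with an off-center focal point (all values illustrative).
img = radial_gradient(256, 256, cx=128, cy=128, fx=100, fy=100, r=120,
                      stops=([0.0, 1.0], [(1.0, 1.0, 1.0), (0.0, 0.0, 1.0)]))
```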

Acceleration of computation speed for elastic wave simulation using a Graphic Processing Unit (그래픽 프로세서를 이용한 탄성파 수치모사의 계산속도 향상)

  • Nakata, Norimitsu;Tsuji, Takeshi;Matsuoka, Toshifumi
    • Geophysics and Geophysical Exploration / v.14 no.1 / pp.98-104 / 2011
  • Numerical simulation in exploration geophysics provides important insights into subsurface wave propagation phenomena. Although elastic wave simulations take longer to compute than acoustic simulations, an elastic simulator can construct more realistic wavefields including shear components. Therefore, it is suitable for exploring the responses of elastic bodies. To overcome the long duration of the calculations, we use a Graphic Processing Unit (GPU) to accelerate the elastic wave simulation. Because a GPU has many processors and a wide memory bandwidth, we can use it in a parallel computing architecture. The GPU board used in this study is an NVIDIA Tesla C1060, which has 240 processors and a 102 GB/s memory bandwidth. Although NVIDIA provides a parallel computing architecture (CUDA), we must optimise the usage of the different types of memory on the GPU device, and the sequence of calculations, to obtain a significant speedup of the computation. In this study, we simulate two-dimensional (2D) and three-dimensional (3D) elastic wave propagation using the Finite-Difference Time-Domain (FDTD) method on GPUs. In the wave propagation simulation, we adopt the staggered-grid method, which is one of the conventional FD schemes, since this method can achieve sufficient accuracy for numerical modelling in geophysics. Our simulator optimises the usage of memory on the GPU device to reduce data access times, and uses faster memory as much as possible; this is a key factor in GPU computing. By using one GPU device and optimising its memory usage, we improved the computation time by more than 14 times in the 2D simulation, and over six times in the 3D simulation, compared with one CPU. Furthermore, by using three GPUs, we succeeded in accelerating the 3D simulation 10 times.
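The staggered-grid velocity-stress update that such a simulator ports to CUDA kernels can be sketched in NumPy as follows; the 2-D P-SV formulation, homogeneous medium, periodic boundaries, and source/stepping parameters are simplifying assumptions for illustration, not the paper's optimised GPU code.

```python
import numpy as np

# 2-D P-SV velocity-stress staggered-grid FDTD (Virieux-type), vectorized with
# NumPy as a CPU stand-in for the per-cell CUDA kernels described above.
nx = nz = 200
dx = dz = 5.0                          # grid spacing (m), illustrative
dt = 5e-4                              # time step (s), within the CFL limit for vp below
vp, vs, rho = 3000.0, 1800.0, 2200.0   # homogeneous medium (assumed)
mu = rho * vs**2
lam = rho * vp**2 - 2.0 * mu

vx = np.zeros((nz, nx)); vz = np.zeros((nz, nx))
txx = np.zeros((nz, nx)); tzz = np.zeros((nz, nx)); txz = np.zeros((nz, nx))

def dxp(f): return (np.roll(f, -1, axis=1) - f) / dx   # forward difference in x
def dxm(f): return (f - np.roll(f, 1, axis=1)) / dx    # backward difference in x
def dzp(f): return (np.roll(f, -1, axis=0) - f) / dz
def dzm(f): return (f - np.roll(f, 1, axis=0)) / dz

for it in range(500):
    # particle velocities from stress gradients
    vx += dt / rho * (dxp(txx) + dzm(txz))
    vz += dt / rho * (dxm(txz) + dzp(tzz))
    # stresses from velocity gradients
    txx += dt * ((lam + 2 * mu) * dxm(vx) + lam * dzm(vz))
    tzz += dt * (lam * dxm(vx) + (lam + 2 * mu) * dzm(vz))
    txz += dt * mu * (dzp(vx) + dxp(vz))
    # explosive source: Ricker wavelet added to the normal stresses at the grid centre
    ts = it * dt - 0.05
    ricker = (1 - 2 * (np.pi * 25.0 * ts)**2) * np.exp(-(np.pi * 25.0 * ts)**2)
    txx[nz // 2, nx // 2] += ricker
    tzz[nz // 2, nx // 2] += ricker

print(float(np.abs(vz).max()))          # non-zero wavefield after 0.25 s of propagation
```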

Bit Operation Optimization and DNN Application using GPU Acceleration (GPU 가속기를 통한 비트 연산 최적화 및 DNN 응용)

  • Kim, Sang Hyeok;Lee, Jae Heung
    • Journal of IKEEE / v.23 no.4 / pp.1314-1320 / 2019
  • In this paper, we propose a new method for optimizing bit operations and applying them to a DNN (Deep Neural Network) in a software environment. To this end, we propose a packing function for bit-level optimization and a masking matrix multiplication operation for the DNN application. The packing function converts 32-bit real values to 2-bit quantized values through a threshold comparison operation; after this step, four 32-bit real values are packed into one 8-bit value. The masking matrix multiplication is a special operation for multiplying the packed weight values with ordinary input values. Each operation is then processed in parallel on a GPU accelerator. In our experiments, memory usage was reduced by about a factor of 16 compared with the 32-bit DNN model. Nevertheless, the accuracy remained within 1% of the 32-bit model.
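A NumPy sketch of the packing step described above (threshold comparison to 2-bit codes, four codes per byte) and of a quantized forward pass may make the memory arithmetic concrete; the thresholds, representative levels, and the unpack-then-multiply formulation are assumptions standing in for the paper's masking matrix multiplication, which operates on the packed values directly.

```python
import numpy as np

def pack_2bit(weights, thresholds=(-0.5, 0.0, 0.5)):
    """Quantize 32-bit floats to 2-bit codes by threshold comparison and pack
    four codes into one uint8 (128 bits -> 8 bits, the ~16x saving reported).
    The element count must be a multiple of 4; thresholds are illustrative."""
    w = np.asarray(weights, dtype=np.float32).reshape(-1, 4)
    codes = np.digitize(w, thresholds).astype(np.uint8)        # codes in {0, 1, 2, 3}
    packed = (codes[:, 0] << 6) | (codes[:, 1] << 4) | (codes[:, 2] << 2) | codes[:, 3]
    return packed.astype(np.uint8)

def unpack_2bit(packed, levels=(-1.0, -0.25, 0.25, 1.0)):
    """Expand packed codes back to representative float levels so an ordinary
    matrix multiplication can stand in for the masking matmul."""
    codes = np.stack([(packed >> s) & 0b11 for s in (6, 4, 2, 0)], axis=1).reshape(-1)
    return np.asarray(levels, dtype=np.float32)[codes]

w = np.random.randn(8, 16).astype(np.float32)    # toy weight matrix
packed = pack_2bit(w)
w_q = unpack_2bit(packed).reshape(8, 16)
x = np.random.randn(16, 1).astype(np.float32)
y = w_q @ x                                      # quantized forward pass
print(packed.nbytes, w.nbytes)                   # 32 bytes vs 512 bytes (16x smaller)
```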