• Title/Summary/Keyword: GPU Acceleration

GPU Acceleration of Range Doppler Algorithm for Real-Time SAR Image Generation (실시간 SAR 영상 생성을 위한 Range Doppler Algorithm의 GPU 가속)

  • Dong-Min Jeong;Woo-Kyung Lee;Myeong-Jin Lee;Yun-Ho Jung
    • Journal of IKEEE
    • /
    • v.27 no.3
    • /
    • pp.265-272
    • /
    • 2023
  • In this paper, a GPU-accelerated kernel of the range Doppler algorithm (RDA) was developed for real-time image formation based on frequency modulated continuous wave (FMCW) synthetic aperture radar (SAR). Pinned memory was used to minimize the data transfer time between the host and the GPU device, and the kernel was configured to perform all RDA operations on the GPU to minimize the number of data transfers. The dataset was obtained through an FMCW drone SAR experiment, and the GPU acceleration effect was measured in an environment with an Intel i7-9700K CPU, 32 GB of RAM, and an Nvidia RTX 3090 GPU. Including the data transfer time between host and device, the GPU implementation was up to 3.41 times faster than the CPU; when only the computation was measured, excluding the data transfer time, it was up to 156 times faster.
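
As a rough reading of the two speedup figures, the gap between them indicates how much of the GPU run is spent on host-device transfers. The sketch below assumes both figures refer to the same workload and the same CPU baseline; the abstract reports each as an "up to" value, so this may not hold exactly. The function name `transfer_share` is illustrative and does not come from the paper.

```python
def transfer_share(end_to_end_speedup, compute_only_speedup):
    """Fraction of GPU wall time spent on transfers.

    Assumes both speedups are measured against the same CPU time,
    which is normalized to 1.0 here.
    """
    t_cpu = 1.0
    t_compute = t_cpu / compute_only_speedup   # GPU time without transfers
    t_total = t_cpu / end_to_end_speedup       # GPU time including transfers
    t_transfer = t_total - t_compute
    return t_transfer / t_total


share = transfer_share(3.41, 156.0)
print(f"{share:.1%}")  # prints 97.8%
```

Under that assumption, nearly all of the end-to-end GPU time would be data movement, which is consistent with the emphasis on pinned memory and on keeping every RDA step on the device.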

Implementation of GPU Acceleration of Object Detection Application with Drone Video (드론 영상 대상 물체 검출 어플리케이션의 GPU가속 구현)

  • Park, Si-Hyun;Park, Chun-Su
    • Journal of the Semiconductor & Display Technology
    • /
    • v.20 no.3
    • /
    • pp.117-119
    • /
    • 2021
  • With the development of the industry, the use of drones for specific mission flights is being actively studied. These drones fly a specified path and perform repetitive tasks. If the drone system can detect objects in real time, the performance of these mission flights will increase. In this paper, we implement an object detection system for drone video and add GPU acceleration to maximize the efficiency of limited device resources, using TensorFlow Lite, which enables on-device inference on a mobile device, and the Mobile SDK of DJI, a drone manufacturer. For performance comparison, the average processing time per frame was measured when object detection was performed using only the CPU and when it was performed using the CPU and GPU together.

Acceleration of ECC Computation for Robust Massive Data Reception under GPU-based Embedded Systems (GPU 기반 임베디드 시스템에서 대용량 데이터의 안정적 수신을 위한 ECC 연산의 가속화)

  • Kwon, Jisu;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.7
    • /
    • pp.956-962
    • /
    • 2020
  • Recently, as the size of data used in embedded systems increases, the need for an ECC decoding operation to robustly receive massive data has grown. In this paper, we propose a method to accelerate the computations that derive syndrome vectors when ECC decoding is performed using a Hamming code in an embedded system with a built-in GPU. The proposed method expresses the matrix-vector multiplication of the decoding operation in the CSR format, one of the data structures for representing sparse matrices, and performs it in parallel in a CUDA kernel on the GPU. We evaluated the proposed method on a target embedded board with a GPU, and the results show that the execution time is reduced when the ECC decoding operation is accelerated on the GPU compared to using only the CPU.
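
The syndrome computation described above can be sketched on the CPU as a sparse matrix-vector product over GF(2). This is a minimal illustration, not the authors' kernel: it assumes a Hamming(7,4) code whose parity-check matrix columns are the binary representations of 1 to 7, and the names `COL_IDX`, `ROW_PTR`, `syndrome`, and `error_position` are hypothetical.

```python
# Parity-check matrix H of a Hamming(7,4) code in CSR form.
# H is binary, so the CSR "values" array is omitted: every stored entry is 1.
COL_IDX = [0, 2, 4, 6,   # row 0: bit positions 1, 3, 5, 7
           1, 2, 5, 6,   # row 1: bit positions 2, 3, 6, 7
           3, 4, 5, 6]   # row 2: bit positions 4, 5, 6, 7
ROW_PTR = [0, 4, 8, 12]


def syndrome(received):
    """Compute s = H * r over GF(2); each row is independent of the others."""
    s = []
    for row in range(len(ROW_PTR) - 1):
        acc = 0
        for k in range(ROW_PTR[row], ROW_PTR[row + 1]):
            acc ^= received[COL_IDX[k]]  # addition mod 2 is XOR
        s.append(acc)
    return s


def error_position(s):
    """Read the syndrome as a 1-based bit position; 0 means no error detected."""
    return sum(bit << i for i, bit in enumerate(s))


codeword = [0, 1, 1, 0, 0, 1, 1]   # a valid codeword for this H
received = codeword.copy()
received[4] ^= 1                   # flip the bit at position 5

print(syndrome(codeword), syndrome(received), error_position(syndrome(received)))
```

Because each row of the product (and each received codeword) is independent, a GPU version could plausibly assign one thread per row or per codeword; that mapping is an assumption here, not a detail stated in the abstract.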

Development and run time assessment of the GPU accelerated technique of a 2-Dimensional model for high resolution flood simulation in wide area (광역 고해상도 홍수모의를 위한 2차원 모형의 GPU 가속기법 개발 및 실행시간 평가)

  • Choi, Yun Seok;Noh, Hui Seong;Choi, Cheon Kyu
    • Journal of Korea Water Resources Association
    • /
    • v.55 no.12
    • /
    • pp.991-998
    • /
    • 2022
  • The purpose of this study is to develop a GPU (Graphics Processing Unit) acceleration technique for a 2-dimensional model and to assess its effectiveness for high resolution flood simulation over a wide area. In this study, the GPU acceleration technique was implemented with CUDA in the G2D (Grid based 2-Dimensional land surface flood model) model, which uses an implicit scheme and a uniform square grid. The technique was applied to flood simulation in Jinju-si. The spatial resolution of the simulation domain is 10 m × 10 m, and the number of cells to calculate is 5,090,611. The flood period caused by typhoon Mitag, December 2019, was simulated. Rainfall radar data was applied to the source term, and the measured discharge of Namgang-Dam (Ilryu-moon) and the measured stream flow of Jinju-si (Oksan-gyo) were applied as boundary conditions. In this study, the 2-dimensional flood model was able to reproduce the measured water level in Nam-gang (Riv.). The GPU acceleration technique produced faster flood simulation than the serial and parallel simulations using the CPU (Central Processing Unit). This study can contribute to developing GPU acceleration techniques for 2-dimensional flood models using an implicit scheme and to simulating land surface floods over wide areas.

GPU-based Acceleration of Particle Filter Signal Processing for Efficient Moving-target Position Estimation (이동 목표물의 효율적인 위치 추정을 위한 파티클 필터 신호 처리의 GPU 기반 가속화)

  • Kim, Seongseop;Cho, Jeonghun;Park, Daejin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.12 no.5
    • /
    • pp.267-275
    • /
    • 2017
  • The time difference of arrival (TDOA) method using a passive sonar sensor array has normally been used to estimate the location of a concealed moving target in an underwater environment. The particle filter has been introduced for effective target estimation in non-Gaussian and nonlinear systems. In this paper, we propose GPU-based acceleration of target position estimation using a particle filter, along with an efficient embedded system and software architecture. For the TDOA measurement from the passive sonar sensor, we use the generalized cross correlation phase transform (GCC-PHAT) method, which obtains the correlation coefficient of the signals using the FFT, and we accelerate the calculation of GCC-PHAT based TDOA measurements with GPU CUDA. We also propose a parallelization method for the target position estimation algorithm that uses GPU CUDA to update the state of each particle from the measured values. The target estimation algorithm was verified using Matlab and implemented using GPU CUDA. We then realized the proposed signal processing acceleration system on an NVIDIA Jetson TX1 target board and analyzed its execution time. The execution time of the algorithm is reduced by 55% compared to CPU standalone operation on the target board. Experimental results show that the proposed architecture is a feasible solution in terms of high performance and area efficiency.
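
The GCC-PHAT step mentioned above can be illustrated with a small CPU sketch. It makes several simplifying assumptions that are not from the paper: the delay is a whole number of samples, the shift is circular, and a naive O(n²) DFT stands in for the FFT so that the example needs only the standard library. The names `dft` and `gcc_phat_delay` are hypothetical.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig, ref):
    """Estimate the delay of sig relative to ref, in samples."""
    n = len(sig)
    spec_sig, spec_ref = dft(sig), dft(ref)
    cross = [a * b.conjugate() for a, b in zip(spec_sig, spec_ref)]
    # PHAT weighting: discard magnitude and keep only the phase of each bin.
    weighted = [c / (abs(c) + 1e-12) for c in cross]
    corr = [v.real for v in dft(weighted, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    # Peaks in the upper half of the buffer correspond to negative delays.
    return peak if peak <= n // 2 else peak - n


rng = random.Random(0)
n = 32
ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
delayed = [ref[(t - 5) % n] for t in range(n)]    # ref arriving 5 samples late
advanced = [ref[(t + 3) % n] for t in range(n)]   # ref arriving 3 samples early

print(gcc_phat_delay(delayed, ref), gcc_phat_delay(advanced, ref))
```

The per-bin multiply and normalization are independent across frequency bins, which is the kind of work that maps naturally to GPU threads once the transforms are done by an FFT library.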

Adaptive Processing Algorithm Allocation on OpenCL-based FPGA-GPU Hybrid Layer for Energy-Efficient Reconfigurable Acceleration of Abnormal ECG Diagnosis (비정상 ECG 진단의 에너지 효율적인 재구성 가능한 가속을 위한 OpenCL 기반 FPGA-GPU 혼합 계층 적응 처리 알고리즘 할당)

  • Lee, Dongkyu;Lee, Seungmin;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.10
    • /
    • pp.1279-1286
    • /
    • 2021
  • The electrocardiogram (ECG) signal is a good indicator for early diagnosis of heart abnormalities. The reference normal ECG signal differs from person to person, and diagnosis requires a large amount of data. In this paper, we propose an adaptive OpenCL-based FPGA-GPU hybrid-layer platform to efficiently accelerate ECG signal diagnosis. When diagnosing 19,870 ECG signals from the MIT-BIH arrhythmia database on the platform, the FPGA accelerator takes 1.15 s, reducing the execution time by 89.94% and the power consumption by 84.0% compared to software execution. The GPU accelerator takes 1.87 s, reducing the execution time by 83.56% and the power consumption by 62.3% compared to software execution. Although the proposed FPGA-GPU hybrid platform has a slower diagnostic speed than the FPGA accelerator alone, it can run a flexible algorithm according to the situation by using the GPU.
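
The reported times and percentage reductions can be cross-checked with a short calculation. The sketch below assumes that "reduced by p%" means the accelerated time equals (1 - p) times the software time, and that both accelerators were compared against the same software run; the helper name `implied_baseline` is illustrative.

```python
def implied_baseline(accelerated_seconds, reduction_fraction):
    """Software execution time implied by an accelerated time and its reduction."""
    return accelerated_seconds / (1.0 - reduction_fraction)


fpga_base = implied_baseline(1.15, 0.8994)
gpu_base = implied_baseline(1.87, 0.8356)

# Speedup factor relative to software execution.
fpga_speedup = fpga_base / 1.15
gpu_speedup = gpu_base / 1.87

print(round(fpga_base, 2), round(gpu_base, 2))
print(round(fpga_speedup, 1), round(gpu_speedup, 1))
```

Both implied baselines come out near 11.4 s, so the two sets of figures are mutually consistent to within the rounding of the reported times.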

Matrix Multiplication Acceleration with GPU and Locality (GPU와 지역성을 이용한 행렬 곱셈 가속)

  • Kwon, Oh-Young;Lee, Chang-Mug
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.902-903
    • /
    • 2009
  • Matrix multiplication is widely used in scientific and engineering fields. Locality can improve the execution performance of matrix multiplication. A method for accelerating matrix multiplication is presented. This method uses the computing power of both the CPU and the GPU in a PC. The presented method improved execution time by about 15~30% compared to the method that uses only the GPU.
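
One common way to exploit locality in matrix multiplication is loop tiling (blocking), where small sub-blocks are reused while they are still in fast memory. The sketch below shows that generic technique in plain Python; the abstract does not say which locality optimization the authors used, so this should not be read as their method, and `matmul_tiled` and its `tile` parameter are illustrative names.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x m) and b (m x p) block by block."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                # Work on one tile of c using one tile of a and one tile of b.
                for i in range(i0, min(i0 + tile, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + tile, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(j0, min(j0 + tile, p)):
                            row_c[j] += a_ik * row_b[j]
    return c


a = [[1, 2, 3],
     [4, 5, 6]]
b = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul_tiled(a, b))
```

In interpreted Python the tiling brings no speed benefit; the point is only the access pattern, which in compiled CPU code or a GPU kernel keeps each tile in cache or shared memory while it is reused.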

GPU-Based ECC Decode Unit for Efficient Massive Data Reception Acceleration

  • Kwon, Jisu;Seok, Moon Gi;Park, Daejin
    • Journal of Information Processing Systems
    • /
    • v.16 no.6
    • /
    • pp.1359-1371
    • /
    • 2020
  • When transmitting and receiving a large amount of data, reliable data communication is crucial for normal operation of a device and to prevent abnormal operations caused by errors. Therefore, in this paper, it is assumed that an error correction code (ECC) that can detect and correct errors by itself is used in an environment where massive data is sequentially received. Because an embedded system has limited resources, such as a low-performance processor or a small memory, it requires efficient operation of applications. In this paper, we propose using an accelerated ECC-decoding technique with a graphics processing unit (GPU) built into the embedded system when receiving a large amount of data. In the matrix-vector multiplication that forms the Hamming code used as a function of the ECC operation, the matrix is expressed in compressed sparse row (CSR) format, and a sparse matrix-vector product is used. The multiplication operation is performed in the kernel of the GPU, and we also accelerate the Hamming code computation so that the ECC operation can be performed in parallel. The proposed technique is implemented with CUDA on a GPU-embedded target board, NVIDIA Jetson TX2, and its execution time is compared with that of the CPU.

Implementation of Real-time Interactive Ray Tracing on GPU (GPU 기반의 실시간 인터렉티브 광선추적법 구현)

  • Bae, Sung-Min;Hong, Hyun-Ki
    • Journal of Korea Game Society
    • /
    • v.7 no.3
    • /
    • pp.59-66
    • /
    • 2007
  • Ray tracing is one of the classical global illumination methods for generating a photo-realistic rendered image with various lighting effects such as reflection and refraction. However, its computational load restricts its use in real-time applications. To overcome these limitations, many studies of GPU (Graphics Processing Unit) based ray tracing have been presented. In this paper, we implement the ray tracing algorithm by J. Purcell and combine it with two methods to improve the rendering performance for interactive applications. First, intersection points of the primary ray are determined efficiently using rasterization on graphics hardware. We then construct an acceleration structure for the 3D objects to improve the rendering performance. Few studies have analyzed in detail the performance improvement from these considerations in ray tracing rendering. We compare the rendering system with GPU-based environment mapping and implement a wireless remote rendering system. This system is useful for interactive applications such as real-time composition, augmented reality, and virtual reality.

Acceleration Hardware Technology of 3D Graphics (3D 그래픽스 가속 하드웨어 기술)

  • Cho, S.H.;Park, S.M.;Eum, N.W.
    • Electronics and Telecommunications Trends
    • /
    • v.22 no.5
    • /
    • pp.69-77
    • /
    • 2007
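  • The remarkable growth of the 3D graphics industry has been built on advances in GPU technology. GPUs have evolved from the conventional fixed-function pipeline into a programmable form, and their programmability and performance continue to improve steadily. Recently, research has been under way both to resolve the imbalance in computational load inside the GPU and to apply the GPU's computing power to other application areas. Industry-standard APIs exist for developing GPU-based 3D graphics applications, and profiles for mobile devices, simplified by selecting only the essential functions of the desktop APIs, are also being defined. GPUs used in mobile devices are also evolving toward programmable architectures, and efforts to reduce power consumption are needed for them to become widespread.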