• Title/Summary/Keyword: GPU-Based Acceleration

GPU Acceleration of Range Doppler Algorithm for Real-Time SAR Image Generation (실시간 SAR 영상 생성을 위한 Range Doppler Algorithm의 GPU 가속)

  • Dong-Min Jeong; Woo-Kyung Lee; Myeong-Jin Lee; Yun-Ho Jung
    • Journal of IKEEE, v.27 no.3, pp.265-272, 2023
  • In this paper, a GPU-accelerated kernel of the range Doppler algorithm (RDA) was developed for real-time image formation in frequency-modulated continuous-wave (FMCW) synthetic aperture radar (SAR). Pinned memory was used to minimize data transfer time between the host and the GPU device, and the kernel was designed to perform all RDA operations on the GPU so as to minimize the number of data transfers. The dataset was obtained from an FMCW drone SAR experiment, and the GPU acceleration was measured on a system with an Intel i7-9700K CPU, 32 GB RAM, and an NVIDIA RTX 3090 GPU. Including host-device data transfer time, the GPU implementation was up to 3.41 times faster than the CPU; measuring computation alone, excluding data transfer time, it was up to 156 times faster.
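The two reported speedups can be related with simple arithmetic. A minimal sketch, assuming both ratios share the same CPU baseline time and the same GPU compute time (the abstract does not state this explicitly), estimates what fraction of end-to-end GPU time is spent outside computation, mainly host-device transfer. The function name is hypothetical.

```python
def transfer_share(end_to_end_speedup: float, compute_only_speedup: float) -> float:
    """Fraction of end-to-end GPU time spent outside computation (e.g. transfers).

    Assumes a common CPU baseline T_cpu, so that
      GPU end-to-end time = T_cpu / end_to_end_speedup
      GPU compute time    = T_cpu / compute_only_speedup
    """
    t_total = 1.0 / end_to_end_speedup      # normalized so that T_cpu = 1
    t_compute = 1.0 / compute_only_speedup
    return (t_total - t_compute) / t_total  # equals 1 - end_to_end / compute_only


share = transfer_share(3.41, 156.0)
print(f"non-compute share of GPU time: {share:.1%}")  # non-compute share of GPU time: 97.8%
```

Under that assumption, roughly 97.8% of the end-to-end GPU time goes to transfers and other non-kernel work, which illustrates why pinned memory and keeping every RDA stage on the device matter for this workload.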

Acceleration of ECC Computation for Robust Massive Data Reception under GPU-based Embedded Systems (GPU 기반 임베디드 시스템에서 대용량 데이터의 안정적 수신을 위한 ECC 연산의 가속화)

  • Kwon, Jisu; Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering, v.24 no.7, pp.956-962, 2020
  • Recently, as the size of data used in embedded systems has grown, ECC decoding for robust reception of massive data has become increasingly important. In this paper, we propose a method to accelerate the computation of syndrome vectors during Hamming-code ECC decoding in an embedded system with a built-in GPU. The proposed method stores the matrix of the decoding operation's matrix-vector multiplication in CSR (compressed sparse row) format, a data structure for representing sparse matrices, and performs the multiplication in parallel in a CUDA kernel on the GPU. We evaluated the proposed method on a target embedded board with a GPU, and the results show that GPU-accelerated ECC decoding has a shorter execution time than CPU-only decoding.
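The CSR-based syndrome computation can be illustrated on the CPU with a minimal sketch. It assumes a Hamming(7,4) code whose parity-check matrix columns are the binary numbers 1 to 7 (a common textbook construction; the abstract does not give the paper's code parameters), and the helper names are hypothetical. Because only the 1-entries of the binary matrix are stored, the values array of CSR is omitted.

```python
from typing import List, Tuple


def dense_to_csr(matrix: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Convert a 0/1 matrix to CSR (row_ptr, col_idx); stored values are implicitly 1."""
    row_ptr, col_idx = [0], []
    for row in matrix:
        col_idx.extend(j for j, v in enumerate(row) if v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def csr_syndrome(row_ptr: List[int], col_idx: List[int], word: List[int]) -> List[int]:
    """Sparse matrix-vector product over GF(2): s = H * r (mod 2).

    Rows are independent, so on a GPU one thread per row could compute its
    syndrome bit; here the rows are simply looped over on the CPU.
    """
    syndrome = []
    for i in range(len(row_ptr) - 1):
        bit = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            bit ^= word[col_idx[k]]  # XOR is addition mod 2
        syndrome.append(bit)
    return syndrome


# Parity-check matrix of Hamming(7,4): column j (1-based) is j in binary, LSB in row 0.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]
row_ptr, col_idx = dense_to_csr(H)

received = [0, 0, 0, 0, 1, 0, 0]  # all-zero codeword with bit 5 (1-based) flipped
s = csr_syndrome(row_ptr, col_idx, received)
print(s, "-> error position", s[0] + 2 * s[1] + 4 * s[2])  # [1, 0, 1] -> error position 5
```

Each row of this H has only four nonzeros out of seven columns; for the much longer codes used with massive data, the fraction of nonzeros shrinks further, which is where CSR storage pays off.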

GPU-based Acceleration of Particle Filter Signal Processing for Efficient Moving-target Position Estimation (이동 목표물의 효율적인 위치 추정을 위한 파티클 필터 신호 처리의 GPU 기반 가속화)

  • Kim, Seongseop; Cho, Jeonghun; Park, Daejin
    • IEMEK Journal of Embedded Systems and Applications, v.12 no.5, pp.267-275, 2017
  • The time difference of arrival (TDOA) method using a passive sonar sensor array has commonly been used to estimate the location of a concealed moving target in an underwater environment. Particle filters have been introduced for effective target estimation in non-Gaussian and nonlinear systems. In this paper, we propose GPU-based acceleration of target position estimation using a particle filter, together with an efficient embedded system and software architecture. For TDOA measurement from the passive sonar sensors, we use the generalized cross-correlation with phase transform (GCC-PHAT) method, which obtains the correlation coefficient of the signals using the FFT, and we accelerate the FFT-based GCC-PHAT TDOA computation with GPU CUDA. We also propose a CUDA parallelization of the target position estimation algorithm that updates the state of each particle from the measured values. The target estimation algorithm was verified in MATLAB and implemented in GPU CUDA. We then realized the proposed signal processing acceleration system on an NVIDIA Jetson TX1 target board and analyzed its execution time. The execution time of the algorithm was reduced by 55% compared with CPU-only operation on the target board. Experimental results show that the proposed architecture is a feasible solution in terms of high performance and area efficiency.
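The GCC-PHAT step can be sketched in a few lines. This is a minimal CPU-only illustration, assuming a single pair of equal-length sensor signals and a circular, integer-sample delay; the naive DFT stands in for the FFT that the paper computes with CUDA, and the function names are hypothetical.

```python
import cmath
import random
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) DFT; a real system would use an FFT (e.g. cuFFT on the GPU)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def gcc_phat_delay(x1: Sequence[float], x2: Sequence[float]) -> int:
    """Estimate the integer (circular) delay of x2 relative to x1 with GCC-PHAT."""
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = []
    for a, b in zip(X1, X2):
        g = a.conjugate() * b                          # cross-power spectrum
        mag = abs(g)
        cross.append(g / mag if mag > 1e-12 else 0j)   # PHAT weighting: keep phase only
    r = dft(cross, inverse=True)                       # generalized cross-correlation
    lag = max(range(n), key=lambda k: r[k].real)
    return lag - n if lag > n // 2 else lag            # map to a signed lag


rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(64)]
x2 = [x1[(t - 5) % 64] for t in range(64)]  # x2 is x1 delayed by 5 samples
print(gcc_phat_delay(x1, x2))  # 5
```

The PHAT weighting discards magnitude and keeps only phase, so the correlation collapses to a sharp peak at the delay; multiplying that delay by the sampling period and the speed of sound would give the range difference used in TDOA localization.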

Development and run time assessment of the GPU accelerated technique of a 2-Dimensional model for high resolution flood simulation in wide area (광역 고해상도 홍수모의를 위한 2차원 모형의 GPU 가속기법 개발 및 실행시간 평가)

  • Choi, Yun Seok; Noh, Hui Seong; Choi, Cheon Kyu
    • Journal of Korea Water Resources Association, v.55 no.12, pp.991-998, 2022
  • The purpose of this study is to develop a GPU (Graphics Processing Unit) acceleration technique for a 2-dimensional model and to assess its effectiveness for high-resolution flood simulation over a wide area. In this study, the GPU acceleration technique was implemented with CUDA in the G2D (Grid based 2-Dimensional land surface flood model) model, which uses an implicit scheme and a uniform square grid. The technique was applied to a flood simulation of Jinju-si. The spatial resolution of the simulation domain is 10 m × 10 m, and the number of computational cells is 5,090,611. The flood period of Typhoon Mitag, December 2019, was simulated. Rainfall radar data were applied as the source term, and the measured discharge of Namgang-Dam (Ilryu-moon) and the measured streamflow of Jinju-si (Oksan-gyo) were applied as boundary conditions. The 2-dimensional flood model reproduced the measured water level in the Nam-gang (Riv.). The GPU acceleration technique produced faster flood simulations than both serial and parallel simulations on the CPU (Central Processing Unit). This study can contribute to the development of GPU acceleration techniques for 2-dimensional flood models using implicit schemes and to the simulation of land surface floods over wide areas.
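The reported grid size gives a sense of the parallel workload. A minimal sketch, assuming one CUDA thread per grid cell and a block size of 256 (a common choice, not stated in the abstract), computes the launch size and the area covered by the computational cells; the helper name is hypothetical.

```python
def blocks_needed(n_cells: int, block_size: int = 256) -> int:
    """Number of thread blocks for a one-thread-per-cell launch (ceiling division)."""
    return (n_cells + block_size - 1) // block_size


N_CELLS = 5_090_611     # computational cells reported for the Jinju-si domain
CELL_AREA_M2 = 10 * 10  # 10 m x 10 m grid
BLOCK_SIZE = 256        # assumed, not from the paper

blocks = blocks_needed(N_CELLS, BLOCK_SIZE)
idle = blocks * BLOCK_SIZE - N_CELLS  # surplus threads that must skip work via a bounds check
area_km2 = N_CELLS * CELL_AREA_M2 / 1e6

print(blocks, idle, round(area_km2, 2))  # 19886 205 509.06
```

About 5.09 million independent per-cell updates per sweep, covering roughly 509 km², is the kind of data parallelism GPUs are suited to; the surplus threads in the last block are why such kernels typically guard each thread with an index bounds check.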

Image Feature-Based Real-Time RGB-D 3D SLAM with GPU Acceleration (GPU 가속화를 통한 이미지 특징점 기반 RGB-D 3차원 SLAM)

  • Lee, Donghwa; Kim, Hyongjin; Myung, Hyun
    • Journal of Institute of Control, Robotics and Systems, v.19 no.5, pp.457-461, 2013
  • This paper proposes an image feature-based real-time RGB-D (Red-Green-Blue-Depth) 3D SLAM (Simultaneous Localization and Mapping) system. RGB-D data from Kinect-style sensors contain a 2D image and per-pixel depth information. 6-DOF (degree-of-freedom) visual odometry is obtained through a 3D-RANSAC (RANdom SAmple Consensus) algorithm using 2D image features and depth data. To speed up feature extraction, the computation is parallelized with GPU acceleration. After a feature manager detects a loop closure, a graph-based SLAM algorithm optimizes the sensor trajectory and builds a 3D point-cloud map.
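Combining 2D image features with depth starts by lifting each matched feature pixel to a 3D point, which a 3D-RANSAC step can then align between frames. A minimal sketch, assuming a standard pinhole camera model with hypothetical intrinsics (fx, fy, cx, cy are not given in the abstract) and a hypothetical function name:

```python
from typing import Tuple


def back_project(u: float, v: float, depth_m: float,
                 fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame (pinhole model)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m


# Hypothetical intrinsics, roughly in the range of a 640x480 Kinect-style sensor.
FX = FY = 525.0
CX, CY = 319.5, 239.5

print(back_project(319.5, 239.5, 2.0, FX, FY, CX, CY))  # (0.0, 0.0, 2.0)
print(back_project(424.5, 239.5, 2.0, FX, FY, CX, CY))  # (0.4, 0.0, 2.0)
```

Each pixel is processed independently, so the same formula maps naturally onto one GPU thread per feature or per pixel.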

Parallel Implementation of Scrypt: A Study on GPU Acceleration for Password-Based Key Derivation Function

  • SeongJun Choi; DongCheon Kim; Seog Chung Seo
    • Journal of information and communication convergence engineering, v.22 no.2, pp.98-108, 2024
  • Scrypt is a password-based key derivation function with a memory-hard structure, proposed by Colin Percival in 2009. Scrypt was intentionally designed to be memory-intensive so that password cracking with ASICs, GPUs, and similar hardware is more difficult. In this study, however, we thoroughly analyzed the operation of Scrypt and proposed strategies to maximize computational parallelism in GPU environments. Through these optimizations, we achieved a performance improvement of 8284.4% compared with traditional CPU-based Scrypt computation. Moreover, the GPU-optimized implementation presented in this paper outperforms straightforward GPU-based Scrypt processing by a significant margin, with a performance improvement of 204.84% on the RTX 3090. These results demonstrate the effectiveness of the proposed approach in harnessing the computational power of GPUs for Scrypt calculations. Our proposed implementation is the first GPU implementation of Scrypt, demonstrating the ability to efficiently crack Scrypt.
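The memory-hard property comes from scrypt's ROMix step, which fills and then pseudo-randomly reads a table of N blocks of 128·r bytes, so every concurrent instance needs its own table. A minimal sketch using Python's standard `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later) shows the memory arithmetic; the parameter values and helper names are illustrative assumptions, not the paper's settings.

```python
import hashlib


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's ROMix table V: N blocks of 128*r bytes each."""
    return 128 * r * n


def derive_key(password: bytes, salt: bytes, n: int = 1024, r: int = 8, p: int = 1,
               dklen: int = 32) -> bytes:
    """Derive a key with the OpenSSL-backed hashlib.scrypt (n must be a power of 2)."""
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p, dklen=dklen)


key = derive_key(b"password", b"NaCl")
print(len(key), scrypt_memory_bytes(1024, 8))           # 32 1048576
print(scrypt_memory_bytes(16384, 8) // 2**20, "MiB")    # 16 MiB
```

At N = 16384 and r = 8, each instance needs about 16 MiB, so the number of candidate passwords a GPU can process concurrently is bounded by device memory; that is one of the constraints any GPU parallelization strategy for Scrypt has to work around.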

Adaptive Processing Algorithm Allocation on OpenCL-based FPGA-GPU Hybrid Layer for Energy-Efficient Reconfigurable Acceleration of Abnormal ECG Diagnosis (비정상 ECG 진단의 에너지 효율적인 재구성 가능한 가속을 위한 OpenCL 기반 FPGA-GPU 혼합 계층 적응 처리 알고리즘 할당)

  • Lee, Dongkyu; Lee, Seungmin; Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.10, pp.1279-1286, 2021
  • The electrocardiogram (ECG) signal is a good indicator for the early diagnosis of heart abnormalities. The normal reference ECG signal differs from person to person, and diagnosis requires a large amount of data. In this paper, we propose an adaptive OpenCL-based FPGA-GPU hybrid-layer platform to efficiently accelerate ECG signal diagnosis. When 19,870 ECG signals from the MIT-BIH arrhythmia database were diagnosed on the platform, the FPGA accelerator took 1.15 s, reducing execution time by 89.94% and power consumption by 84.0% compared with software execution. The GPU accelerator took 1.87 s, reducing execution time by 83.56% and power consumption by 62.3% compared with software execution. Although the proposed FPGA-GPU hybrid platform diagnoses more slowly than the FPGA accelerator alone, it can run flexible algorithms according to the situation by using the GPU.
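The reported times and percentages can be cross-checked. A minimal sketch, assuming both percentages are relative to the same software execution time, back-calculates that baseline from each accelerator's figures; the function name is hypothetical.

```python
def implied_baseline(accel_time_s: float, reduction_pct: float) -> float:
    """Software time implied by an accelerated time and its percentage reduction."""
    return accel_time_s / (1.0 - reduction_pct / 100.0)


fpga_base = implied_baseline(1.15, 89.94)
gpu_base = implied_baseline(1.87, 83.56)
print(f"{fpga_base:.2f} s, {gpu_base:.2f} s")  # 11.43 s, 11.37 s
```

Both figures point to a software baseline of roughly 11.4 s, so the FPGA and GPU results appear consistent with a single reference run.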

GPU-Based ECC Decode Unit for Efficient Massive Data Reception Acceleration

  • Kwon, Jisu; Seok, Moon Gi; Park, Daejin
    • Journal of Information Processing Systems, v.16 no.6, pp.1359-1371, 2020
  • When transmitting and receiving large amounts of data, reliable communication is crucial for the normal operation of a device and for preventing abnormal operation caused by errors. Therefore, this paper assumes that an error correction code (ECC), which can detect and correct errors by itself, is used in an environment where massive data is received sequentially. Because an embedded system has limited resources, such as a low-performance processor or a small memory, applications must run efficiently on it. In this paper, we propose an accelerated ECC-decoding technique that uses the graphics processing unit (GPU) built into the embedded system when receiving large amounts of data. In the matrix-vector multiplication of the Hamming code used for the ECC operation, the matrix is expressed in compressed sparse row (CSR) format and a sparse matrix-vector product is used. The multiplication is performed in a GPU kernel, and we also accelerate the Hamming code computation so that the ECC operation can be performed in parallel. The proposed technique is implemented with CUDA on a GPU-embedded target board, the NVIDIA Jetson TX2, and its execution time is compared with that of the CPU.
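Beyond the syndrome product, the correction step is also parallel across received words, since each word is decoded independently. A minimal CPU sketch, assuming a Hamming(7,4) code whose parity-check columns are the binary numbers 1 to 7 (the abstract does not specify the block length) and hypothetical function names, encodes, corrects, and batch-processes codewords:

```python
from typing import List


def encode(d: List[int]) -> List[int]:
    """Hamming(7,4) encode: data at positions 3, 5, 6, 7; parity at 1, 2, 4 (1-based)."""
    d3, d5, d6, d7 = d
    p1 = d3 ^ d5 ^ d7
    p2 = d3 ^ d6 ^ d7
    p4 = d5 ^ d6 ^ d7
    return [p1, p2, d3, p4, d5, d6, d7]


def correct(word: List[int]) -> List[int]:
    """Correct up to one flipped bit. Column j of H is j in binary, so the
    syndrome equals the XOR of the (1-based) positions of all set bits."""
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= pos
    fixed = list(word)
    if syndrome:  # a non-zero syndrome is the position of the single error
        fixed[syndrome - 1] ^= 1
    return fixed


def correct_batch(words: List[List[int]]) -> List[List[int]]:
    """Words are independent; on a GPU, one thread per word could run `correct`."""
    return [correct(w) for w in words]


cw = encode([1, 0, 1, 1])
noisy = list(cw)
noisy[2] ^= 1  # flip bit 3 (1-based)
print(correct_batch([cw, noisy]) == [cw, cw])  # True
```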

Implementation of Real-time Interactive Ray Tracing on GPU (GPU 기반의 실시간 인터렉티브 광선추적법 구현)

  • Bae, Sung-Min; Hong, Hyun-Ki
    • Journal of Korea Game Society, v.7 no.3, pp.59-66, 2007
  • Ray tracing is one of the classical global illumination methods for generating photo-realistic images with lighting effects such as reflection and refraction. However, its computational load limits its use in real-time applications. To overcome these limitations, many studies of GPU (Graphics Processing Unit)-based ray tracing have been presented. In this paper, we implement the ray tracing algorithm of J. Purcell and combine it with two methods to improve rendering performance for interactive applications. First, intersection points of primary rays are determined efficiently using rasterization on graphics hardware. Second, we construct an acceleration structure for the 3D objects to improve rendering performance. Few studies have analyzed in detail the performance gains from these considerations in ray tracing rendering. We compare the rendering system with GPU-based environment mapping and implement a wireless remote rendering system. This system is useful for interactive applications such as real-time composition, augmented reality, and virtual reality.
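Acceleration structures reduce the number of ray-primitive tests by first testing rays against bounding volumes. A minimal sketch of the standard slab test for a ray against an axis-aligned bounding box; this is a generic technique, the abstract does not say which structure or test the authors used, and the function name is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def ray_aabb(origin: Sequence[float], direction: Sequence[float],
             box_min: Sequence[float], box_max: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Slab test: return (t_enter, t_exit) along the ray, or None on a miss.

    Only hits with t >= 0 (in front of the ray origin) are reported.
    """
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must already lie between the planes.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far


unit_box = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(ray_aabb((-1.0, 0.5, 0.5), (1.0, 0.0, 0.0), *unit_box))  # (1.0, 2.0)
print(ray_aabb((-1.0, 0.5, 0.5), (0.0, 1.0, 0.0), *unit_box))  # None
```

During traversal, a ray only visits the objects inside a box when this cheap test reports a hit, which is how a bounding-volume-based structure avoids testing every ray against every triangle.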

Performance Analysis of DNN inference using OpenCV Built in CPU and GPU Functions (OpenCV 내장 CPU 및 GPU 함수를 이용한 DNN 추론 시간 복잡도 분석)

  • Park, Chun-Su
    • Journal of the Semiconductor & Display Technology, v.21 no.1, pp.75-78, 2022
  • Deep neural networks (DNNs) have become an essential data processing architecture for many computer vision tasks. Recently, DNN-based algorithms have achieved much higher recognition accuracy than traditional algorithms based on shallow learning. However, training DNNs and running inference with them require far more computation than everyday computing tasks. Moreover, as DNNs grow in size and depth, CPUs may be inadequate because they process serially by default. GPUs offer greater speed than CPUs because of their parallel processing capability. In this paper, we analyze the inference time complexity of DNNs using the well-known computer vision library OpenCV. We measure and analyze inference time complexity for three cases: CPU, GPU-Float32, and GPU-Float16.
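Inference-time comparisons of this kind are usually taken as repeated measurements after warm-up. A minimal, framework-agnostic sketch, assuming `run` is any zero-argument callable that performs one forward pass. With OpenCV's DNN module that would typically be a `net.forward()` call after selecting a target such as `cv2.dnn.DNN_TARGET_CPU`, `cv2.dnn.DNN_TARGET_CUDA`, or `cv2.dnn.DNN_TARGET_CUDA_FP16`; the paper's exact measurement procedure is not described in the abstract, and the helper names are hypothetical. The `to_fp16` helper shows the precision given up in the Float16 case.

```python
import statistics
import struct
import time
from typing import Callable


def median_latency_ms(run: Callable[[], object], warmup: int = 3, repeats: int = 20) -> float:
    """Median wall-clock latency of `run` in milliseconds, excluding warm-up calls.

    Warm-up matters for GPU backends, where the first calls often include
    one-time initialization that would otherwise skew the measurement.
    """
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(to_fp16(0.1))  # 0.0999755859375
print(median_latency_ms(lambda: sum(range(10_000))))
```

Using the median rather than the mean keeps occasional scheduling hiccups from dominating the comparison between the CPU, GPU-Float32, and GPU-Float16 cases.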