• Title/Summary/Keyword: Parallel GPU

Implementation of Fast Rasterizer processing using GPGPU based on SIMT structure (SIMT 구조 기반 GPGPU를 이용한 고속 Rasterizer 구현)

  • Kim, Chiyong
    • Journal of IKEEE / v.21 no.3 / pp.276-279 / 2017
  • In this paper, a GPGPU (General-Purpose computing on Graphics Processing Units) based on the SIMT structure is used to accelerate the rasterizer, which composes the screen of a display device pixel by pixel. A GPU has a large number of ALUs and processes data very quickly because it works in parallel. We therefore implemented a rasterizer that generates a 3D graphics model using a CPU, which performs operations sequentially, and a GPU, which performs operations in parallel. We confirmed that the proposed rasterizer performs 1.45 times better than the rasterizer using an Intel CPU when generating one frame.
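The abstract gives no implementation details, so the following is only a minimal sketch of why rasterization suits SIMT hardware: the coverage test for each pixel is independent of every other pixel. It uses the common edge-function test, which is not necessarily the paper's method. The names `edge`, `covers`, and `rasterize` are hypothetical, and the nested loop stands in for launching one GPU thread per pixel.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area of (a, b, p); the sign tells which side of a->b the point is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covers(tri, px, py):
    """Return True when point (px, py) lies inside or on the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    w0 = edge(x1, y1, x2, y2, px, py)
    w1 = edge(x2, y2, x0, y0, px, py)
    w2 = edge(x0, y0, x1, y1, px, py)
    # Accept both windings: all three edge values share a sign (or are zero).
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def rasterize(tri, width, height):
    """Build a coverage mask, sampling each pixel at its center.

    Every (x, y) iteration is independent; on a GPU each one would be
    handled by its own thread (assumption made for illustration).
    """
    return [
        [1 if covers(tri, x + 0.5, y + 0.5) else 0 for x in range(width)]
        for y in range(height)
    ]
```

Calling `rasterize(((0, 0), (4, 0), (0, 4)), 4, 4)` fills the lower-left half of a 4 × 4 mask, and because no pixel reads another pixel's result, the work can be split across threads without synchronization.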

Development and run time assessment of the GPU accelerated technique of a 2-Dimensional model for high resolution flood simulation in wide area (광역 고해상도 홍수모의를 위한 2차원 모형의 GPU 가속기법 개발 및 실행시간 평가)

  • Choi, Yun Seok;Noh, Hui Seong;Choi, Cheon Kyu
    • Journal of Korea Water Resources Association / v.55 no.12 / pp.991-998 / 2022
  • The purpose of this study is to develop a GPU (Graphics Processing Unit) acceleration technique for a 2-dimensional model and to assess its effectiveness for high-resolution flood simulation over a wide area. The GPU acceleration technique was implemented with CUDA in the G2D (Grid based 2-Dimensional land surface flood model) model, which uses an implicit scheme and a uniform square grid. The technique was applied to flood simulation in Jinju-si. The spatial resolution of the simulation domain is 10 m × 10 m, and the number of cells to calculate is 5,090,611. The flood period caused by typhoon Mitag, December 2019, was simulated. Rainfall radar data were applied as the source term, and the measured discharge of Namgang-Dam (Ilryu-moon) and the measured stream flow of Jinju-si (Oksan-gyo) were applied as boundary conditions. The 2-dimensional flood model was able to reproduce the measured water level in Nam-gang (Riv.). The GPU acceleration technique produced faster flood simulation than the serial and parallel simulations using a CPU (Central Processing Unit). This study can contribute to the development of GPU acceleration techniques for 2-dimensional flood models that use an implicit scheme and to the simulation of land surface floods over wide areas.

Parallel Implementation of the Recursive Least Square for Hyperspectral Image Compression on GPUs

  • Li, Changguo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.7 / pp.3543-3557 / 2017
  • Compression is a very important technique for remotely sensed hyperspectral images. Lossless compression based on the recursive least square (RLS), which eliminates the redundancy of hyperspectral images using both spatial and spectral correlations, is an extremely powerful tool for this purpose, but its relatively high computational complexity limits its application to time-critical scenarios. To improve the computational efficiency of the algorithm, we optimize its serial version and develop a new parallel implementation on graphics processing units (GPUs). First, an optimized recursive least square method based on the optimal number of prediction bands is introduced. We then use this approach as a case study to illustrate the advantages and potential challenges of applying GPU parallel optimization principles to the considered problem. The proposed parallel method exploits the low-level architecture of GPUs and has been implemented using the compute unified device architecture (CUDA). The GPU parallel implementation is compared with the serial implementation on CPU. Experimental results indicate remarkable acceleration factors and real-time performance, while retaining exactly the same bit rate as the serial version of the compressor.

Hybrid parallel programming for Heterogeneous Multi-core performance optimization (헤테로지니어스 멀티코어 성능 최적화를 위한 하이브리드 병렬 프로그래밍)

  • Lim, Ju-Ho
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.7-9 / 2012
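  • CPUs long pursued higher performance by raising the clock speed of single-core designs, but once that approach reached its limit they evolved into multicore designs with several cores on one chip. To improve performance further, CPUs have now been combined with GPUs, which were originally built for 3D graphics computation. Combining a CPU with a GPU showed much better performance than combining CPUs, used less power, and has the advantage of being economical in cost. In this paper, we combine existing parallelization models and attempt optimization in order to optimize performance on a heterogeneous multicore of CPU and GPU. On the CPU, we experimented with combining existing parallel programming models, namely SIMD, the shared-memory parallel programming model, and the message-passing parallel programming model. On the GPU, we optimized CUDA. By optimizing and combining the CPU and GPU in this way, we propose a heterogeneous multicore performance optimization method for applications that demand high-performance computation.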

Faster Fingerprint Matching Algorithm Using GPU (GPU를 이용한 보다 빠른 지문 인식 알고리즘)

  • Riaz, Sidra;Lee, Sang-Woong
    • Proceedings of the Korea Multimedia Society Conference / 2012.05a / pp.43-45 / 2012
  • This paper embeds biometric techniques on the GPU for better computational efficiency and a fast matching process, using the parallel nature of GPU processors to compare thousands of images for fingerprint recognition in a fraction of a second. We used a GPU (NVIDIA GeForce GTX 260 with compute capability 1.3, together with a dual-core Core 2 Duo processor) for fingerprint matching and found that its efficiency is better than the results of related work on CMOS, CPU, ARM9, MATLAB neural networks, etc., which shows the better performance of our system in terms of computational time. The proposed feature matching process for fingerprint recognition and the verification procedure were run on 5,000 images that are available online in the FVC2000, 2002, and 2004 databases [1].

Spectral Modeling Synthesis of Haegeum using GPU (GPU를 이용한 해금의 스펙트럼 모델링)

  • Islam, Md Shohidul;Islam, Md Rashedul;Farid, Fahmid Al;Kim, Jong-Myon
    • Proceedings of the Korean Society of Computer Information Conference / 2014.01a / pp.5-8 / 2014
  • This paper presents a parallel approach to the formant synthesis method for the haegeum on graphics processing units (GPUs) using spectral modeling. Spectral modeling synthesis (SMS) is a technique that models time-varying spectra as a combination of sinusoids and a time-varying filtered noise component. A second-order digital resonator obtained by the impulse-invariant transform (IIT) is applied to generate the deterministic components, and the results are band-pass filtered to adjust the magnitude. The noise is calculated by first generating the sinusoids with formant synthesis, subtracting them from the original sound, and then removing the remaining harmonics. The synthesized sounds are then produced by adding the sinusoids and are shown to be similar to the original haegeum sounds. Furthermore, the GPU accelerates the synthesis process, enabling the development of real-time music synthesis systems that support more sound effects and multiple musical sound compositions.
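As a rough illustration of the second-order digital resonator mentioned in the abstract, the sketch below uses the textbook formant-resonator recurrence y[n] = a·x[n] + b·y[n-1] + c·y[n-2], with the pole radius and angle set from a bandwidth and a center frequency. The coefficient formulas and the unity-gain-at-DC normalization are assumptions borrowed from classic formant synthesizers, not taken from the paper, and the function names are hypothetical. Each resonator is sequential in time, so any GPU parallelism would have to come from running many resonators (formants or notes) at once.

```python
import math


def resonator_coefficients(center_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients (a, b, c) for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)  # pole radius
    theta = 2.0 * math.pi * center_hz / sample_rate_hz      # pole angle
    b = 2.0 * r * math.cos(theta)
    c = -r * r
    a = 1.0 - b - c  # assumed normalization: unity gain at 0 Hz
    return a, b, c


def resonate(samples, center_hz, bandwidth_hz, sample_rate_hz):
    """Filter a list of samples through one second-order resonator."""
    a, b, c = resonator_coefficients(center_hz, bandwidth_hz, sample_rate_hz)
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out
```

Feeding an impulse, for example `resonate([1.0] + [0.0] * 255, 500.0, 100.0, 8000.0)`, yields a decaying oscillation near the chosen center frequency.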

GPU-based Object Extraction for Real-time Analysis of Large-scale Radar Signal (대규모 레이더 신호 데이터의 실시간 분석을 위한 GPU 기반 객체 추출 기법)

  • Kang, Young-Min
    • Journal of Korea Multimedia Society / v.19 no.8 / pp.1297-1309 / 2016
  • This paper proposes an efficient connected component labeling (CCL) method based on GPU parallelism. CCL is very important in various applications where images are analysed. However, the label of each pixel depends on the connectivity of adjacent pixels, so CCL is not easy to parallelize. A GPU-based parallel CCL technique is proposed and applied to the analysis of radar signals. Since radar signals contain complex and large data, the efficiency of the algorithm is crucial when real-time analysis is required. The experimental results show that the proposed method is efficient enough to be successfully applied to this application.
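To make the dependency problem concrete, here is a minimal sketch of one well-known way to parallelize CCL, iterative minimum-label propagation. It is an assumption for illustration and not necessarily the technique the paper proposes; `label_components` is a hypothetical name. Every foreground pixel starts with a unique label and repeatedly adopts the smallest label among itself and its 4-connected neighbors. Each pass reads the previous labels and writes a fresh buffer, so all pixels in a pass could be updated by independent GPU threads.

```python
def label_components(grid):
    """Label 4-connected foreground regions of a binary grid.

    Background pixels get -1; each component ends up with the smallest
    linear index (y * width + x) among its pixels.
    """
    h, w = len(grid), len(grid[0])
    labels = [[y * w + x if grid[y][x] else -1 for x in range(w)] for y in range(h)]
    changed = True
    while changed:  # one pass corresponds to one kernel launch
        changed = False
        nxt = [row[:] for row in labels]  # double buffer: read old, write new
        for y in range(h):
            for x in range(w):
                if labels[y][x] < 0:
                    continue
                best = labels[y][x]
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] >= 0:
                        best = min(best, labels[ny][nx])
                if best != labels[y][x]:
                    nxt[y][x] = best
                    changed = True
        labels = nxt
    return labels
```

The number of passes grows with the longest path inside a component, which is one reason practical GPU methods add further optimizations on top of this basic scheme.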

Matrix Addition & Scalar Multiplication on the GPU (GPU 기반 행렬 덧셈 및 스칼라 곱셈 알고리즘)

  • Park, Sangkun
    • Journal of Institute of Convergence Technology / v.8 no.1 / pp.15-20 / 2018
  • GPUs have recently acquired the programmability to perform general-purpose computation quickly by running thousands of threads concurrently. This paper presents a parallel GPU computation algorithm for dense matrix-matrix addition and scalar multiplication using the OpenGL compute shader. These operations can serve as fundamental building blocks for many high-performance computing applications. Experimental results on an NVIDIA Quadro 4000 show that the proposed algorithm runs 21 times faster than the CPU algorithm and achieves a performance of 16 GFLOPS in single precision for dense matrices of size 4,096. This performance indicates that the algorithm is practical for real applications.
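The sketch below emulates, in plain Python, how an element-wise compute-shader dispatch could cover a dense matrix stored as a flat array. It is an illustration under stated assumptions, not the paper's shader: `dispatch`, `add_kernel`, `scale_kernel`, and the work-group size of 64 are hypothetical, and the index `i` plays the role that `gl_GlobalInvocationID.x` plays in a GLSL compute shader.

```python
def add_kernel(i, a, b, out):
    """Body of one invocation for matrix addition: out = a + b."""
    out[i] = a[i] + b[i]


def scale_kernel(i, s, a, out):
    """Body of one invocation for scalar multiplication: out = s * a."""
    out[i] = s * a[i]


def dispatch(kernel, n, local_size, *args):
    """Emulate launching ceil(n / local_size) work groups of local_size invocations."""
    num_groups = (n + local_size - 1) // local_size
    for group_id in range(num_groups):
        for local_id in range(local_size):
            i = group_id * local_size + local_id  # global invocation index
            if i < n:  # guard: the last group may extend past the data
                kernel(i, *args)


def matrix_add(a, b, rows, cols, local_size=64):
    out = [0.0] * (rows * cols)
    dispatch(add_kernel, rows * cols, local_size, a, b, out)
    return out


def scalar_multiply(s, a, rows, cols, local_size=64):
    out = [0.0] * (rows * cols)
    dispatch(scale_kernel, rows * cols, local_size, s, a, out)
    return out
```

Both operations touch each element exactly once with no dependence between elements, so on real hardware such kernels tend to be limited by memory bandwidth rather than arithmetic.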

Performance Analysis of DNN inference using OpenCV Built in CPU and GPU Functions (OpenCV 내장 CPU 및 GPU 함수를 이용한 DNN 추론 시간 복잡도 분석)

  • Park, Chun-Su
    • Journal of the Semiconductor & Display Technology / v.21 no.1 / pp.75-78 / 2022
  • Deep Neural Networks (DNNs) have become an essential data processing architecture for implementing many computer vision tasks. Recent DNN-based algorithms achieve much higher recognition accuracy than traditional algorithms based on shallow learning. However, training DNNs and running inference with them require far more computational capability than everyday computer use. Moreover, as DNNs grow in size and depth, CPUs may be unsatisfactory since they use serial processing by default. GPUs offer greater speed than CPUs because of their parallel processing nature. In this paper, we analyze the inference time complexity of DNNs using the well-known computer vision library OpenCV. We measure and analyze inference time complexity for three cases: CPU, GPU-Float32, and GPU-Float16.

Implementation of Efficient Power Method on CUDA GPU (CUDA 기반 GPU에서 효율적인 Power Method의 구현)

  • Kim, Jung-Hwan;Kim, Jin-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.9-16 / 2011
  • GPU computing is emerging in high-performance application areas since it can easily exploit massive parallelism in a cost-effective way. The power method, which finds the eigenvector of a given matrix, is widely used in various applications such as PageRank for calculating the importance of web pages. In this research, we parallelized the power method efficiently on the GPU and also suggested how its performance can be improved. The power method mainly consists of a matrix-vector product, which can be easily parallelized. However, it must also decide whether the eigenvector has converged and then scale the vector. Such operations incur several calls to GPU kernels and data movement between host and GPU memories. We improved the performance of the power method by reducing calls to GPU kernels, optimizing thread allocation, and enhancing the decision operation for convergence.
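The three steps named in the abstract (matrix-vector product, scaling, convergence decision) can be sketched as a plain sequential reference. This is a minimal illustration, not the paper's CUDA implementation: `mat_vec`, `power_method`, the tolerance, and the infinity-norm scaling are assumptions. On a GPU each step would typically be a separate kernel or reduction, which is where the kernel-call and host-device transfer overhead described above comes from.

```python
def mat_vec(a, x):
    """Dense matrix-vector product; each output row is independent work."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def power_method(a, tol=1e-10, max_iter=1000):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    Returns (eigenvalue, eigenvector), with the eigenvector scaled so that
    its largest-magnitude component equals 1.
    """
    x = [1.0] * len(a)
    eigenvalue = 0.0
    for _ in range(max_iter):
        y = mat_vec(a, x)                      # step 1: matrix-vector product
        scale = max(y, key=abs)                # reduction: largest-magnitude entry
        if scale == 0:
            raise ValueError("iterate collapsed to the zero vector")
        y = [v / scale for v in y]             # step 2: scaling
        diff = max(abs(u - v) for u, v in zip(x, y))  # step 3: convergence test
        x, eigenvalue = y, scale
        if diff < tol:
            break
    return eigenvalue, x
```

For the diagonal matrix `[[2.0, 0.0], [0.0, 1.0]]` the iteration approaches eigenvalue 2 with eigenvector `[1, 0]`; convergence is geometric, at the ratio of the second-largest to the largest eigenvalue magnitude.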