• Title/Summary/Keyword: Graphics processing unit(GPU)

Search Result 154, Processing Time 0.024 seconds

A Fully Programmable Shader Processor for Low Power Mobile Devices (저전력 모바일 장치를 위한 완전 프로그램 가능형 쉐이더 프로세서)

  • Jeong, Hyung-Ki;Lee, Joo-Sock;Park, Tae-Ryong;Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.13 no.2
    • /
    • pp.253-259
    • /
    • 2009
  • In this paper, we propose a novel architecture of a general graphics shader processor without a dedicated hardware. Recently, mobile devices require the high performance graphics processor as well as the small size, low power. The proposed shader processor is a GP-GPU(General-Purpose computing on Graphics Processing Units) to execute the whole OpenGL ES 2.0 graphics pipeline by using shader instructions. It does not require the separate dedicate H/W such as rasterization on this fully programmable capability. The fully programmable 3D graphics shader processor can reduce much of the graphics hardware. The chip size of the designed shader processor is reduced 60% less than the sizes of previous processors.

  • PDF

A Realization of FPGA-based Image Recognition System (FPGA기반 영상인식 시스템 구현)

  • Young Yun
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2022.11a
    • /
    • pp.349-350
    • /
    • 2022
  • Recently, AI (Artificial Intelligence) has been applied to various technologies such as automatic driving, robot and smart communication. Currently, AI system is developed by software-based method using tensor flow, and GPU (Graphic Processing Unit) is employed for processing unit. In this work, we developed an FPGA-based (Field Programmable Gate Array) AI system , and report on image recognition system to realize the AI system.

  • PDF

Fast Stereoscopic 3D Broadcasting System using x264 and GPU (x264와 GPU를 이용한 고속 양안식 3차원 방송 시스템)

  • Choi, Jung-Ah;Shin, In-Yong;Ho, Yo-Sung
    • Journal of Broadcast Engineering
    • /
    • v.15 no.4
    • /
    • pp.540-546
    • /
    • 2010
  • Since the stereoscopic 3-dimensional (3D) video that provides users with a realistic multimedia service requires twice as much data as 2-dimensional (2D) video, it is difficult to construct the fast system. In this paper, we propose a fast stereoscopic 3D broadcasting system based on the depth information. Before the transmission, we encode the input 2D+depth video using x264, an open source H.264/AVC fast encoder to reduce the size of the data. At the receiver, we decode the transmitted bitstream in real time using a compute unified device architecture (CUDA) video decoder API on NVIDIA graphics processing unit (GPU). Then, we apply a fast view synthesis method that generates the virtual view using GPU. The proposed system can display the output video in both 2DTV and 3DTV. From the experiment, we verified that the proposed system can service the stereoscopic 3D contents in 24 frames per second at most.

GPU-Accelerated Single Image Depth Estimation with Color-Filtered Aperture

  • Hsu, Yueh-Teng;Chen, Chun-Chieh;Tseng, Shu-Ming
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.3
    • /
    • pp.1058-1070
    • /
    • 2014
  • There are two major ways to implement depth estimation, multiple image depth estimation and single image depth estimation, respectively. The former has a high hardware cost because it uses multiple cameras but it has a simple software algorithm. Conversely, the latter has a low hardware cost but the software algorithm is complex. One of the recent trends in this field is to make a system compact, or even portable, and to simplify the optical elements to be attached to the conventional camera. In this paper, we present an implementation of depth estimation with a single image using a graphics processing unit (GPU) in a desktop PC, and achieve real-time application via our evolutional algorithm and parallel processing technique, employing a compute shader. The methods greatly accelerate the compute-intensive implementation of depth estimation with a single view image from 0.003 frames per second (fps) (implemented in MATLAB) to 53 fps, which is almost twice the real-time standard of 30 fps. In the previous literature, to the best of our knowledge, no paper discusses the optimization of depth estimation using a single image, and the frame rate of our final result is better than that of previous studies using multiple images, whose frame rate is about 20fps.

Evaluation of GPU Computing Capacity for All-in-view GNSS SDR Implementation

  • Yun Sub, Choi;Hung Seok, Seo;Young Baek, Kim
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.12 no.1
    • /
    • pp.75-81
    • /
    • 2023
  • In this study, we design an optimized Graphics Processing Unit (GPU)-based GNSS signal processing technique with the goal of designing and implementing a GNSS Software Defined Receiver (SDR) that can operate in real time all-in-view mode under multi-constellation and multi-frequency signal environment. In the proposed structure the correlators of the existing GNSS SDR are processed by the GPU. We designed a memory structure and processing method that can minimize memory access bottlenecks and optimize the GPU memory resource distribution. The designed GNSS SDR can select and operate only the desired GNSS or desired satellite signals by user input. Also, parameters such as the number of quantization bits, sampling rate, and number of signal tracking arms can be selected. The computing capability of the designed GPU-based GNSS SDR was evaluated and it was confirmed that up to 2400 channels can be processed in real time. As a result, the GPU-based GNSS SDR has sufficient performance to operate in real-time all-in-view mode. In future studies, it will be used for more diverse GNSS signal processing and will be applied to multipath effect analysis using more tracking arms.

Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA

  • Jeong, In-Kyu;Hong, Min-Gee;Hahn, Kwang-Soo;Choi, Joonsoo;Kim, Choen
    • Korean Journal of Remote Sensing
    • /
    • v.28 no.6
    • /
    • pp.683-691
    • /
    • 2012
  • High resolution satellite images are now widely used for a variety of mapping applications including photogrammetry, GIS data acquisition and visualization. As the spectral and spatial data size of satellite images increases, a greater processing power is needed to process the images. The solution of these problems is parallel systems. Parallel processing techniques have been developed for improving the performance of image processing along with the development of the computational power. However, conventional CPU-based parallel computing is often not good enough for the demand for computational speed to process the images. The GPU is a good candidate to achieve this goal. Recently GPUs are used in the field of highly complex processing including many loop operations such as mathematical transforms, ray tracing. In this study we proposed a technique for parallel processing of high resolution satellite images using GPU. We implemented a spectral radiometric processing algorithm on Landsat-7 ETM+ imagery using CUDA, a parallel computing architecture developed by NVIDIA for GPU. Also performance of the algorithm on GPU and CPU is compared.

Analysis of GPU Performance and Memory Efficiency according to Task Processing Units (작업 처리 단위 변화에 따른 GPU 성능과 메모리 접근 시간의 관계 분석)

  • Son, Dong Oh;Sim, Gyu Yeon;Kim, Cheol Hong
    • Smart Media Journal
    • /
    • v.4 no.4
    • /
    • pp.56-63
    • /
    • 2015
  • Modern GPU can execute mass parallel computation by exploiting many GPU core. GPGPU architecture, which is one of approaches exploiting outstanding computational resources on GPU, executes general-purpose applications as well as graphics applications, effectively. In this paper, we investigate the impact of memory-efficiency and performance according to number of CTAs(Cooperative Thread Array) on a SM(Streaming Multiprocessors), since the analysis of relation between number of CTA on a SM and them provides inspiration for researchers who study the GPU to improve the performance. Our simulation results show that almost benchmarks increasing the number of CTAs on a SM improve the performance. On the other hand, some benchmarks cannot provide performance improvement. This is because the number of CTAs generated from same kernel is a little or the number of CTAs executed simultaneously is not enough. To precisely classify the analysis of performance according to number of CTA on a SM, we also analyze the relations between performance and memory stall, dram stall due to the interconnect congestion, pipeline stall at the memory stage. We expect that our analysis results help the study to improve the parallelism and memory-efficiency on GPGPU architecture.

Fast Self-Collision Handling in Cloth Simulations Using GPU-based Optimized BVH and R-Triangle (GPU 기반의 최적화된 BVH와 R-Triangle을 이용한 옷감 시뮬레이션에서의 빠른 자기충돌 처리)

  • Moon, Seong-Hyeok;Kim, Jong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2022.01a
    • /
    • pp.373-376
    • /
    • 2022
  • 본 논문에서는 삼각형 메쉬 기반에서 옷감 시뮬레이션(Cloth simulation)에서 계산양이 큰 자기충돌(Self-collision) 처리를 GPU기반으로 가속화시킬 수 있는 방법에 대해 소개한다. CUDA기반으로 병렬 최적화하기 위해 본 논문에서는 1)재귀적으로 계산하여 충돌판정을 하는 BVH(Bounding volume hierarchy) 트리를 GPU기반에서 효율적으로 빌드, 업데이트, 트리 순회하는 방법을 제안하고, 2)삼각형 메쉬 기반에서는 중복되는 프리미티브(Primitive) 충돌검사를 최소화하기 위해 R-Triangle기법을 GPU에서 최적화 시키는 방법을 소개한다. 결과적으로 본 논문에서 제안하는 기법은 GPU 환경에서 옷감 시뮬레이션의 자기충돌과 객체충돌 처리를 빠르고 효율적으로 처리할 수 있도록 하였고, 다양한 장면에서 실험한 결과 모든 결과에서 빠른 시뮬레이션 결과를 얻을 수 있었다.

  • PDF

Large-scale 3D fast Fourier transform computation on a GPU

  • Jaehong Lee;Duksu Kim
    • ETRI Journal
    • /
    • v.45 no.6
    • /
    • pp.1035-1045
    • /
    • 2023
  • We propose a novel graphics processing unit (GPU) algorithm that can handle a large-scale 3D fast Fourier transform (i.e., 3D-FFT) problem whose data size is larger than the GPU's memory. A 1D FFT-based 3D-FFT computational approach is used to solve the limited device memory issue. Moreover, to reduce the communication overhead between the CPU and GPU, we propose a 3D data-transposition method that converts the target 1D vector into a contiguous memory layout and improves data transfer efficiency. The transposed data are communicated between the host and device memories efficiently through the pinned buffer and multiple streams. We apply our method to various large-scale benchmarks and compare its performance with the state-of-the-art multicore CPU FFT library (i.e., fastest Fourier transform in the West [FFTW]) and a prior GPU-based 3D-FFT algorithm. Our method achieves a higher performance (up to 2.89 times) than FFTW; it yields more performance gaps as the data size increases. The performance of the prior GPU algorithm decreases considerably in massive-scale problems, whereas our method's performance is stable.

Discolored Metal Pad Image Classification Based on Gabor Texture Features Using GPU (GPU를 이용한 Gabor Texture 특징점 기반의 금속 패드 변색 분류 알고리즘)

  • Cui, Xue-Nan;Park, Eun-Soo;Kim, Jun-Chul;Kim, Hak-Il
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.15 no.8
    • /
    • pp.778-785
    • /
    • 2009
  • This paper presents a Gabor texture feature extraction method for classification of discolored Metal pad images using GPU(Graphics Processing Unit). The proposed algorithm extracts the texture information using Gabor filters and constructs a pattern map using the extracted information. Finally, the golden pad images are classified by utilizing the feature vectors which are extracted from the constructed pattern map. In order to evaluate the performance of the Gabor texture feature extraction algorithm based on GPU, a sequential processing and parallel processing using OpenMP in CPU of this algorithm were adopted. Also, the proposed algorithm was implemented by using Global memory and Shared memory in GPU. The experimental results were demonstrated that the method using Shared memory in GPU provides the best performance. For evaluating the effectiveness of extracted Gabor texture features, an experimental validation has been conducted on a database of 20 Metal pad images and the experiment has shown no mis-classification.