• Title/Summary/Keyword: NVIDIA

Search Result 163, Processing Time 0.035 seconds

EFFICIENT COMPUTATION OF COMPRESSIBLE FLOW BY HIGHER-ORDER METHOD ACCELERATED USING GPU (고차 정확도 수치기법의 GPU 계산을 통한 효율적인 압축성 유동 해석)

  • Chang, T.K.;Park, J.S.;Kim, C.
    • Journal of computational fluids engineering
    • /
    • v.19 no.3
    • /
    • pp.52-61
    • /
    • 2014
  • The present paper deals with the efficient computation of higher-order CFD methods for compressible flow using graphics processing units (GPU). The higher-order CFD methods, such as discontinuous Galerkin (DG) methods and correction procedure via reconstruction (CPR) methods, can realize arbitrary higher-order accuracy with compact stencil on unstructured mesh. However, they require much more computational costs compared to the widely used finite volume methods (FVM). Graphics processing unit, consisting of hundreds or thousands small cores, is apt to massive parallel computations of compressible flow based on the higher-order CFD methods and can reduce computational time greatly. Higher-order multi-dimensional limiting process (MLP) is applied for the robust control of numerical oscillations around shock discontinuity and implemented efficiently on GPU. The program is written and optimized in CUDA library offered from NVIDIA. The whole algorithms are implemented to guarantee accurate and efficient computations for parallel programming on shared-memory model of GPU. The extensive numerical experiments validates that the GPU successfully accelerates computing compressible flow using higher-order method.

Efficient Implementation of Candidate Region Extractor for Pedestrian Detection System with Stereo Camera based on GP-GPU (스테레오 영상 보행자 인식 시스템의 후보 영역 검출을 위한 GP-GPU 기반의 효율적 구현)

  • Jeong, Geun-Yong;Jeong, Jun-Hee;Lee, Hee-Chul;Jeon, Gwang-Gil;Cho, Joong-Hwee
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.8 no.2
    • /
    • pp.121-128
    • /
    • 2013
  • There have been various research efforts for pedestrian recognition in embedded imaging systems. However, many suffer from their heavy computational complexities. SVM classification method has been widely used for pedestrian recognition. The reduction of candidate region is crucial for low-complexity scheme. In this paper, We propose a real time HOG based pedestrian detection system on GPU which images are captured by a pair of cameras. To speed up humans on road detection, the proposed method reduces a number of detection windows with disparity-search and near-search algorithm and uses the GPU and the NVIDIA CUDA framework. This method can be achieved speedups of 20% or more compared to the recent GPU implementations. The effectiveness of our algorithm is demonstrated in terms of the processing time and the detection performance.

A Method for accelerating training of Convolutional Neural Network (합성곱 신경망의 학습 가속화를 위한 방법)

  • Choi, Se Jin;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology
    • /
    • v.3 no.4
    • /
    • pp.171-175
    • /
    • 2017
  • Recently, Training of the convolutional neural network (CNN) entails many iterative computations. Therefore, a method of accelerating the training speed through parallel processing using the hardware specifications of GPGPU is actively researched. In this paper, the operations of the feature extraction unit and the classification unit are divided into blocks and threads of GPGPU and processed in parallel. Convolution and Pooling operations of the feature extraction unit are processed in parallel at once without sequentially processing. As a result, proposed method improved the training time about 314%.

Warp-based Emotion-adaptive Real-Time Transforming Technique of Character's Facial Expression (워핑 기반의 감정 적응형 실시간 캐릭터 표정변환 기법)

  • Bae, Dong-Hee;Kim, Jin-Mo;Yun, Do-Kyung;Cho, Hyung-Je
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2011.06a
    • /
    • pp.434-437
    • /
    • 2011
  • 최근 단일 프로세서의 성능 개선이 한계에 이르고, 이에 따라 데이터 병렬 처리를 통한 시스템 성능 개선에 관한 연구가 활발히 진행되고 있다. 또한 이러한 변화로 인해 영상처리 분야에서도 대규모 연산의 병렬 컴퓨팅 수행에 관한 연구가 꾸준히 진행되고 있으며 하드웨어 또한 발전하여 실시간 시스템에 영상처리 분야가 많이 활용되고 있다. 본 논문에서는 캐릭터의 감정 상태에 따른 표정을 영상처리 분야에서 많이 사용되고 있는 이미지 워핑 기법을 적용하여 변화시킨다. 인간이 표현할 수 있는 기본적인 감정에 따른 표정을 데이터베이스로 정리하여 캐릭터에게 임의의 감정값이 주어지면 그에 맞는 표정을 데이터베이스에서 선택하여 사용자가 설정한 프레임만큼 워핑을 수행한다. 하지만 매 프레임에 대해 정해져 있는 제어선에 따라 움직이는 픽셀들의 워핑 연산은 그 계산량이 너무 많아 실시간으로 처리하기에 여러 가지 제약이 뒤따른다. 따라서 이를 실시간으로 처리하기 위해 NVIDIA의 CUDA를 활용한 데이터 병렬처리를 수행하여 실시간 처리가 가능하게 하는 방법을 제안하고, 실험을 통해 그 유용성을 제시한다.

A Study on GPGPU Performance Improvement Technique on GCN Architecture Using OpenCL API (GCN 아키텍쳐 상에서의 OpenCL을 이용한 GPGPU 성능향상 기법 연구)

  • Woo, DongHee;Kim, YoonHo
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.1
    • /
    • pp.37-45
    • /
    • 2018
  • The current system upon which a variety of programs are in operation has continuously expanded its domain from conventional single-core and multi-core system to many-core and heterogeneous system. However, existing researches have focused mostly on parallelizing programs based CUDA framework and rarely on AMD based GCN-GPU optimization. In light of the aforementioned problems, our study focuses on the optimization techniques of the GCN architecture in a GPGPU environment and achieves a performance improvement. Specifically, by using performance techniques we propose, we have reduced more then 30% of the computation time of matrix multiplication and convolution algorithm in GPGPU. Also, we increase the kernel throughput by more then 40%.

Fast Stereo matching based on Plane-converging Belief Propagation using GPU (Plane-converging Belief Propagation을 이용한 고속 스테레오매칭)

  • Jung, Young-Han;Park, Eun-Soo;Kim, Hak-Il;Huh, Uk-Youl
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.48 no.2
    • /
    • pp.88-95
    • /
    • 2011
  • Stereo matching is the research area that regarding the estimation of the distance between objects and camera using different view points and it still needs lot of improvements in aspects of speed and accuracy. This paper presents a fast stereo matching algorithm based on plane-converging belief propagation that uses message passing convergence in hierarchical belief propagation. Also, stereo matching technique is developed using GPU and it is available for real-time applications. The error rate of proposed Plane-converging Belief Propagation algorithm is similar to the conventional Hierarchical Belief Propagation algorithm, while speed-up factor reaches 2.7 times.

Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA

  • Jeong, In-Kyu;Hong, Min-Gee;Hahn, Kwang-Soo;Choi, Joonsoo;Kim, Choen
    • Korean Journal of Remote Sensing
    • /
    • v.28 no.6
    • /
    • pp.683-691
    • /
    • 2012
  • High resolution satellite images are now widely used for a variety of mapping applications including photogrammetry, GIS data acquisition and visualization. As the spectral and spatial data size of satellite images increases, a greater processing power is needed to process the images. The solution of these problems is parallel systems. Parallel processing techniques have been developed for improving the performance of image processing along with the development of the computational power. However, conventional CPU-based parallel computing is often not good enough for the demand for computational speed to process the images. The GPU is a good candidate to achieve this goal. Recently GPUs are used in the field of highly complex processing including many loop operations such as mathematical transforms, ray tracing. In this study we proposed a technique for parallel processing of high resolution satellite images using GPU. We implemented a spectral radiometric processing algorithm on Landsat-7 ETM+ imagery using CUDA, a parallel computing architecture developed by NVIDIA for GPU. Also performance of the algorithm on GPU and CPU is compared.

CUDA based parallel design of a shot change detection algorithm using frame segmentation and object movement

  • Kim, Seung-Hyun;Lee, Joon-Goo;Hwang, Doo-Sung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.20 no.7
    • /
    • pp.9-16
    • /
    • 2015
  • This paper proposes the parallel design of a shot change detection algorithm using frame segmentation and moving blocks. In the proposed approach, the high parallel processing components, such as frame histogram calculation, block histogram calculation, Otsu threshold setting function, frame moving operation, and block histogram comparison, are designed in parallel for NVIDIA GPU. In order to minimize memory access delay time and guarantee fast computation, the output of a GPU kernel becomes the input data of another kernel in a pipeline way using the shared memory of GPU. In addition, the optimal sizes of CUDA processing blocks and threads are estimated through the prior experiments. In the experimental test of the proposed shot change detection algorithm, the detection rate of the GPU based parallel algorithm is the same as that of the CPU based algorithm, but the average of processing time speeds up about 6~8 times.

Algorithmic GPGPU Memory Optimization

  • Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.4
    • /
    • pp.391-406
    • /
    • 2014
  • The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on the memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support to handle irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that can exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of memory accesses present in serial loop nest to underlying data-parallel architectures based on a comprehensive static memory access pattern analysis. To that end we present a simple, yet powerful, mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work group size from a large design space. To evaluate the effectiveness of our methodology, we report on execution speedup using selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.

Improving 3D Measurement Speed using CUDA (CUDA를 이용한 3D 측정 속도 향상)

  • Kim, Ho-Joong;Cho, Tai-Hoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.05a
    • /
    • pp.331-334
    • /
    • 2017
  • Recently, a method using a fringe pattern is widely used for 3D measurements. This is a method of measuring by using a phase value obtained by projecting a pattern to an object to be measured. This method requires many operations such as calculating the phase value and calculating the height. It takes a lot of time depending on the amount of computation. In this paper, we present a method using NVIDIA's CUDA to reduce this time. And we introduce the method of calculating phase value and height. It also shows the exact time difference between the CPU version and the CUDA version. This method is very effective because it can process the same operation in a shorter time.

  • PDF