• Title/Abstract/Keyword: Graphics Processing Units

Efficient Computation of Compressible Flow by Higher-Order Method Accelerated Using GPU

  • 장태규;박진석;김종암
    • Journal of Computational Fluids Engineering / Vol. 19, No. 3 / pp. 52-61 / 2014
  • The present paper deals with the efficient computation of compressible flow by higher-order CFD methods using graphics processing units (GPUs). Higher-order CFD methods, such as discontinuous Galerkin (DG) methods and correction procedure via reconstruction (CPR) methods, can realize arbitrarily high order of accuracy with a compact stencil on unstructured meshes. However, they require much higher computational costs than the widely used finite volume methods (FVM). A graphics processing unit, consisting of hundreds or thousands of small cores, is well suited to the massively parallel computations that higher-order CFD methods demand and can greatly reduce computation time. A higher-order multi-dimensional limiting process (MLP) is applied for robust control of numerical oscillations around shock discontinuities and is implemented efficiently on the GPU. The program is written and optimized using the CUDA library provided by NVIDIA. The whole algorithm is implemented to guarantee accurate and efficient computation under the GPU's shared-memory parallel programming model. Extensive numerical experiments validate that the GPU successfully accelerates the computation of compressible flow with higher-order methods.
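
A minimal CUDA sketch of the shared-memory pattern described above, not the authors' DG/CPR solver: each block stages a tile of cell averages in shared memory, and each thread then updates one cell from a first-order upwind flux for linear advection. Kernel and parameter names are hypothetical.

```cuda
#include <cuda_runtime.h>

__global__ void upwind_update(const float* u, float* u_new,
                              float lambda /* = a*dt/dx, a > 0 */, int n)
{
    extern __shared__ float tile[];     // blockDim.x cells + 1 left-halo cell
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool active = (i < n);
    tile[threadIdx.x + 1] = active ? u[i] : 0.0f;
    if (threadIdx.x == 0)               // fetch the block's left halo cell
        tile[0] = (blockIdx.x == 0) ? u[0] : u[blockIdx.x * blockDim.x - 1];
    __syncthreads();                    // every thread reaches the barrier
    if (active)
        u_new[i] = tile[threadIdx.x + 1]
                 - lambda * (tile[threadIdx.x + 1] - tile[threadIdx.x]);
}

// Example launch with dynamic shared memory for the tile plus halo:
// upwind_update<<<(n + 255) / 256, 256, (256 + 1) * sizeof(float)>>>(d_u, d_un, 0.4f, n);
```

A DG or CPR update would replace the single flux line with a per-cell polynomial reconstruction and the MLP limiter, but the tiling and synchronization pattern is the same.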

A Review of Computational Phantoms for Quality Assurance in Radiology and Radiotherapy in the Deep-Learning Era

  • Peng, Zhao;Gao, Ning;Wu, Bingzhi;Chen, Zhi;Xu, X. George
    • Journal of Radiation Protection and Research / Vol. 47, No. 3 / pp. 111-133 / 2022
  • The exciting advances related to the "modeling of the digital human" as a computational phantom for radiation dose calculations are closely tied to recent developments in deep learning. The advent of deep learning, or artificial intelligence (AI), technology involving convolutional neural networks has brought an unprecedented level of innovation to the field of organ segmentation. In addition, graphics processing units (GPUs) are utilized as accelerators for both real-time Monte Carlo simulations and AI-based image segmentation applications. These advancements make it feasible to create three-dimensional (3D) geometric details of the human anatomy from tomographic imaging and to perform Monte Carlo radiation transport simulations on increasingly fast and inexpensive computers. This review first introduces the history of three types of computational human phantoms: stylized medical internal radiation dosimetry (MIRD) phantoms, voxelized tomographic phantoms, and boundary representation (BREP) deformable phantoms. Then, the development of person-specific phantoms is demonstrated by introducing AI-based organ autosegmentation technology. Next, new developments in GPU-based Monte Carlo radiation dose calculations are introduced. Examples of applying computational phantoms and a new Monte Carlo code named ARCHER (Accelerated Radiation-transport Computations in Heterogeneous EnviRonments) to problems in radiation protection, imaging, and radiotherapy are presented from research projects performed by students at Rensselaer Polytechnic Institute (RPI) and the University of Science and Technology of China (USTC). Finally, this review discusses challenges and future research opportunities. We find that, owing to the latest computer hardware and AI technology, computational human body models are moving closer to real human anatomy for accurate radiation dose calculations.
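
As a hedged illustration of the GPU Monte Carlo idea, not ARCHER itself: the sketch below assigns one photon history per CUDA thread, each with an independent cuRAND stream; a voxelized phantom would replace the single attenuation coefficient with a per-voxel lookup. All names are hypothetical.

```cuda
#include <curand_kernel.h>

__global__ void photon_histories(unsigned int* interactions, int n_hist,
                                 float mu /* 1/cm */, float slab_cm,
                                 unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_hist) return;
    curandState rng;
    curand_init(seed, tid, 0, &rng);    // independent RNG stream per thread
    // Sample a free-path length from the exponential attenuation law.
    float path = -logf(curand_uniform(&rng)) / mu;
    if (path < slab_cm)                 // photon interacts inside the slab
        atomicAdd(interactions, 1u);    // stand-in for a dose tally
}
```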

Efficient Thread Allocation Method of Convolutional Neural Network Based on GPGPU

  • 김민철;이광엽
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / Vol. 7, No. 10 / pp. 935-943 / 2017
  • Among neural networks that learn from large amounts of data, the convolutional neural network (CNN), used for tasks such as image classification and speech recognition, continues to evolve as an architecture with excellent performance. It is, however, difficult to use on embedded systems with limited resources. Pre-trained weights are therefore employed, but limitations remain, so GP-GPU (General-Purpose computing on Graphics Processing Units), which applies the GPU to general-purpose computation, is increasingly used to address them. Since a CNN performs simple, repetitive operations, its computation speed on a SIMT (Single Instruction Multiple Thread) based GPGPU varies greatly with how threads are allocated and utilized. When convolution and pooling operations are carried out by threads, some threads are left idle; to solve this problem, the idle threads are reused for the computation of the next feature map and kernel, which increases computation speed.
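
A minimal CUDA sketch of the thread-reuse idea above, assuming a single input channel and hypothetical names: a flat thread index spans all output feature maps at once, so threads that would sit idle after finishing a small map immediately serve the next one.

```cuda
// Valid k x k convolution: out_h = in_h - k + 1, out_w = in_w - k + 1.
__global__ void conv_all_maps(const float* in, const float* w, float* out,
                              int maps, int out_h, int out_w,
                              int in_w, int k)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int per_map = out_h * out_w;
    if (idx >= maps * per_map) return;
    int m = idx / per_map;              // which output feature map
    int y = (idx % per_map) / out_w;    // output row
    int x = idx % out_w;                // output column
    float acc = 0.0f;
    for (int ky = 0; ky < k; ++ky)
        for (int kx = 0; kx < k; ++kx)
            acc += in[(y + ky) * in_w + (x + kx)]
                 * w[m * k * k + ky * k + kx];
    out[m * per_map + y * out_w + x] = acc;
}
```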

GPU Implementation Techniques of Genetic Algorithm and Comparative Studies

  • 현병용;서기성
    • Journal of Institute of Control, Robotics and Systems / Vol. 17, No. 4 / pp. 328-335 / 2011
  • A GPU (Graphics Processing Unit) consists of a SIMD (Single Instruction Multiple Data) architecture and provides fast parallel processing. A GA (Genetic Algorithm), which requires a large amount of computation, is implemented on the GPU using CUDA (Compute Unified Device Architecture). Three kinds of execution models are presented according to different combinations of processing modules on the GPU. Comparative experiments between the GPU models and a CPU implementation are performed on several benchmark problems while varying the population size and problem complexity.
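
As a hedged sketch of one way such an execution model can look (not necessarily any of the paper's three), the fitness step maps naturally onto CUDA: one thread evaluates one individual, with the sphere function standing in for a benchmark problem. All names are hypothetical.

```cuda
__global__ void eval_fitness(const float* population, float* fitness,
                             int pop_size, int genes)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= pop_size) return;
    float s = 0.0f;
    for (int g = 0; g < genes; ++g) {
        float x = population[i * genes + g];
        s += x * x;                     // sphere function: minimize sum of squares
    }
    fitness[i] = s;
    // Selection, crossover, and mutation would run in further kernels
    // or on the CPU, depending on the execution model being compared.
}
```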

Implementation of IQ/IDCT in H.264/AVC Decoder Using GPGPU

  • 김동한;이광엽
    • Conference Proceedings of the Korea Institute of Information and Communication Engineering / 2010 Spring Conference of the Korea Institute of Maritime Information and Communication Sciences / pp. 162-164 / 2010
  • H.264, the video compression standard jointly established by ITU-T and ISO, offers higher compression performance and greater flexibility than previous video compression standards. This paper proposes an efficient structure and method for high-speed execution of the IQ/IDCT (Inverse Quantization / Inverse Discrete Cosine Transform) operations of the H.264/AVC decoding algorithm, which lend themselves to parallelization, using a GPGPU (General-Purpose computing on Graphics Processing Units), which is well suited to parallel processing.
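
A minimal CUDA sketch of this parallelism, not the authors' implementation: one thread inverse-quantizes and inverse-transforms one 4x4 block with the standard H.264 integer butterfly. The flat qscale table is a simplification of H.264's position- and QP-dependent scaling.

```cuda
__device__ void itrans4(int* v, int stride)     // H.264 4x4 inverse butterfly
{
    int a = v[0], b = v[stride], c = v[2 * stride], d = v[3 * stride];
    int e0 = a + c, e1 = a - c;
    int e2 = (b >> 1) - d, e3 = b + (d >> 1);
    v[0]          = e0 + e3;
    v[stride]     = e1 + e2;
    v[2 * stride] = e1 - e2;
    v[3 * stride] = e0 - e3;
}

__global__ void iq_idct4x4(const short* coef, const int* qscale,
                           short* residual, int num_blocks)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= num_blocks) return;
    int blk[16];
    for (int i = 0; i < 16; ++i)                // inverse quantization
        blk[i] = coef[b * 16 + i] * qscale[i];
    for (int r = 0; r < 4; ++r) itrans4(blk + 4 * r, 1);  // horizontal pass
    for (int c = 0; c < 4; ++c) itrans4(blk + c, 4);      // vertical pass
    for (int i = 0; i < 16; ++i)                // final rounding shift
        residual[b * 16 + i] = (short)((blk[i] + 32) >> 6);
}
```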

Performance Improvement in Observation Probability Computation of Gaussian Mixture Models Using GPGPU

  • 김형주;김승희;김상훈;장길진
    • Proceedings of the Korea Information Processing Society Conference / 2012 Fall Conference / pp. 148-151 / 2012
  • General-purpose computing on graphics processing units (GPGPU) is a parallel computing approach that applies the GPU to general-purpose computation, and it is used to improve application performance in many fields, including scientific computing. In this work, we implemented a GPGPU-based algorithm to accelerate the computation of observation probabilities, which accounts for a large share of the computation time in the Gaussian mixture models (GMMs) commonly used in speech recognizers, and reduced computation time by roughly a factor of 13 relative to an existing CPU-based algorithm.
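
A hypothetical CUDA sketch of this parallelization: one thread computes the log observation probability of one (frame, mixture) pair for a diagonal-covariance GMM, which is the usual way the computation is flattened for SIMT hardware. gconst is assumed to hold the precomputed per-mixture constant.

```cuda
__global__ void gmm_loglik(const float* feat,     // frames x dim
                           const float* mean,     // mixtures x dim
                           const float* inv_var,  // mixtures x dim
                           const float* gconst,   // log w - 0.5*(dim*log(2*pi) + log|Sigma|)
                           float* loglik,         // frames x mixtures
                           int frames, int mixtures, int dim)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= frames * mixtures) return;
    int t = idx / mixtures, m = idx % mixtures;
    float s = gconst[m];
    for (int d = 0; d < dim; ++d) {
        float diff = feat[t * dim + d] - mean[m * dim + d];
        s -= 0.5f * diff * diff * inv_var[m * dim + d];
    }
    loglik[idx] = s;
}
```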

High-Performance Computer-Generated Hologram by Optimized Implementation of Parallel GPGPUs

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Yoo, Ji-Sang;Kim, Dong-Wook
    • Journal of the Optical Society of Korea / Vol. 18, No. 6 / pp. 698-705 / 2014
  • We propose a new method for calculating a computer-generated hologram (CGH) using multiple general-purpose graphics processing units (GPGPUs). To optimize the implementation, we considered CGH parallelization, object-point tiling, memory selection for object points, hologram tiling, the CGMA (compute to global memory access) ratio as a function of block size, and memory mapping. The proposed CGH generator was integrated into a digital holographic video system consisting of a camera system for capturing images (object points) and CPU/GPGPU software (S/W) for various image-processing tasks. The proposed system can generate about 37 full HD holograms per second from about 6K object points.
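
A hedged single-GPU sketch of point-source CGH accumulation (the paper additionally tiles object points, tunes memory placement, and splits the hologram across multiple GPGPUs): one thread sums every object point's Fresnel fringe at one hologram pixel. All names and the fringe model are hypothetical simplifications.

```cuda
__global__ void cgh_pixel(const float4* pts,      // x, y, z, amplitude
                          float* fringe, int n_pts,
                          int width, int height,
                          float pitch, float k /* 2*pi/wavelength */)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;
    float hx = (px - width / 2) * pitch;          // pixel position on the SLM
    float hy = (py - height / 2) * pitch;
    float acc = 0.0f;
    for (int i = 0; i < n_pts; ++i) {
        float dx = hx - pts[i].x, dy = hy - pts[i].y;
        // Fresnel approximation of the point-source phase
        float phase = k * (dx * dx + dy * dy) / (2.0f * pts[i].z);
        acc += pts[i].w * __cosf(phase);
    }
    fringe[py * width + px] = acc;
}
```

Staging batches of object points through shared memory, as the object-point tiling in the abstract suggests, raises the compute-to-global-memory-access ratio of this inner loop.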

Integer-Pel Motion Estimation for HEVC on Compute Unified Device Architecture (CUDA)

  • Lee, Dongkyu;Sim, Donggyu;Oh, Seoung-Jun
    • IEIE Transactions on Smart Processing and Computing / Vol. 3, No. 6 / pp. 397-403 / 2014
  • A new video compression standard called High Efficiency Video Coding (HEVC) has recently been released. HEVC provides higher coding performance than previous standards, but at the cost of a significant increase in encoding complexity, particularly in motion estimation (ME). At the same time, the computing capability of Graphics Processing Units (GPUs) has grown more powerful. This paper proposes a parallel integer-pel ME (IME) algorithm for HEVC on the GPU using the Compute Unified Device Architecture (CUDA). The proposed IME introduces concurrent parallel reduction (CPR), which performs several parallel reduction (PR) operations concurrently to solve two problems of conventional PR: low thread utilization and high thread synchronization latency. The proposed encoder reduces the share of IME in the total encoding time to almost zero with a 2.3% increase in bitrate, and the proposed IME is up to 172.6 times faster than the IME in the HEVC reference model.
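
A hypothetical CUDA sketch of the CPR idea: rather than one parallel reduction per block, a block runs blockDim.x / WIDTH independent min-SAD reductions side by side in shared memory, so a small reduction no longer leaves most threads idle or waiting on barriers.

```cuda
#define WIDTH 64   // candidate SADs per reduction (power of two)

// Launch with blockDim.x a multiple of WIDTH and
// blockDim.x * sizeof(unsigned int) bytes of dynamic shared memory.
__global__ void min_sad_cpr(const unsigned int* sad, unsigned int* best,
                            int n_reductions)
{
    extern __shared__ unsigned int s[];
    int r    = threadIdx.x / WIDTH;     // which reduction this thread serves
    int lane = threadIdx.x % WIDTH;
    int rid  = blockIdx.x * (blockDim.x / WIDTH) + r;
    s[threadIdx.x] = (rid < n_reductions)
                   ? sad[rid * WIDTH + lane] : 0xFFFFFFFFu;
    __syncthreads();
    for (int step = WIDTH / 2; step > 0; step >>= 1) {
        if (lane < step)
            s[threadIdx.x] = min(s[threadIdx.x], s[threadIdx.x + step]);
        __syncthreads();                // uniform: every thread participates
    }
    if (lane == 0 && rid < n_reductions)
        best[rid] = s[threadIdx.x];     // minimum SAD of this candidate set
}
```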

Analyzing Delay of Kernel Functions Owing to GPU Memory Input from Multiple VMs in RPC-Based GPU Virtualization Environments

  • 강지훈;김수균
    • Proceedings of the Korean Society of Computer Information Conference / 2021 Summer Conference (64th), Vol. 29, No. 2 / pp. 541-542 / 2021
  • In cloud computing environments, users are given virtual machines with an assigned GPU (Graphics Processing Unit) so that they can run high-performance applications. In an ordinary computing environment, a single user occupies the GPU exclusively, so problems due to resource contention are relatively rare; in a cloud environment, where multiple independent users share computing resources, resource contention causes users to degrade one another's performance. This paper analyzes the kernel-function execution delay caused by contention over GPU memory input when multiple virtual machines run GPGPU (General-Purpose computing on Graphics Processing Units) workloads in an RPC (Remote Procedure Call) based GPU virtualization environment in which several virtual machines share a single GPU.
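
A hedged, single-process stand-in for the multi-VM experiment (all names hypothetical): an asynchronous host-to-device copy on one stream plays the role of another VM's GPU memory input while CUDA events time a kernel running on a second stream of the same device. Two non-default streams are used so the legacy default stream's implicit synchronization does not serialize the two.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void busy_kernel(float* x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 1.000001f + 0.5f;
}

int main()
{
    const int n = 1 << 24;
    float *d_work, *d_in, *h_in;
    cudaMalloc(&d_work, n * sizeof(float));
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMallocHost(&h_in, n * sizeof(float));   // pinned, so the copy overlaps
    cudaStream_t work, rival;
    cudaStreamCreate(&work);
    cudaStreamCreate(&rival);
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaMemcpyAsync(d_in, h_in, n * sizeof(float),
                    cudaMemcpyHostToDevice, rival);  // competing memory input
    cudaEventRecord(t0, work);
    busy_kernel<<<(n + 255) / 256, 256, 0, work>>>(d_work, n);
    cudaEventRecord(t1, work);
    cudaEventSynchronize(t1);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    printf("kernel time under memory-input contention: %.3f ms\n", ms);
    cudaFree(d_work); cudaFree(d_in); cudaFreeHost(h_in);
    cudaStreamDestroy(work); cudaStreamDestroy(rival);
    return 0;
}
```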

Computational Analytics of Client Awareness for Mobile Application Offloading with Cloud Migration

  • Nandhini, Uma;TamilSelvan, Latha
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 8, No. 11 / pp. 3916-3936 / 2014
  • Smartphone applications such as games, image processing, e-commerce, and social networking are growing exponentially with the ubiquity of cellular services. This demands increased computational power and storage from mobile devices, along with sufficiently high bandwidth for mobile internet service. But mobile nodes are highly constrained in processing and storage, as well as battery power, which further limits their dependability. By adopting the virtually unlimited storage and computing power offered by cloud servers, it is possible to overcome these issues and turn them into a favorable opportunity for the growth of mobile cloud computing. As mobile internet data traffic is predicted to grow at around 65 percent yearly, even advanced cellular services such as 3G and 4G will fail to accommodate such exponential growth. At the same time, developers extend popular applications with high-end graphics, leading to smartphones manufactured with multicore processors and graphics processing units, which makes them expensive. Therefore, to address the needs of resource-constrained mobile nodes and bandwidth-constrained cellular networks, computations can be migrated to resourceful servers connected to the cloud. The server then acts as a bridge that enables the participating mobile nodes to offload their computations through Wi-Fi directly to the virtualized server. Our proposed model enables on-demand service offloading with a decision support system that identifies the capabilities of the client's hardware and software resources when judging the requirements for offloading. Further, the node's location, context, and security capabilities are estimated to facilitate adaptive migration.