• Title/Summary/Keyword: GPU Acceleration

Acceleration of Anisotropic Elastic Reverse-time Migration with GPUs (GPU를 이용한 이방성 탄성 거꿀 참반사 보정의 계산가속)

  • Choi, Hyungwook;Seol, Soon Jee;Byun, Joongmoo
    • Geophysics and Geophysical Exploration / v.18 no.2 / pp.74-84 / 2015
  • To yield physically meaningful images through elastic reverse-time migration, wavefield separation, which extracts P- and S-waves from the vector wavefields reconstructed with the elastic wave equation, is a prerequisite. Extending elastic reverse-time migration to anisotropic media requires not only an anisotropic modelling algorithm but also anisotropic wavefield separation. Anisotropic wavefield separation uses pseudo-derivative filters determined by the vertical velocities and anisotropic parameters of the elastic medium, and so differs from the Helmholtz decomposition conventionally used for isotropic wavefield separation. Because applying these pseudo-derivative filters is computationally expensive, we developed an efficient anisotropic wavefield separation algorithm that runs in parallel on GPUs (Graphics Processing Units). In addition, we developed a highly efficient anisotropic elastic reverse-time migration algorithm that uses MPI (Message-Passing Interface) and incorporates the GPU-based anisotropic wavefield separation algorithm. To verify the efficiency and validity of the developed migration algorithm, a VTI elastic model based on Marmousi-II was built, and a synthetic multicomponent seismic data set was created from it. Using GPUs and MPI dramatically increased the computational speed of migration, and adopting anisotropic wavefield separation also improved image accuracy.
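For orientation, the isotropic Helmholtz decomposition that the abstract contrasts with can be sketched as a divergence (P-wave proxy) and a curl (S-wave proxy) of the displacement field. This is a minimal 2D illustration with central differences, not the paper's anisotropic pseudo-derivative filters; the function name, grid layout, and sign convention of the curl are assumptions made for the example.

```python
def divergence_and_curl(ux, uz, dx=1.0, dz=1.0):
    """Central-difference divergence and curl of a 2D field on interior points.

    ux[i][j], uz[i][j]: horizontal and vertical components at z-index i, x-index j.
    Assumed convention: curl = d(ux)/dz - d(uz)/dx. Boundary values stay 0.0.
    """
    nz, nx = len(ux), len(ux[0])
    div = [[0.0] * nx for _ in range(nz)]
    curl = [[0.0] * nx for _ in range(nz)]
    for i in range(1, nz - 1):
        for j in range(1, nx - 1):
            dux_dx = (ux[i][j + 1] - ux[i][j - 1]) / (2 * dx)
            dux_dz = (ux[i + 1][j] - ux[i - 1][j]) / (2 * dz)
            duz_dx = (uz[i][j + 1] - uz[i][j - 1]) / (2 * dx)
            duz_dz = (uz[i + 1][j] - uz[i - 1][j]) / (2 * dz)
            div[i][j] = dux_dx + duz_dz   # compressional (P-like) part
            curl[i][j] = dux_dz - duz_dx  # rotational (S-like) part
    return div, curl
```

Each grid point is computed independently of the others, which is the property that makes this kind of filter a natural fit for one-thread-per-point GPU execution.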

A Simplified Graphics System Based on Direct Rendering Manager System

  • Baek, Nakhoon
    • Journal of information and communication convergence engineering / v.16 no.2 / pp.125-129 / 2018
  • In the field of computer graphics, rendering speed is one of the most important factors. Contemporary rendering is performed using 3D graphics systems with windowing system support. Since typical graphics systems, including OpenGL and the DirectX library, focus on the variety of graphics rendering features, the rendering process itself consists of many complicated operations. In contrast, early computer systems used direct manipulation of computer graphics hardware, and achieved simple and efficient graphics handling operations. We suggest an alternative method of accelerated 2D and 3D graphics output, based on directly accessing modern GPU hardware using the direct rendering manager (DRM) system. On the basis of this DRM support, we exchange the graphics instructions and graphics data directly, and achieve better performance than full 3D graphics systems. We present a prototype system for providing a set of simple 2D and 3D graphics primitives. Experimental results and their screen shots are included.

A Survey on PIM Acceleration Technology to Overcome Memory Wall Problem (Memory wall 을 극복하기 위한 PIM 가속 기술에 대한 조망)

  • Jung, Heon-Hui;Paek, Yun-Heung
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.66-68 / 2022
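  • Recent deep learning applications, which are increasingly widely used, exceed what conventional CPU architectures can handle, so there have been efforts to accelerate them with hardware such as GPUs and TPUs. However, physical constraints limit memory bandwidth, and Processing-in-Memory (PIM) technology, which performs computation directly inside memory, is emerging as a way to overcome this limit. This paper describes methods that make the most of the advantages of PIM while accepting the drawbacks of using it.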

A Study on Data Management Systems for Spatial Assessments of Road Visibilities at Night (야간도로 시인성에 대한 공간적 평가를 위한 자료관리체계 연구)

  • Woo, Hee Sook;Kwon, Kwang Seok;Kim, Byung Guk;Yoon, Chun Joo;Kim, Young Rok
    • Journal of Korean Society for Geospatial Information Science / v.22 no.4 / pp.107-115 / 2014
  • Road visibility affects driving safety because it determines whether obstacles on the road can be recognized. In this paper, we propose a mobile data acquisition and processing system for evaluating road visibility at night. The system efficiently converts mobile images and archives them for spatial analysis of nighttime road visibility. It applies the following techniques: low-power computing units, an open image processing library, GPU-based acceleration, and a document database. RGB images are converted to the YUV color system, and the brightness component is integrated with spatial information. High-performance Android devices were used to collect brightness data on roads, and the prototype was used to confirm whether such an acquisition and management system can determine the spatial distribution needed for spatial assessments of road visibility at night.
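The RGB-to-YUV step mentioned above can be sketched as follows. The abstract does not say which YUV variant the system uses; the BT.601 coefficients below are an assumption chosen for illustration, and `rgb_to_yuv` is a hypothetical helper name.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using BT.601 weights (an assumed variant).

    Y is the brightness (luma) component; U and V carry color differences.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


# Brightness of a pure white and a pure red pixel (8-bit channel values).
white_y, _, _ = rgb_to_yuv(255, 255, 255)
red_y, _, _ = rgb_to_yuv(255, 0, 0)
```

Only the Y value would be needed for a brightness map; pairing it with a GPS position per frame is one way the brightness component could be tied to spatial information.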

Comparison of Voxel Map and Sphere Tree Structures for Proximity Computation of Protein Molecules (단백질 분자에 대한 proximity 연산을 위한 복셀 맵과 스피어 트리 구조 비교)

  • Kim, Byung-Joo;Lee, Jung-Eun;Kim, Young-J.;Kim, Ku-Jin
    • Journal of Korea Multimedia Society / v.15 no.6 / pp.794-804 / 2012
  • Proximity queries, such as computing the minimum distance from an arbitrary point to a molecule or detecting a collision between a point and the molecule, are essential for geometric computations on protein molecules. The computation time of these queries depends on the data structure used to represent the molecule. In this paper, we present data structures and algorithms for applying proximity queries to a molecule with GPU acceleration. We present two data structures, a voxel map and a sphere tree, in which the molecule is represented as a set of spheres, together with the corresponding algorithms. Moreover, we show that the presented data structures improve performance by a factor of 3 to 633 over the previous data structure for molecules containing 1,000 to 15,000 atoms.
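A minimal CPU sketch of the two query types on a sphere-set molecule follows: a brute-force minimum distance, and a point-collision test backed by a uniform voxel map. The grid layout, cell size, and all function names are assumptions for illustration; the paper's GPU data structures are not reproduced here.

```python
import math
from collections import defaultdict


def min_distance_bruteforce(point, spheres):
    """Signed distance from point to the nearest sphere surface (negative inside)."""
    return min(math.dist(point, center) - radius for center, radius in spheres)


def build_voxel_map(spheres, cell):
    """Map each voxel index to the spheres whose bounding box overlaps it."""
    grid = defaultdict(list)
    for idx, (center, radius) in enumerate(spheres):
        lo = [math.floor((center[k] - radius) / cell) for k in range(3)]
        hi = [math.floor((center[k] + radius) / cell) for k in range(3)]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    grid[(i, j, k)].append(idx)
    return grid


def point_collides(point, spheres, grid, cell):
    """Test only the spheres registered in the voxel that contains the point."""
    key = tuple(math.floor(x / cell) for x in point)
    return any(
        math.dist(point, spheres[i][0]) <= spheres[i][1] for i in grid.get(key, ())
    )


atoms = [((0.0, 0.0, 0.0), 1.0), ((3.0, 0.0, 0.0), 1.0)]
voxels = build_voxel_map(atoms, cell=1.0)
```

The voxel map trades memory for a query cost that depends on local atom density rather than on the total atom count, which is the kind of trade-off the comparison with a sphere tree is about.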

Acceleration for Removing Sea-fog using Graphic Processors and Parallel Processing (그래픽 프로세서를 이용한 병렬연산 기반 해무 제거 고속화)

  • Kim, Young-doo;Kwak, Jae-min;Seo, Young-ho;Choi, Hyun-jun
    • Journal of Advanced Navigation Technology / v.21 no.5 / pp.485-490 / 2017
  • In this paper, we propose a technique for high-speed removal of sea-fog using graphics processors. The technique uses a host processor (CPU) and several graphics processors (GPUs) capable of parallel processing to remove sea-fog from the input image. In the removal process, the dark channel extraction, the maximum brightness channel extraction, and the calculation of the transmission are performed by the host processor, while the refinement of the transmission by applying the bidirectional filter is performed in parallel on the graphics processors. To verify the proposed parallel processing method, a verification environment was built with three NVIDIA GTX 1070 GPUs. Processing takes about 140 ms with one graphics processor and 26 ms with OpenMP and multiple GPGPUs. The proposed GPU-based parallel processing algorithm can be used for safe navigation, port control, and monitoring systems.
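The dark channel extraction step named above is commonly defined as the minimum over color channels followed by a minimum over a local patch. The sketch below follows that common definition as an assumption; the abstract does not give the paper's exact formulation, patch size, or border handling, and `dark_channel` is a hypothetical name.

```python
def dark_channel(image, radius=1):
    """Dark channel of an RGB image given as rows of (r, g, b) tuples.

    Step 1: per-pixel minimum over the three color channels.
    Step 2: minimum of that map over a (2*radius+1)^2 patch, clipped at borders.
    """
    height, width = len(image), len(image[0])
    per_pixel_min = [[min(pixel) for pixel in row] for row in image]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = min(
                per_pixel_min[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            )
    return out
```

In hazy regions the dark channel stays high because the fog adds brightness to every color channel, which is why it is used to estimate the transmission.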

Design and Implementation of Accelerator Architecture for Binary Weight Network on FPGA with Limited Resources (한정된 자원을 갖는 FPGA에서의 이진가중치 신경망 가속처리 구조 설계 및 구현)

  • Kim, Jong-Hyun;Yun, SangKyun
    • Journal of IKEEE / v.24 no.1 / pp.225-231 / 2020
  • In this paper, we propose a method to accelerate a binary weight network (BWN) on an FPGA with limited resources for embedded systems. Because the number of available logic elements is limited, a single computing unit capable of handling Conv-layers and FC-layers of various sizes must be designed and reused. Also, if the input feature map cannot be processed in parallel at one time, the output must be calculated by reading the inputs several times. Since the number of available BRAM modules is limited, the number of data bits in the BWN accelerator must be minimized. The image classification processing time of the BWN accelerator is superior to that of an embedded CPU, faster than a desktop PC, and 50% slower than a GPU system. Since the BWN accelerator uses a slow clock of 50 MHz, it is advantageous in terms of performance versus power.
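The reason binary weights suit a resource-limited FPGA is that each multiply-accumulate reduces to an add or a subtract. A minimal sketch, assuming weights are stored as single bits where 1 encodes +1 and 0 encodes -1 (an encoding chosen here for illustration, not taken from the paper):

```python
def binary_dot(activations, weight_bits):
    """Dot product with weights restricted to +1 / -1.

    weight_bits[i] == 1 means weight +1, 0 means weight -1 (assumed encoding).
    No multiplication is needed: each term is added or subtracted.
    """
    acc = 0
    for value, bit in zip(activations, weight_bits):
        acc += value if bit else -value
    return acc


# One output neuron of a fully connected layer with three inputs.
result = binary_dot([1, 2, 3], [1, 0, 1])  # 1 - 2 + 3
```

Storing one bit per weight also reduces on-chip memory demand, which matters when BRAM is the scarce resource.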

AMG-CG method for numerical analysis of high-rise structures on heterogeneous platforms with GPUs

  • Li, Zuohua;Shan, Qingfei;Ning, Jiafei;Li, Yu;Guo, Kaisheng;Teng, Jun
    • Computers and Concrete / v.29 no.2 / pp.93-105 / 2022
  • The degrees of freedom (DOFs) of high-rise structures increase rapidly due to the need for refined analysis, which makes computationally efficient numerical analysis of high-rise structures with the finite element method (FEM) a challenge. This paper presented an efficient iterative method for solving large-scale structural system equations on heterogeneous platforms with graphics processing units (GPUs) as parallel accelerators: a conjugate gradient method preconditioned by an algebraic multigrid (AMG) with a Jacobi overrelaxation (JOR) smoother (AMG-CG). Furthermore, an AMG-CG FEM application framework was established for the numerical analysis of high-rise structures. In the proposed method, the coarsening method, the optimal relaxation coefficient of the JOR smoother, the number of smoothing sweeps, and the solution method for the coarsest grid of the AMG preconditioner were investigated via several numerical benchmarks of high-rise structures. The accuracy and efficiency of the proposed FEM application framework were compared against the mature software Abaqus, and speedups of up to 18.4x were obtained using an NVIDIA K40C GPU hosted in a workstation. The results demonstrated that the proposed method could improve the computational efficiency of solving structural system equations, and that the AMG-CG FEM application framework was inherently suitable for numerical analysis of high-rise structures.
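The Jacobi overrelaxation (JOR) smoother named above updates every unknown from the previous iterate only, x ← x + ω D⁻¹(b − A x), which is what makes it easy to run in parallel. The sketch below applies it as a standalone solver on a tiny dense system; the relaxation coefficient 0.8, the sweep count, and the function name are illustrative assumptions, and the AMG hierarchy and conjugate gradient wrapper of the paper are not shown.

```python
def jor_solve(A, b, omega=0.8, sweeps=100):
    """Run a fixed number of Jacobi overrelaxation sweeps on A x = b.

    A is a dense list-of-lists matrix with nonzero diagonal; omega is the
    relaxation coefficient. Every component update uses only the old x.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + omega * residual[i] / A[i][i] for i in range(n)]
    return x


# Small symmetric positive definite test system.
solution = jor_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Inside a multigrid cycle only a few sweeps would be applied per level to damp high-frequency error, rather than iterating to convergence as done here.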

Acceleration techniques for GPGPU-based Maximum Intensity Projection (GPGPU 환경에서 최대휘소투영 렌더링의 고속화 방법)

  • Kye, Hee-Won;Kim, Jun-Ho
    • Journal of Korea Multimedia Society / v.14 no.8 / pp.981-991 / 2011
  • MIP (Maximum Intensity Projection) is a volume rendering technique that is essential for medical imaging systems. MIP rendering based on the ray casting method produces high-quality images but takes a long time. Our aim is to improve the rendering speed using the GPGPU (General-Purpose computing on Graphics Processing Units) technique. In this paper, we present a ray casting algorithm based on CUDA (Compute Unified Device Architecture), a programming platform for GPGPU, and we suggest new acceleration methods for CUDA. In detail, we propose block-based space leaping, which skips unnecessary regions of the volume data; the bisection method, which is a fast method to find a block edge; and the initial value estimation method, which improves the probability of space leaping. With the proposed methods, we noticeably improve the rendering speed without degrading image quality.
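The idea behind block-based space leaping for MIP can be illustrated on a single ray: if the precomputed maximum of a block cannot exceed the running maximum, the whole block is skipped. This is a simplified 1D CPU sketch under that assumption; the paper's CUDA kernels, 3D blocks, bisection edge search, and initial value estimation are not reproduced, and the function names are hypothetical.

```python
def block_maxima(samples, block):
    """Maximum of each consecutive block of samples along a ray."""
    return [max(samples[i:i + block]) for i in range(0, len(samples), block)]


def mip_with_leaping(samples, block):
    """Return (maximum intensity, number of samples actually visited)."""
    maxima = block_maxima(samples, block)
    best = float("-inf")
    visited = 0
    for index, block_max in enumerate(maxima):
        if block_max <= best:
            continue  # this block cannot raise the running maximum: leap over it
        for value in samples[index * block:(index + 1) * block]:
            visited += 1
            if value > best:
                best = value
    return best, visited


ray = [1, 9, 2, 3, 4, 5, 1, 0, 2, 2, 2, 2]
intensity, touched = mip_with_leaping(ray, block=4)
```

A higher running maximum early in the ray lets more blocks be skipped, which is presumably why a good initial value estimate raises the probability of leaping.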

Parallel Implementations of Digital Focus Indices Based on Minimax Search Using Multi-Core Processors

  • HyungTae, Kim;Duk-Yeon, Lee;Dongwoon, Choi;Jaehyeon, Kang;Dong-Wook, Lee
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.2 / pp.542-558 / 2023
  • A digital focus index (DFI) is a value used to determine image focus in scientific apparatus and smart devices. Automatic focus (AF) is an iterative and time-consuming procedure; however, its processing time can be reduced using a graphics processing unit (GPU) and a multi-core processor (MCP). In this study, parallel architectures of a minimax search algorithm (MSA) are applied to two DFIs: range algorithm (RA) and image contrast (CT). The DFIs are based on a histogram; however, parallel computation of a histogram is conventionally inefficient because of bank conflicts in shared memory. The parallel architectures of RA and CT are constructed using parallel reduction for the MSA, which rates image pixel pairs in parallel and halves the number of ratings at every step. The array size thus decreases to one, and the minimax is determined at the final reduction. Kernels for the architectures are constructed using open source software to make them relatively platform independent. The kernels are tested on a hexa-core PC and an embedded device using Lenna images of various sizes based on the resolutions of industrial cameras. The performance of the kernels for the DFIs was investigated in terms of processing speed and computational acceleration; the maximum acceleration was 32.6× in the best case, and the MCP exhibited higher performance.
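The pairwise halving reduction described above can be emulated sequentially: at each step, element i is compared with its partner in the upper half of the active range, so the active size roughly halves until one minimum and one maximum remain. This is a sketch of the general reduction pattern, not the paper's kernels; the function name and the handling of odd lengths are assumptions.

```python
def minmax_reduction(values):
    """Find (min, max) by repeated pairwise comparison and halving.

    Each iteration of the inner loop is independent of the others and would
    map to one thread or work-item in a parallel implementation.
    """
    if not values:
        raise ValueError("values must not be empty")
    lows = list(values)
    highs = list(values)
    n = len(values)
    while n > 1:
        half = (n + 1) // 2  # an odd leftover element is carried to the next step
        for i in range(n // 2):
            partner = i + half
            lows[i] = min(lows[i], lows[partner])
            highs[i] = max(highs[i], highs[partner])
        n = half
    return lows[0], highs[0]


darkest, brightest = minmax_reduction([7, 3, 9, 1, 5])
```

Because every pixel is read once and no shared histogram bins are written, this pattern avoids the bank-conflict problem the abstract attributes to histogram-based computation.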