• Title/Summary/Keyword: GPU 가속기법 (GPU acceleration technique)


A Study on Scenario-based Urban Flood Prediction using G2D Flood Analysis Model (G2D 침수해석 모형을 이용한 시나리오 기반 도시 침수예측 연구)

  • Hui-Seong Noh;Ki-Hong Park
    • Journal of Advanced Navigation Technology / v.27 no.4 / pp.488-494 / 2023
  • In this paper, scenario-based urban flood prediction was performed for the entire city of Jinju, using a simulation domain built with G2D, a two-dimensional urban flood analysis model. The domain is configured from a DEM (digital elevation model), and a land cover map is used to set the roughness coefficient of each grid cell. The model's input data are water level, water depth, and flow rate. The G2D simulations applied virtual rainfall (3 mm per 10 minutes over every grid cell for 5 hours) and a virtual flow. A GPU acceleration technique was also applied to assess whether the flood analysis model could be run over the target area. The simulations confirmed that the high-resolution flood analysis time was shortened significantly and that flood-depth maps for visual flood assessment could be produced at each simulation time step.
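The rainfall scenario can be checked by hand: 5 hours is 30 ten-minute intervals, so every grid cell receives 30 × 3 mm = 90 mm in total. The sketch below is not G2D. It assumes the roughness coefficient is a Manning n looked up from a hypothetical land-cover table (illustrative values only) and ignores flow routing entirely; it only shows the independent per-cell update pattern that makes grid-based flood models a good fit for GPU acceleration.

```python
# Hypothetical sketch, NOT the G2D model: applies the virtual rainfall
# scenario (3 mm per 10 min for 5 h) to every cell of a grid and looks up
# a per-cell roughness coefficient from a land-cover map.

# Assumed land-cover classes with illustrative Manning n values.
MANNING_N = {"urban": 0.015, "grass": 0.035, "forest": 0.10}

RAIN_MM_PER_STEP = 3.0     # rainfall added per step
STEP_MIN = 10              # one step = 10 minutes
DURATION_MIN = 5 * 60      # 5 hours of rainfall


def roughness_grid(land_cover):
    """Map each land-cover label to its (assumed) roughness coefficient."""
    return [[MANNING_N[c] for c in row] for row in land_cover]


def apply_rainfall(depth_mm):
    """Add one step of rainfall to every cell.

    No cell reads its neighbours here, so on a GPU each cell could be
    handled by its own thread.
    """
    return [[d + RAIN_MM_PER_STEP for d in row] for row in depth_mm]


def simulate(rows, cols):
    """Accumulated rainfall depth per cell after the whole scenario."""
    depth = [[0.0] * cols for _ in range(rows)]
    for _ in range(DURATION_MIN // STEP_MIN):   # 30 steps
        depth = apply_rainfall(depth)
    return depth
```

For example, `simulate(2, 3)` returns a 2 × 3 grid in which every cell holds 90.0 mm.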

FPGA-Based Acceleration of Range Doppler Algorithm for Real-Time Synthetic Aperture Radar Imaging (실시간 SAR 영상 생성을 위한 Range Doppler 알고리즘의 FPGA 기반 가속화)

  • Jeong, Dongmin;Lee, Wookyung;Jung, Yunho
    • Journal of IKEEE / v.25 no.4 / pp.634-643 / 2021
  • In this paper, an FPGA-based acceleration scheme for the range Doppler algorithm (RDA) is proposed for real-time synthetic aperture radar (SAR) imaging. We present hardware architectures for a matched filter based on a systolic array and for a high-speed sinc interpolator that compensates for range cell migration (RCM). The proposed hardware was implemented and accelerated on a Xilinx Alveo FPGA. Experimental results for 4096×4096 SAR imaging showed that the FPGA-based implementation is 2 times faster than a GPU-based design. The proposed design was also confirmed to require 60,247 CLB LUTs, 103,728 CLB registers, 20 block RAM tiles, and 592 DSPs at an operating frequency of 312 MHz.
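RCM compensation resamples each range line at fractional sample positions, which is what a sinc interpolator does. The sketch below is a plain truncated-sinc interpolator assuming an 8-tap kernel with no window; the abstract does not give the kernel length or windowing used in the hardware, and `shift_range_line` is a hypothetical helper, not part of the paper's design.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def sinc_interpolate(samples, pos, taps=8):
    """Estimate the signal value at fractional index `pos`.

    Sums the `taps` samples nearest to `pos`, each weighted by
    sinc(pos - n) (truncated kernel, no window). Samples outside the
    array are treated as zero. Works for real or complex samples.
    """
    base = math.floor(pos)
    half = taps // 2
    acc = 0.0
    for n in range(base - half + 1, base + half + 1):
        if 0 <= n < len(samples):
            acc += samples[n] * sinc(pos - n)
    return acc


def shift_range_line(line, shift):
    """Resample a range line so that output[i] ~= line[i + shift]."""
    return [sinc_interpolate(line, i + shift) for i in range(len(line))]
```

At integer positions the kernel reduces (up to rounding) to picking out the original sample; each output sample is independent of the others, which is why the operation pipelines well in hardware.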

Optimization of Color Format Conversion of WebCam Images Using the CUDA (CUDA를 이용한 웹캠 영상의 색상 형식 변환 최적화)

  • Kim, Jin-Woo;Jung, Yun-Hye;Park, Jin-Hong;Park, Yong-Jin;Han, Tack-Don
    • Journal of Korea Game Society / v.11 no.1 / pp.147-157 / 2011
  • Webcams do not align image data in memory, in order to reduce transmission time. Memory-unaligned image data is unsuitable for processing on a GPU, so we convert it into a color format suited to high-speed image processing. In this paper, we propose a technique that accelerates webcam color format conversion using NVIDIA CUDA. We propose optimizations of memory access and thread composition, and we evaluate memory and compute performance to verify the performance of the proposed architecture and the degree of optimization achievable on a low-performance GPU. With these optimizations, performance improves by up to 68 percent.
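The abstract does not name the webcam's native format. Many USB webcams deliver packed YUYV (YUV 4:2:2, two bytes per pixel), so this sketch assumes that layout and full-range BT.601 (JPEG-style) coefficients. It is a CPU reference, not the paper's CUDA kernel; the loop over 4-byte macropixels marks the unit of work that would naturally map to one CUDA thread.

```python
def clamp(x):
    """Round and clamp a value to the 0..255 byte range."""
    return max(0, min(255, round(x)))


def yuv_to_rgb(y, u, v):
    """Full-range BT.601 YUV -> RGB for one pixel (assumed coefficients)."""
    d = u - 128
    e = v - 128
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp(r), clamp(g), clamp(b)


def yuyv_to_rgb(buf, width, height):
    """Convert a packed YUYV frame (Y0 U Y1 V ...) to packed RGB bytes.

    Assumes an even width and tightly packed rows (no row padding).
    Each 4-byte macropixel produces two RGB pixels and depends on no
    other macropixel, so one GPU thread per macropixel is a natural fit.
    """
    if width % 2:
        raise ValueError("YUYV requires an even width")
    if len(buf) != width * height * 2:
        raise ValueError("buffer size does not match width * height * 2")
    out = bytearray(width * height * 3)
    for m in range(len(buf) // 4):           # one iteration ~ one thread
        y0, u, y1, v = buf[4 * m:4 * m + 4]
        for k, y in enumerate((y0, y1)):     # two pixels share U and V
            o = (2 * m + k) * 3
            out[o:o + 3] = bytes(yuv_to_rgb(y, u, v))
    return bytes(out)
```

For instance, a mid-gray macropixel `bytes([128, 128, 128, 128])` converts to two RGB pixels of (128, 128, 128).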

CUDA Acceleration of Super-Resolution Algorithm Using ELBP Classifier for Fisheye Images (광각 영상을 위한 ELBP 분류기를 이용한 초해상도 기법과 CUDA 기반 가속화)

  • Choi, Ji Hoon;Song, Byung Cheol
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.10 / pp.84-91 / 2016
  • Recently, around view monitoring (AVM) systems and security systems have provided users with images captured through fisheye lenses. Images taken through a fisheye lens cover a wider field of view, but they are also distorted. In particular, sharpness degrades because the edges of the image are out of focus. When super-resolution techniques are applied to enhance sharpness, the blur at the image edges persists; it reduces the clarity of the high-resolution image and introduces artifacts, which degrades the performance of the super-resolution algorithm. Therefore, in this paper we propose a self-similarity-based pre-processing method to improve sharpness at the image edges. Additionally, we implement the entire algorithm in a GPU environment and verify the resulting acceleration.
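The abstract does not define the ELBP classifier. As a hedged reference point, the sketch below computes the basic 8-neighbour local binary pattern (LBP) code, the building block that LBP-family classifiers extend; it is not the paper's ELBP. In classification-based super-resolution, a code like this can act as the index that selects a class-specific filter, though the abstract does not say exactly how the paper uses it.

```python
# Neighbour offsets (row, col), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code of interior pixel (r, c).

    Bit i is set when neighbour i is at least as bright as the centre.
    """
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(img):
    """LBP codes for all interior pixels of a 2D list.

    Every pixel is computed independently, so a GPU version can assign
    one thread per pixel.
    """
    h, w = len(img), len(img[0])
    return [[lbp_code(img, r, c) for c in range(1, w - 1)]
            for r in range(1, h - 1)]
```

A flat patch yields code 255 (all bits set), while an isolated bright centre pixel yields 0.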

Acceleration of the Iterative Physical Optics Using Graphic Processing Unit (GPU를 이용한 반복적 물리 광학법의 가속화에 대한 연구)

  • Lee, Yong-Hee;Chin, Huicheol;Kim, Kyung-Tae
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.26 no.11 / pp.1012-1019 / 2015
  • This paper accelerates iterative physical optics (IPO) for radar cross section (RCS) analysis by combining two techniques. To analyze multiple reflections inside a cavity, IPO uses a near-field method, unlike the shooting and bouncing rays method, which uses geometric optics (GO). However, IPO is still far slower than physical optics (PO), so it must be accelerated for practical use. To address this problem, a graphics processing unit (GPU) is applied to reduce calculation time, and the adaptive iterative physical optics-change rate (AIPO-CR) method is applied to optimize the number of iterations.
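The abstract does not state the AIPO-CR stopping rule. A minimal sketch of the general idea, assuming the rule is "stop when the relative change between successive iterates falls below a tolerance" instead of running a fixed iteration count: the toy contraction `toy_step` (x = A x + b) is a hypothetical stand-in for one multiple-reflection update, not an IPO current solver.

```python
import math


def norm(v):
    """Euclidean norm; also works for complex-valued vectors."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def iterate_until_converged(step, x0, tol=1e-6, max_iter=1000):
    """Apply `step` until ||x_k - x_{k-1}|| / ||x_k|| < tol.

    Returns (solution, iterations_used). The change-rate test lets easy
    cases stop early rather than paying for a fixed worst-case count.
    """
    x = x0
    for k in range(1, max_iter + 1):
        x_new = step(x)
        diff = norm([a - b for a, b in zip(x_new, x)])
        scale = norm(x_new) or 1.0
        x = x_new
        if diff / scale < tol:
            return x, k
    return x, max_iter


# Hypothetical toy problem: the contraction x = A x + b.
A = [[0.0, 0.5], [0.25, 0.0]]
b = [1.0, 1.0]


def toy_step(x):
    return [sum(A[i][j] * x[j] for j in range(2)) + b[i] for i in range(2)]
```

Starting from `[0.0, 0.0]`, the toy iteration converges to about (12/7, 10/7) after a few tens of steps rather than the 1000-step cap.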

Parallel Rotated Exemplar-based Texture Synthesis (병렬 회전 예제 기반 텍스처 합성)

  • Park, Han-Wook;Kim, Chang-Hun
    • Journal of the Korea Computer Graphics Society / v.15 no.1 / pp.17-23 / 2009
  • We present a simple new idea for improving the quality of exemplar-based texture synthesis by using multiple rotated input exemplars. Our algorithm captures rotational variations of the synthesized features and reduces artifacts in the results, especially patch seams caused by exemplar structures that were ill-suited to previous neighborhood-matching synthesis algorithms. Because the algorithm is inherently parallel, it can be implemented on a GPU or a multi-core CPU to accelerate the synthesis process.

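The core of neighborhood matching with rotated exemplars can be sketched as a nearest-patch search over the exemplar and its rotated copies. This sketch assumes right-angle rotations (90°, 180°, 270°) and a sum-of-squared-differences cost; the abstract does not state which rotation angles or matching cost the paper uses.

```python
def rotate90(img):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotations(img):
    """The exemplar plus its 90-, 180- and 270-degree rotations."""
    out = [img]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out


def ssd(a, b):
    """Sum of squared differences between two equally sized 2D lists."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def patch(img, r, c, k):
    """The k x k patch whose top-left corner is (r, c)."""
    return [row[c:c + k] for row in img[r:r + k]]


def best_match(query, exemplar):
    """Return (cost, rotation_index, row, col) of the closest k x k patch.

    Searching every rotated copy lets rotated structures in the output
    match directly. Each query is independent, so many queries can be
    searched in parallel on a GPU or multi-core CPU.
    """
    k = len(query)
    best = None
    for rot, ex in enumerate(rotations(exemplar)):
        for r in range(len(ex) - k + 1):
            for c in range(len(ex[0]) - k + 1):
                cost = ssd(query, patch(ex, r, c, k))
                if best is None or cost < best[0]:
                    best = (cost, rot, r, c)
    return best
```

A query equal to the exemplar rotated by 90° is found exactly (cost 0, rotation index 1), whereas a search over the unrotated exemplar alone would only find an approximate match.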

A Real-time Single-Pass Visibility Culling Method Based on a 3D Graphics Accelerator Architecture (실시간 단일 패스 가시성 선별 기법 기반의 3차원 그래픽스 가속기 구조)

  • Choo, Catherine;Choi, Moon-Hee;Kim, Shin-Dug
    • The KIPS Transactions:PartA / v.15A no.1 / pp.1-8 / 2008
  • Occlusion culling, one of the visibility culling methods, excludes invisible objects or triangles that are covered by other objects. Because it reduces the amount of computation, occlusion culling is an effective way to handle complex scenes in real time. However, a common existing method, hardware occlusion queries, sends each object's data to the GPU twice, once for the occlusion culling test and once for rendering, which causes processing overhead. Another existing hardware occlusion culling method, VCBP, can test object visibility quickly, but it neither tests bounding volumes nor returns test results to the application stage. In this paper, we propose a single-pass occlusion culling method that exploits temporal and spatial coherence, together with an efficient occlusion culling hardware architecture. In our approach, the hardware performs the occlusion culling test rapidly, using a cache, in the rasterization stage where triangles are converted into fragments. At the same time, the hardware sends each primitive's visibility information to the application stage. As a result, the application stage reduces the amount of data transmitted by excluding covered objects, using the visibility information from the previous frame and a hierarchical spatial tree. Our proposed method improves performance by up to 44% and by at least 14% compared with the S&W method based on hardware occlusion queries, and by 25% and 17% compared with the maximum and minimum performance, respectively, of the CHC method, which is also based on occlusion culling.
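Two ideas from the abstract can be sketched in software: the depth-buffer test that decides whether an object's screen-space bounding box is fully hidden, and the use of the previous frame's visibility to decide which objects the application sends. This is a hedged illustration only; it omits the paper's cache-based rasterization hardware and hierarchical spatial tree, and the function names are hypothetical.

```python
def is_occluded(depth_buffer, bbox, min_depth):
    """Return True if a screen-space bounding box is completely hidden.

    depth_buffer[y][x] holds the nearest depth drawn so far (smaller means
    closer). bbox = (x0, y0, x1, y1) is inclusive, and min_depth is the
    object's nearest depth. The object is occluded only if every covered
    pixel is already at least as close as the object's nearest point.
    """
    x0, y0, x1, y1 = bbox
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if depth_buffer[y][x] > min_depth:
                return False   # this pixel could still show the object
    return True


def objects_to_send(objects, visible_last_frame):
    """Temporal coherence: send objects that were visible last frame.

    visible_last_frame maps object id -> bool; objects with no history
    are sent conservatively. A real system must also re-test hidden
    objects periodically so that they can reappear.
    """
    return [o for o in objects if visible_last_frame.get(o, True)]
```

Because the test only needs the bounding box and one depth value per object, it can be answered during rasterization and fed back to the application without a second submission of the object's geometry.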

Comparison of Voxel Map and Sphere Tree Structures for Proximity Computation of Protein Molecules (단백질 분자에 대한 proximity 연산을 위한 복셀 맵과 스피어 트리 구조 비교)

  • Kim, Byung-Joo;Lee, Jung-Eun;Kim, Young-J.;Kim, Ku-Jin
    • Journal of Korea Multimedia Society / v.15 no.6 / pp.794-804 / 2012
  • Proximity queries, such as computing the minimum distance from an arbitrary point to a molecule or detecting collisions between a point and a molecule, are essential for geometric computations on protein molecules. The computation time of these queries depends on the data structure used to represent the molecule. In this paper, we present data structures and algorithms for GPU-accelerated proximity queries on a molecule. We present two data structures, a voxel map and a sphere tree, in which the molecule is represented as a set of spheres, together with corresponding algorithms. We show that the presented data structures are 3 to 633 times faster than the previous data structure for molecules containing 1,000 to 15,000 atoms.
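With the molecule represented as a set of spheres, the signed distance from a point p to the molecule is the minimum over atoms of |p - c| - r (negative means inside, so a collision). The sketch below contrasts a brute-force query with a simple voxel map that precomputes this value at voxel centres; the grid layout, its parameters and the `VoxelMap` class are assumptions for illustration, not the paper's data structure.

```python
import math


def min_distance(p, atoms):
    """Signed distance from point p to a union of spheres (c, r)."""
    return min(math.dist(p, c) - r for c, r in atoms)


def collides(p, atoms):
    """True if p touches or lies inside any atom sphere."""
    return min_distance(p, atoms) <= 0.0


class VoxelMap:
    """Distances precomputed at voxel centres over an axis-aligned cube.

    A query returns the value of the voxel containing the point: O(1)
    instead of O(number of atoms). For points inside the cube the error
    is at most half the voxel diagonal. Points outside are clamped to
    the nearest boundary voxel. Building the map is embarrassingly
    parallel (one voxel per GPU thread).
    """

    def __init__(self, atoms, origin, size, n):
        self.origin, self.n = origin, n
        self.h = size / n                      # voxel edge length
        self.grid = [[[min_distance(self._centre(i, j, k), atoms)
                       for k in range(n)] for j in range(n)]
                     for i in range(n)]

    def _centre(self, i, j, k):
        ox, oy, oz = self.origin
        return (ox + (i + 0.5) * self.h,
                oy + (j + 0.5) * self.h,
                oz + (k + 0.5) * self.h)

    def query(self, p):
        i, j, k = [min(self.n - 1, max(0, int((p[a] - self.origin[a]) // self.h)))
                   for a in range(3)]
        return self.grid[i][j][k]
```

For a single unit sphere at the origin and an 8 × 8 × 8 map over [-2, 2]³, `query((1.5, 0, 0))` returns roughly 0.79 against an exact distance of 0.5, within the half-diagonal bound of about 0.43.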

MPEG-I RVS Software Speed-up for Real-time Application (실시간 렌더링을 위한 MPEG-I RVS 가속화 기법)

  • Ahn, Heejune;Lee, Myeong-jin
    • Journal of Broadcast Engineering / v.25 no.5 / pp.655-664 / 2020
  • Free-viewpoint image synthesis is one of the key technologies in the MPEG-I (Immersive) standard. RVS (Reference View Synthesizer), developed and used within the MPEG group, is a DIBR (Depth Image-Based Rendering) program that generates an image at a virtual (intermediate) viewpoint from the inputs of multiple viewpoints. RVS uses a mesh-surface method based on computer graphics and outperforms previous pixel-based methods by 2.5 dB or more. Although its OpenGL version is 10 times faster than the non-OpenGL version, it still does not run in real time: it reaches only 0.75 fps with two 2K-resolution input images. In this paper, we analyze the internals of the RVS implementation and restructure it, achieving a 34-fold speed-up and thus real-time performance (22-26 fps) through three key improvements: (1) reuse of OpenGL buffers and texture objects, (2) parallelization of file I/O and OpenGL execution, and (3) parallelization of GPU shader programs and buffer transfers.
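The reported figures are consistent: 0.75 fps × 34 ≈ 25.5 fps, inside the 22-26 fps range. Improvement (2), overlapping file I/O with rendering, is a producer-consumer pipeline. The sketch below shows that pattern with a reader thread and a bounded queue; `load_frame` and `render` are hypothetical callables standing in for file reading and OpenGL view synthesis, not RVS functions.

```python
import queue
import threading


def pipelined(frame_ids, load_frame, render, depth=2):
    """Overlap loading of upcoming frames with rendering of the current one.

    A reader thread loads frames into a bounded queue (at most `depth`
    frames ahead); the calling thread takes them off the queue and renders
    them. Results come back in input order; a loader error is re-raised.
    """
    q = queue.Queue(maxsize=depth)

    def reader():
        try:
            for fid in frame_ids:
                q.put((True, load_frame(fid)))   # blocks while queue is full
        except Exception as exc:
            q.put((False, exc))                  # forward error to consumer
        q.put(None)                              # end-of-input sentinel

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    results = []
    while (item := q.get()) is not None:
        ok, payload = item
        if not ok:
            raise payload
        results.append(render(payload))
    t.join()
    return results
```

The bounded queue keeps memory use fixed while letting reading of frame i+1 proceed during rendering of frame i; with real I/O and GPU work, the per-frame time approaches the slower of the two stages rather than their sum.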

Deep Learning Based On-Device Augmented Reality System using Multiple Images (다중영상을 이용한 딥러닝 기반 온디바이스 증강현실 시스템)

  • Jeong, Taehyeon;Park, In Kyu
    • Journal of Broadcast Engineering / v.27 no.3 / pp.341-350 / 2022
  • In this paper, we propose a deep learning based on-device augmented reality (AR) system that uses multiple input images to render correct occlusion in a real environment. The proposed system consists of three technical steps: camera pose estimation, depth estimation, and object augmentation. Each step uses a mobile framework to optimize on-device processing. First, in the camera pose estimation stage, the heavy computation involved in feature extraction is parallelized with OpenCL, a GPU parallelization framework. Next, monocular and multi-image depth inference is accelerated with the mobile deep learning framework TensorFlow Lite. Finally, object augmentation and occlusion handling are performed with the OpenGL ES mobile graphics framework. The proposed AR system is implemented as an Android application. We evaluate its augmentation accuracy and processing time in both mobile and PC environments.
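Once a depth map of the real scene has been estimated, occlusion handling reduces to a per-pixel depth test: a virtual-object pixel is shown only where it is closer to the camera than the estimated scene depth. The sketch below shows that test in plain Python as an assumption about the general technique; in the described system the equivalent step runs in OpenGL ES, and `composite` and its arguments are hypothetical names.

```python
def composite(camera_rgb, scene_depth, virtual_rgb, virtual_depth):
    """Overlay a rendered virtual object on a camera frame with occlusion.

    All inputs are H x W nested lists. virtual_depth[y][x] is None where
    the virtual object does not cover the pixel. A virtual pixel wins only
    if it is nearer than the estimated real-scene depth at that pixel.
    """
    out = []
    for y, row in enumerate(camera_rgb):
        out_row = []
        for x, cam_px in enumerate(row):
            vd = virtual_depth[y][x]
            if vd is not None and vd < scene_depth[y][x]:
                out_row.append(virtual_rgb[y][x])   # virtual object in front
            else:
                out_row.append(cam_px)              # real scene occludes it
        out.append(out_row)
    return out
```

The per-pixel decision is independent across pixels, which is why it maps directly onto the GPU's depth test or a fragment shader; the accuracy of the result depends on the quality of the estimated depth map.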