• Title/Summary/Keyword: GPU Acceleration

Image Identifier based on Local Feature's Histogram and Acceleration Technique using GPU (지역 특징 히스토그램 기반 영상식별자와 GPU 가속화)

  • Jeon, Hyeok-June;Seo, Yong-Seok;Hwang, Chi-Jung
    • Journal of KIISE:Computing Practices and Letters / v.16 no.9 / pp.889-897 / 2010
  • Large-scale image database systems now demand very fast search, high accuracy, and efficient storage. An image identifier (descriptor), which measures the similarity of two images, plays an important role in such systems. Methods for extracting an image identifier can be roughly classified as local or global. In this paper, the proposed image identifier, LFH (Local Feature's Histogram), is obtained as a histogram of robust and distinctive local descriptors (features), constrained by a sub-division of the local region. LFH combines the properties of local and global descriptors, and the distance between two identifiers can be computed quickly and accurately. We also suggest a way to extract LFH on the GPU (OpenGL and GLSL). In the experiments, we compare LFH with SIFT (a local method) and EHD (a global method) in terms of storage capacity, extraction time, retrieval time, and accuracy.
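
The idea of turning many local descriptors into one fixed-length histogram can be sketched as follows. This is a generic bag-of-features illustration, not the authors' LFH: the codebook, the function names, and the choice of L1 distance are all assumptions made for the example, and the region sub-division step is omitted.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def descriptor_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments (hypothetical helper)."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]


def l1_distance(hist_a, hist_b):
    """L1 distance between two histograms; an assumed similarity measure."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))
```

Because each image is reduced to one short vector, comparing two images costs a single pass over the histogram bins rather than a match over every pair of local features.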

Quad Tree Based 2D Smoke Super-resolution with CNN (CNN을 이용한 Quad Tree 기반 2D Smoke Super-resolution)

  • Hong, Byeongsun;Park, Jihyeok;Choi, Myungjin;Kim, Changhun
    • Journal of the Korea Computer Graphics Society / v.25 no.3 / pp.105-113 / 2019
  • Physically based fluid simulation takes a long time at high resolution. To address this, some studies use deep learning to compensate for the limitations of low-resolution fluid simulation. Among them, research is under way on super-resolution, which converts low-resolution simulation data to high resolution. However, traditional techniques process the entire space, including regions with no density data, so they are inefficient in terms of overall simulation speed and can run out of GPU memory as the input resolution increases. In this paper, we propose a new method that divides and classifies 2D smoke simulation data spatially using a quad tree, one of the spatial partitioning methods, and performs super-resolution only on the regions that require it. This technique speeds up the simulation by computing only the necessary regions. It also processes the input data in divided pieces, which can alleviate the GPU memory problem.
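
A minimal sketch of the quad-tree selection step, assuming a square density grid whose side is a power of two. The function name, the `min_size` tile size, and the density threshold are hypothetical; the CNN that would upsample each returned tile is not shown.

```python
def occupied_tiles(density, x0, y0, size, min_size, threshold=0.0):
    """Return (x, y, size) tiles that contain density above the threshold.

    density is a list of rows, indexed as density[y][x]. Empty quadrants are
    discarded early, so only tiles with smoke reach the super-resolution step.
    """
    has_smoke = any(
        density[y][x] > threshold
        for y in range(y0, y0 + size)
        for x in range(x0, x0 + size)
    )
    if not has_smoke:
        return []
    if size <= min_size:
        return [(x0, y0, size)]
    half = size // 2
    tiles = []
    for dy in (0, half):
        for dx in (0, half):
            tiles += occupied_tiles(density, x0 + dx, y0 + dy, half, min_size, threshold)
    return tiles
```

Each returned tile can be processed independently, which is what bounds the memory needed per network invocation.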

Multi-DNN Acceleration Techniques for Embedded Systems with Tucker Decomposition and Hidden-layer-based Parallel Processing (터커 분해 및 은닉층 병렬처리를 통한 임베디드 시스템의 다중 DNN 가속화 기법)

  • Kim, Ji-Min;Kim, In-Mo;Kim, Myung-Sun
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.6 / pp.842-849 / 2022
  • With the development of deep learning technology, DNNs are increasingly used in embedded systems such as unmanned vehicles, drones, and robots. In an autonomous driving system, for example, it is crucial to run several DNNs at the same time, each with high accuracy and a large amount of computation. However, running multiple DNNs simultaneously on an embedded system with relatively low performance increases the inference time. This may cause the system to malfunction, because the operation that depends on the inference result is not performed in time. To solve this problem, the solution proposed in this paper first reduces the computation by applying Tucker decomposition to DNN models with a large amount of computation, and then runs the DNN models in parallel as much as possible, at the granularity of hidden layers, inside the GPU. The experimental results show that the DNN inference time decreases by up to 75.6% compared to the case before applying the proposed technique.
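
To see why Tucker decomposition reduces computation, the weight counts of a convolution layer can be compared before and after decomposition. The sketch assumes the common Tucker-2 form, in which a k×k convolution is replaced by a 1×1 convolution, a smaller k×k convolution, and another 1×1 convolution. The abstract does not say which variant or ranks the authors use, so the ranks below are purely illustrative.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Weights after an assumed Tucker-2 split into 1x1, k x k, and 1x1 convolutions."""
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


original = conv_params(64, 128, 3)                 # 73728
compressed = tucker2_params(64, 128, 3, 16, 32)    # 9728
```

The multiply-accumulate count per output position scales the same way, so smaller ranks trade accuracy for inference time.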

Grid Acceleration Structure for Efficiently Tracing the Secondary Rays in Dynamic Scenes on Mobile Platforms (모바일 환경에서의 동적 장면의 효율적인 이차 광선 추적을 위한 격자 가속 구조)

  • Seo, Woong;Choi, Byeongjun;Ihm, Insung
    • Journal of KIISE / v.44 no.6 / pp.573-580 / 2017
  • Despite the recent remarkable advances in the computing power of mobile devices, heat and battery problems still restrict their performance, particularly compared to PCs. Therefore, when applying ray tracing for high-quality rendering, it is worthwhile to consider a method that traces only the secondary rays while generating the effects of the primary rays through rasterization-based OpenGL ES rendering. Given that most of the rendering time in such a method is spent on secondary-ray processing, a new volume-grid technique for dynamic scenes is proposed here that enhances the tracing performance of secondary rays with low coherence. The proposed method attempts to model all possible spatial secondary rays with a fixed number of sampling rays, thereby alleviating the need to visit every cell along a ray in a uniform grid. A hybrid rendering pipeline that speeds up overall rendering by exploiting both the mobile-device CPU and GPU is also presented.
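
For context, the cell-visitation cost of a conventional uniform grid can be illustrated with a standard incremental (DDA-style) traversal. This is a 2D sketch of the baseline behaviour only, not the proposed volume-grid technique; the function and parameter names are hypothetical.

```python
import math


def cells_along_ray(origin, direction, grid_size, cell_size=1.0):
    """List the (x, y) cells a 2D ray visits, in order, until it leaves the grid."""
    dx, dy = direction
    if dx == 0 and dy == 0:
        raise ValueError("direction must be non-zero")
    nx, ny = grid_size
    x = int(math.floor(origin[0] / cell_size))
    y = int(math.floor(origin[1] / cell_size))

    def axis_setup(pos, d, cell):
        # Returns (step, ray parameter at the next cell boundary, parameter per cell).
        if d > 0:
            return 1, ((cell + 1) * cell_size - pos) / d, cell_size / d
        if d < 0:
            return -1, (cell * cell_size - pos) / d, -cell_size / d
        return 0, math.inf, math.inf

    step_x, t_max_x, t_delta_x = axis_setup(origin[0], dx, x)
    step_y, t_max_y, t_delta_y = axis_setup(origin[1], dy, y)

    visited = []
    while 0 <= x < nx and 0 <= y < ny:
        visited.append((x, y))
        # Advance across whichever cell boundary the ray reaches first.
        if t_max_x < t_max_y:
            x += step_x
            t_max_x += t_delta_x
        else:
            y += step_y
            t_max_y += t_delta_y
    return visited
```

Every visited cell implies a lookup and possibly intersection tests, which is expensive for incoherent secondary rays that share little work with their neighbours.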

Parallel Implementation of the Recursive Least Square for Hyperspectral Image Compression on GPUs

  • Li, Changguo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.7 / pp.3543-3557 / 2017
  • Compression is a very important technique for remotely sensed hyperspectral images. Lossless compression based on recursive least squares (RLS), which removes redundancy in hyperspectral images using both spatial and spectral correlations, is an extremely powerful tool for this purpose, but its relatively high computational complexity limits its application in time-critical scenarios. To improve the computational efficiency of the algorithm, we optimize its serial version and develop a new parallel implementation on graphics processing units (GPUs). First, an optimized RLS method based on the optimal number of prediction bands is introduced. We then use this approach as a case study to illustrate the advantages and potential challenges of applying GPU parallel optimization principles to the problem. The proposed parallel method exploits the low-level architecture of GPUs and is implemented using the compute unified device architecture (CUDA). The GPU parallel implementation is compared with the serial implementation on a CPU. Experimental results indicate remarkable acceleration factors and real-time performance, while retaining exactly the same bit rate as the serial version of the compressor.
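
The core of an RLS predictor is a per-sample weight update. The sketch below is the textbook update with a forgetting factor, written in plain Python for readability; it is not the optimized or GPU version described in the abstract, and the variable names are assumptions. In a predictive compressor, `x` would hold co-located samples from previously coded bands and `error` would be the residual passed to the entropy coder.

```python
def rls_update(weights, p_matrix, x, target, forgetting=1.0):
    """One recursive least squares step.

    weights:  current prediction weights (list of floats)
    p_matrix: current inverse correlation matrix estimate (symmetric, list of rows)
    x:        input vector for this sample
    target:   value to predict
    Returns (new_weights, new_p_matrix, prediction_error).
    """
    n = len(x)
    # P x; because P is symmetric this also gives the row vector x^T P.
    px = [sum(p_matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(x[i] * px[i] for i in range(n))
    gain = [value / denom for value in px]
    error = target - sum(w * xi for w, xi in zip(weights, x))
    new_weights = [w + g * error for w, g in zip(weights, gain)]
    new_p = [
        [(p_matrix[i][j] - gain[i] * px[j]) / forgetting for j in range(n)]
        for i in range(n)
    ]
    return new_weights, new_p, error
```

The update costs O(n²) per sample in the number of prediction inputs, which is one reason the number of prediction bands matters for running time.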

An Algorithm for Finding Surface Atoms of a Protein Molecule Based on Voxel Map Representation (복셀 맵을 이용한 단백질 표면 원자의 발견 알고리즘)

  • Kim, Byung-Joo;Kim, Ku-Jin;Seong, Joon-Kyung
    • The KIPS Transactions:PartA / v.19A no.2 / pp.73-76 / 2012
  • In this paper, we propose an efficient method to extract surface atoms from a protein molecule. Surface atoms are defined as the set of atoms that can contact a given probe solvent $P$ when $P$ does not collide with the molecule. The atoms in the molecule are represented as a set of spheres with van der Waals radii. The probe solvent is also represented as a sphere. We propose a method to extract the surface atoms by computing the offset surface of the molecule with respect to the radius of $P$. For efficient computation of the offset surface of a molecule, a voxel map is constructed for the offset surfaces of the spheres. Using GPU (graphics processing unit) acceleration, a data-parallel algorithm extracts the surface atoms in 42.87 milliseconds for molecules containing up to 6,412 atoms.

Real-time Virtual View Synthesis using Virtual Viewpoint Disparity Estimation and Convergence Check (가상 변이맵 탐색과 수렴 조건 판단을 이용한 실시간 가상시점 생성 방법)

  • Shin, In-Yong;Ho, Yo-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.1A / pp.57-63 / 2012
  • In this paper, we propose a real-time view interpolation method using virtual viewpoint disparity estimation and a convergence check. For real-time processing, we estimate a disparity map at the virtual viewpoint from stereo images using the belief propagation method. This method needs only one disparity map, whereas conventional methods need two. In the view synthesis part, we warp pixels from the reference images to the virtual viewpoint image using the disparity map at the virtual viewpoint. For real-time acceleration, we use CUDA, a high-speed GPU parallel programming platform. As a result, we can interpolate virtual viewpoint images in real time.
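
The warping step can be sketched for a single scanline. The sketch assumes rectified stereo images, a disparity map already defined at the virtual viewpoint, nearest-pixel sampling, and a simple linear blend; the sign convention, the `alpha` parameter (0 at the left view, 1 at the right view), and the function name are assumptions, and occlusion handling is omitted.

```python
def synthesize_scanline(left, right, disparity, alpha=0.5):
    """Interpolate one row of a virtual view between a left and a right image row."""

    def sample(row, position):
        # Nearest-pixel lookup, clamped to the row boundaries.
        index = min(max(int(round(position)), 0), len(row) - 1)
        return row[index]

    virtual = []
    for x, d in enumerate(disparity):
        # A point seen at x in the virtual view sits at x + alpha*d in the
        # left image and at x - (1 - alpha)*d in the right image.
        left_value = sample(left, x + alpha * d)
        right_value = sample(right, x - (1.0 - alpha) * d)
        virtual.append((1.0 - alpha) * left_value + alpha * right_value)
    return virtual
```

Each output pixel depends only on its own disparity value, so the loop maps naturally onto one GPU thread per pixel.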

A Study on Scenario-based Urban Flood Prediction using G2D Flood Analysis Model (G2D 침수해석 모형을 이용한 시나리오 기반 도시 침수예측 연구)

  • Hui-Seong Noh;Ki-Hong Park
    • Journal of Advanced Navigation Technology / v.27 no.4 / pp.488-494 / 2023
  • In this paper, scenario-based urban flood prediction was performed for the entire city of Jinju, and a simulation domain was constructed using G2D, a 2-dimensional urban flood analysis model. The domain is configured from a DEM, and the land cover map is used to set the roughness coefficient for each grid cell. The input data of the model are water level, water depth, and flow rate. In the simulation of the constructed G2D model, virtual rainfall (3 mm per 10 min applied to all grid cells for 5 hours) and virtual flow were applied. A GPU acceleration technique was also applied to determine whether the flood analysis model could be run for the target area. The simulation confirmed that the high-resolution flood analysis time was significantly shortened and that flood depths for visual flood assessment could be generated for each simulation time.
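
The virtual rainfall scenario can be checked with simple arithmetic. The helper name below is hypothetical; the numbers come from the abstract.

```python
def total_rainfall_mm(depth_per_interval_mm, interval_min, duration_hours):
    """Total rainfall depth for a constant-intensity scenario."""
    intervals = duration_hours * 60 / interval_min
    return depth_per_interval_mm * intervals


total = total_rainfall_mm(3, 10, 5)   # 90.0 mm over the whole event
hourly = total / 5                    # 18.0 mm/h
```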

Acceleration of Anisotropic Elastic Reverse-time Migration with GPUs (GPU를 이용한 이방성 탄성 거꿀 참반사 보정의 계산가속)

  • Choi, Hyungwook;Seol, Soon Jee;Byun, Joongmoo
    • Geophysics and Geophysical Exploration / v.18 no.2 / pp.74-84 / 2015
  • To yield physically meaningful images through elastic reverse-time migration, wavefield separation, which extracts P- and S-waves from vector wavefields reconstructed with the elastic wave equation, is a prerequisite. To extend elastic reverse-time migration to anisotropic media, both an anisotropic modelling algorithm and anisotropic wavefield separation are essential. Anisotropic wavefield separation, which uses pseudo-derivative filters determined by the vertical velocities and anisotropic parameters of the elastic media, differs from the Helmholtz decomposition conventionally used for isotropic wavefield separation. Since applying these pseudo-derivative filters is computationally expensive, we have developed an efficient anisotropic wavefield separation algorithm that supports parallel computing on GPUs (Graphics Processing Units). In addition, we have developed a highly efficient anisotropic elastic reverse-time migration algorithm that uses MPI (Message-Passing Interface) and incorporates the GPU-based anisotropic wavefield separation algorithm. To verify the efficiency and validity of the developed algorithm, a VTI elastic model based on Marmousi-II was built, and a synthetic multicomponent seismic data set was created using this model. The computational speed of migration was dramatically enhanced by using GPUs and MPI, and the image accuracy also improved because of the anisotropic wavefield separation.
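
For the isotropic case mentioned above, Helmholtz-style separation amounts to taking the divergence (P-wave part) and the curl (S-wave part) of the displacement field. The sketch below does this in 2D with central differences on interior grid points; the grid layout `u[z][x]`, the sign convention of the curl, and the function name are assumptions. The anisotropic method in the abstract replaces these plain derivatives with pseudo-derivative filters, which is not shown here.

```python
def divergence_and_curl(ux, uz, dx=1.0, dz=1.0):
    """Central-difference divergence and 2D curl of a vector field (ux, uz).

    ux and uz are lists of rows indexed as field[z][x]. Boundary points are
    left at zero because central differences need both neighbours.
    """
    nz, nx = len(ux), len(ux[0])
    div = [[0.0] * nx for _ in range(nz)]
    curl = [[0.0] * nx for _ in range(nz)]
    for k in range(1, nz - 1):
        for i in range(1, nx - 1):
            dux_dx = (ux[k][i + 1] - ux[k][i - 1]) / (2 * dx)
            duz_dz = (uz[k + 1][i] - uz[k - 1][i]) / (2 * dz)
            dux_dz = (ux[k + 1][i] - ux[k - 1][i]) / (2 * dz)
            duz_dx = (uz[k][i + 1] - uz[k][i - 1]) / (2 * dx)
            div[k][i] = dux_dx + duz_dz    # P-wave (compressional) part
            curl[k][i] = dux_dz - duz_dx   # S-wave (shear) part
    return div, curl
```

Each grid point is computed independently from a small stencil, which is the access pattern that makes this kind of filter a good fit for GPUs.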

A Simplified Graphics System Based on Direct Rendering Manager System

  • Baek, Nakhoon
    • Journal of information and communication convergence engineering / v.16 no.2 / pp.125-129 / 2018
  • In the field of computer graphics, rendering speed is one of the most important factors. Contemporary rendering is performed using 3D graphics systems with windowing system support. Since typical graphics systems, including OpenGL and the DirectX library, focus on the variety of graphics rendering features, the rendering process itself consists of many complicated operations. In contrast, early computer systems used direct manipulation of computer graphics hardware, and achieved simple and efficient graphics handling operations. We suggest an alternative method of accelerated 2D and 3D graphics output, based on directly accessing modern GPU hardware using the direct rendering manager (DRM) system. On the basis of this DRM support, we exchange the graphics instructions and graphics data directly, and achieve better performance than full 3D graphics systems. We present a prototype system for providing a set of simple 2D and 3D graphics primitives. Experimental results and their screen shots are included.