• Title/Summary/Keyword: GPU process

An Acceleration Technique of Terrain Rendering using GPU-based Chunk LOD (GPU 기반의 묶음 LOD 기법을 이용한 지형 렌더링의 가속화 기법)

  • Kim, Tae-Gwon;Lee, Eun-Seok;Shin, Byeong-Seok
    • Journal of Korea Multimedia Society / v.17 no.1 / pp.69-76 / 2014
  • It is hard to represent massive terrain data in real time, even on recent graphics hardware. To process such data, mesh simplification methods such as continuous level-of-detail (LOD) are commonly used. However, existing GPU-based methods that use a quad-tree structure, such as geometry splitting, produce many vertices while traversing the quad-tree and retransmit those vertices to the GPU in each tree traversal. They also suffer from increased tree size because they build the tree structure in a texture. To solve these problems, we propose a GPU-based chunked LOD technique for real-time terrain rendering. We restrict the depth of the tree search and generate chunks with the GPU tessellator. With our method, terrain can be rendered efficiently by generating the chunks on the GPU, which reduces the computing time for tree traversal.
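
The idea of restricting the tree search depth and emitting coarse chunks can be sketched on the CPU as follows. This is a minimal illustration, not the authors' implementation: the function name `select_chunks`, the distance-based split rule, and the `split_factor` parameter are assumptions made for this example, and the tessellation step that would refine each chunk on the GPU is not modelled.

```python
import math

def select_chunks(x, y, size, camera, max_depth, depth=0, split_factor=2.0):
    """Return a list of (x, y, size, depth) chunks covering the square tile at (x, y).

    Hypothetical split rule: a tile is divided into four children while the
    camera is closer than split_factor * size to the tile center and the
    depth limit has not been reached.
    """
    cx, cy = x + size / 2.0, y + size / 2.0
    distance = math.hypot(camera[0] - cx, camera[1] - cy)
    if depth >= max_depth or distance > split_factor * size:
        # Emit one chunk; finer detail would come from GPU tessellation.
        return [(x, y, size, depth)]
    half = size / 2.0
    chunks = []
    for ox in (0.0, half):
        for oy in (0.0, half):
            chunks += select_chunks(x + ox, y + oy, half, camera,
                                    max_depth, depth + 1, split_factor)
    return chunks

# Example: a 1024 x 1024 terrain, camera near one corner, at most 3 levels.
chunks = select_chunks(0.0, 0.0, 1024.0, camera=(100.0, 100.0), max_depth=3)
```

Capping `max_depth` bounds the traversal cost per frame; in this sketch the remaining detail is assumed to be produced by the tessellator rather than by deeper tree levels.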

An Implementation of Graphic Offloading Computing using GPU Virtualization based on API Remoting on a Server-based Software Service (서버 기반 SW 서비스에서 API 리모팅 기반의 GPU 가상화를 이용한 그래픽 분할 실행의 구현)

  • Choi, Won-Hyuk;Kim, Won-Young
    • Journal of Internet Computing and Services / v.12 no.6 / pp.53-62 / 2011
  • In this paper, we introduce a graphics offloading method that uses GPU virtualization to provide demanding software, such as 3D software, as an online software service. When the offloaded software runs in the server's software virtualization environment, its graphics work is processed on the client's GPU through GPU virtualization, while its data work is processed on the server's CPU. To do this, we propose a method of rendering graphics information on the client-side GPU using API remoting. We also show better performance than server-based rendering when serving offloaded software that includes dynamic 3D graphics whose displayed images change frequently. Moreover, we describe a method to virtualize the offloaded software at the process level and to manage each client's configuration information in order to decrease the server's load when the software service is provided to multiple clients.

The Design of Parallel Processing S/W Using CUDA for Realtime 3D Laser Ladar Imaging System (실시간 3차원 레이저 레이더 영상 생성을 위한 CUDA 기반 병렬처리 소프트웨어 설계)

  • Cho, Yong Il;Ha, Choong Lim;Yang, Ji Hyeon;Kim, Jae Hyup
    • Journal of the Korea Society of Computer and Information / v.18 no.1 / pp.1-10 / 2013
  • In this paper, we propose a CUDA (Compute Unified Device Architecture) based software (SW) design method for a parallel CPU (Central Processing Unit) and GPU (Graphics Processing Unit) structure to implement real-time processing in a 3D laser radar (LADAR) imaging system. LADAR is a complex system that generates 3-dimensional images from laser ranging information and requires massive processing resources in each phase. Therefore, designing and implementing a parallel structure is crucial to achieving real-time processing within limited system resources. By analyzing the processing algorithm in each phase and allocating separable workloads to the CUDA GPU, we meet the required real-time processing speed and confirm a 46% increase in processing speed.

Analysis tool for the diffusion model using GPU: SNUDM-G (GPU를 이용한 확산모형 분석 도구: SNUDM-G)

  • Lee, Dajung;Lee, Hyosun;Koh, Sungryong
    • Korean Journal of Cognitive Science / v.33 no.3 / pp.155-168 / 2022
  • In this paper, we introduce SNUDM-G, a diffusion model analysis tool with improved computational speed. Although the diffusion model has been applied to explain various cognitive tasks, its use has been limited by computational difficulties. In particular, SNUDM (Koh et al., 2020), one of the diffusion model analysis tools, is slow because it sequentially generates 20,000 data points when approximating the diffusion process. To overcome this limitation, we propose using graphics processing units (GPUs) when approximating the diffusion process with a random walk process. Since the 20,000 data points can be generated in parallel on the GPU, estimation is faster than with sequential generation. We analyzed the data of Experiment 1 by Ratcliff et al. (2004) and recovered the parameters with SNUDM-G on the GPU and with SNUDM on the CPU; SNUDM-G estimated slightly higher values for certain parameters than SNUDM. However, in terms of computational speed, SNUDM-G estimated the parameters much faster than SNUDM. This result shows that more efficient diffusion model analysis for various cognitive tasks is possible with this tool, and further suggests that the processing speed of various cognitive models can be improved by using graphics processing units in the future.
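
A random-walk approximation of a diffusion process can be sketched as below. This is an illustrative CPU version under stated assumptions, not the SNUDM or SNUDM-G code: the function names, the parameter names (`drift`, `boundary`, `start`, `noise`, `dt`), and the step rule (step size `noise * sqrt(dt)`, upward probability `0.5 * (1 + drift * sqrt(dt) / noise)`) are one textbook discretization chosen for the example. Each trial is independent of the others, which is the property that would let a GPU assign one trial per thread; the loop here simply runs them one after another.

```python
import math
import random

def simulate_trial(drift, boundary, start, rng, dt=0.001, noise=1.0, max_steps=100_000):
    """Simulate one decision as a random walk between 0 and `boundary`.

    Returns (choice, decision_time): choice is 1 for the upper boundary,
    0 for the lower boundary, or None if no boundary is hit in max_steps.
    """
    step = noise * math.sqrt(dt)
    p_up = 0.5 * (1.0 + drift * math.sqrt(dt) / noise)
    position = start
    for n in range(1, max_steps + 1):
        position += step if rng.random() < p_up else -step
        if position >= boundary:
            return 1, n * dt
        if position <= 0.0:
            return 0, n * dt
    return None, max_steps * dt

def simulate_trials(n_trials, drift, boundary, start, seed=0):
    """Run independent trials sequentially (a CPU stand-in for one GPU thread per trial)."""
    rng = random.Random(seed)
    return [simulate_trial(drift, boundary, start, rng) for _ in range(n_trials)]

# Example: 200 trials with an unbiased starting point and positive drift.
trials = simulate_trials(200, drift=2.0, boundary=1.0, start=0.5, seed=1)
upper_share = sum(1 for choice, _ in trials if choice == 1) / len(trials)
```

With positive drift most walks should end at the upper boundary, so `upper_share` is expected to be well above one half.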

The Performance Analysis of GPU-based Cloth simulation according to the Change of Work Group Configuration (워크 그룹 구성 변화에 따른 GPU 기반 천 시뮬레이션의 성능 분석)

  • Choi, Young-Hwan;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • Journal of Internet Computing and Services / v.18 no.3 / pp.29-36 / 2017
  • Today, 3D dynamic simulation is closely tied to many industries. In the past, physically based 3D simulation was used mainly in car crash and construction-related fields, but it now also plays an important role in movies and games. Many mathematical computations are needed to represent a 3D object realistically, and it is difficult to process such a large amount of calculation on the CPU in real time. With recent advances in graphics hardware and architecture, the GPU can be used for general-purpose computation as well as for graphics computation, and GPU-based approaches have been applied in various research fields. In this paper, we analyze how the performance of two GPU-based cloth simulation algorithms varies with the execution properties of GPU shaders, in order to optimize the performance of GPU-based cloth simulation. The cloth simulation is implemented with a spring-centric algorithm and a node-centric algorithm, using GPU parallel computing with GLSL 4.3 compute shaders. We compare the performance of these algorithms as the size and dimension of the work group change. Each test is repeated 10 times over 5,000 frames, and the results are reported as the average FPS. The experimental results show that the node-centric algorithm runs faster than the spring-centric algorithm.
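
The abstract does not define the two algorithms, so the sketch below rests on an assumption about what the names usually mean in mass-spring cloth simulation: a spring-centric pass does one unit of work per spring and scatters the force to both end nodes, while a node-centric pass does one unit of work per node and gathers the forces of the springs attached to it. All function names are hypothetical, and the code is a sequential CPU illustration rather than a compute shader.

```python
import math

def spring_force(p, q, rest_length, stiffness):
    """Hooke force acting on the node at p from a spring connecting p to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return (0.0, 0.0)
    scale = stiffness * (length - rest_length) / length
    return (scale * dx, scale * dy)

def forces_spring_centric(positions, springs, stiffness):
    """One work item per spring: scatter the force to both endpoints."""
    forces = [[0.0, 0.0] for _ in positions]
    for i, j, rest in springs:
        fx, fy = spring_force(positions[i], positions[j], rest, stiffness)
        # Run in parallel, these shared writes would typically need atomics
        # or some other synchronization.
        forces[i][0] += fx
        forces[i][1] += fy
        forces[j][0] -= fx
        forces[j][1] -= fy
    return forces

def forces_node_centric(positions, springs, stiffness):
    """One work item per node: gather forces from the springs attached to it."""
    neighbors = [[] for _ in positions]
    for i, j, rest in springs:
        neighbors[i].append((j, rest))
        neighbors[j].append((i, rest))
    forces = []
    for i, p in enumerate(positions):
        fx = fy = 0.0
        for j, rest in neighbors[i]:
            gx, gy = spring_force(p, positions[j], rest, stiffness)
            fx += gx
            fy += gy
        forces.append([fx, fy])  # each node writes only its own entry
    return forces

# Example: three nodes joined by three springs (i, j, rest_length).
positions = [(0.0, 0.0), (1.5, 0.0), (1.5, 1.2)]
springs = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.5)]
scatter = forces_spring_centric(positions, springs, stiffness=10.0)
gather = forces_node_centric(positions, springs, stiffness=10.0)
```

Both passes yield the same forces up to floating-point rounding. The gather form evaluates each spring twice but writes each output from a single work item, which is one plausible reason a node-centric layout could run faster on a GPU.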

Design and Implementation of High-Speed Software Cryptographic Modules Using GPU (GPU를 활용한 고속 소프트웨어 암호모듈 설계 및 구현)

  • Song, JinGyo;An, SangWoo;Seo, Seog Chung
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.6 / pp.1279-1289 / 2020
  • The importance of cryptographic modules for securely protecting users' sensitive information and national secrets has been emphasized, and many companies and national organizations now actively use them. In Korea, the security of these cryptographic modules is verified through the Korea Cryptographic Module Validation Program (KCMVP). Most domestic cryptographic modules are CPU-based software (S/W). However, CPU-based cryptographic modules are difficult to use in servers that need to process large amounts of data. In this paper, we propose an S/W cryptographic module that provides high-speed operation using a GPU. We describe the configuration and operation of the GPU-based S/W cryptographic module and present the changes in the cryptographic module security requirements that result from using a GPU. In addition, we present the performance improvement over the existing CPU-based S/W cryptographic module. The results of this paper can be used for cryptographic modules that provide cryptography in servers that manage the IoT (Internet of Things) or provide cloud computing.

Scheduling of Artificial Intelligence Workloads in Cloud Environments Using Genetic Algorithms (유전 알고리즘을 이용한 클라우드 환경의 인공지능 워크로드 스케줄링)

  • Seokmin Kwon;Hyokyung Bahn
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.3 / pp.63-67 / 2024
  • Recently, artificial intelligence (AI) workloads from various industries, such as smart logistics, FinTech, and entertainment, are being executed in the cloud. In this paper, we address the scheduling of various AI workloads on a multi-tenant cloud system composed of heterogeneous GPU clusters. Traditional scheduling decreases GPU utilization in such environments, degrading system performance significantly. To resolve this issue, we present a new scheduling approach that uses genetic algorithm-based optimization, implemented within a process-based event simulation framework. Trace-driven simulations with diverse AI workload traces collected from Alibaba's MLaaS cluster demonstrate that the proposed scheduling significantly improves GPU utilization compared to conventional scheduling.
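
A genetic algorithm for placing jobs on heterogeneous GPUs can be sketched as follows. This is a toy illustration under several assumptions and not the scheduler from the paper: the chromosome encoding (one GPU index per job), the fitness (makespan, used here as a simple stand-in for the GPU-utilization objective in the abstract), the operators (truncation selection, one-point crossover, per-gene mutation), and all names and default parameters are chosen for the example. It expects at least two jobs.

```python
import random

def makespan(assignment, job_work, gpu_speed):
    """Finish time of the busiest GPU when job k runs on GPU assignment[k]."""
    load = [0.0] * len(gpu_speed)
    for job, gpu in enumerate(assignment):
        load[gpu] += job_work[job] / gpu_speed[gpu]
    return max(load)

def evolve_schedule(job_work, gpu_speed, generations=100, population_size=30,
                    mutation_rate=0.1, seed=0):
    """Search for a job-to-GPU assignment with a small makespan."""
    rng = random.Random(seed)
    n_jobs, n_gpus = len(job_work), len(gpu_speed)

    def fitness(assignment):
        return makespan(assignment, job_work, gpu_speed)

    population = [[rng.randrange(n_gpus) for _ in range(n_jobs)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: population_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_jobs)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randrange(n_gpus) if rng.random() < mutation_rate else g
                     for g in child]                  # per-gene mutation
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

# Example: eight jobs (arbitrary work units) on three GPUs of different speeds.
job_work = [4.0, 8.0, 2.0, 6.0, 3.0, 7.0, 5.0, 1.0]
gpu_speed = [1.0, 2.0, 4.0]
best = evolve_schedule(job_work, gpu_speed)
```

Keeping the better half of each generation means the best makespan found never gets worse from one generation to the next; a utilization-based fitness could be substituted for `makespan` without changing the rest of the loop.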

A GPU-enabled Face Detection System in the Hadoop Platform Considering Big Data for Images (이미지 빅데이터를 고려한 하둡 플랫폼 환경에서 GPU 기반의 얼굴 검출 시스템)

  • Bae, Yuseok;Park, Jongyoul
    • KIISE Transactions on Computing Practices / v.22 no.1 / pp.20-25 / 2016
  • With the advent of the era of digital big data, the Hadoop platform has become widely used in various fields. However, when processing a large number of small files, the Hadoop MapReduce framework suffers from increased main memory use on the name node and an increased number of map tasks. In addition, a method for running C++-based tasks in the MapReduce framework is required in order to use GPUs, which support hardware-based data parallelism, within the framework. Therefore, in this paper, we present a face detection system that generates a sequence file of images in order to process image big data on the Hadoop platform. The system also handles GPU-based face detection tasks in the MapReduce framework using Hadoop Pipes. We demonstrate a performance increase of around 6.8-fold compared to a single CPU process.

Morphology Operations on CUDA To Remove Skull on MRI Images

  • Izmantoko, Yonny S.;Choi, Heung-Kook
    • Proceedings of the Korea Multimedia Society Conference / 2012.05a / pp.205-208 / 2012
  • Nowadays the GPU (Graphics Processing Unit) is used not only to display and render images but also for other computations. In this paper, we use the GPU to perform morphology operations that remove the skull from axial MRI images. Skull removal is an important step in brain segmentation because the subsequent work should involve only the brain, without any skull. The results show that simple morphology operations for skull removal were successfully applied to MRI images, but many parts can still be developed to obtain better images.
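
The abstract does not say which morphology operations were used, so the sketch below only illustrates the two basic ones, binary erosion and dilation with a 3x3 structuring element, and their composition (opening). The function names and the tiny test image are made up for this example, and pixels outside the image are assumed to be 0. Each output pixel depends only on its own neighborhood, which is the property that would allow one CUDA thread per pixel; the code itself is a sequential CPU stand-in.

```python
def _neighborhood(image, r, c):
    """Yield the 3x3 neighborhood values around (r, c); out-of-bounds pixels count as 0."""
    rows, cols = len(image), len(image[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield image[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0

def erode(image):
    """A pixel stays 1 only if its whole 3x3 neighborhood is 1."""
    return [[int(all(_neighborhood(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]

def dilate(image):
    """A pixel becomes 1 if any pixel in its 3x3 neighborhood is 1."""
    return [[int(any(_neighborhood(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]

def opening(image):
    """Erosion followed by dilation: removes structures thinner than the 3x3 element."""
    return dilate(erode(image))

# Example: a 3x3 blob with a one-pixel-wide strip attached to its right side.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
opened = opening(image)  # the thin strip is removed, the 3x3 blob is kept
```

In a skull-stripping setting, an operation of this kind could be used to detach thin connections in a thresholded mask, though whether the paper does so is not stated.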

A Study on the Performance Improvement of Software Digital Filter using GPU (GPU를 이용한 소프트웨어 디지털 필터의 성능개선에 관한 연구)

  • Yeom, Jae-Hwan;Oh, Se-Jin;Roh, Duk-Gyoo;Jung, Dong-Kyu;Hwang, Ju-Yeon;Oh, Chungsik;Kim, Hyo-Ryoung
    • Journal of the Institute of Convergence Signal Processing / v.19 no.4 / pp.153-161 / 2018
  • This paper describes the performance improvement of a software (SW) digital filter using a GPU (Graphics Processing Unit). The previously developed SW digital filter runs on the CPU (Central Processing Unit) and is slow. The GPU was introduced to improve the operation speed when filtering EAVN (East Asian VLBI Network) observation data, so that the filtered data can be processed together with data from other stations. To enhance the computational speed of the SW digital filter, an NVIDIA Titan V GPU board with built-in Tensor Cores is used. When filtering 95 seconds of 2 Gbps (512 MHz BW, 1-IF) observation data, a processing speed of about 0.78 times (1 Gbps, 16 MHz BW, 16-IF) and 1.1 times (2 Gbps, 32 MHz BW, 16-IF) the observing time was achieved. In addition, for data observed simultaneously at 1 and 2 Gbps with the KVN (Korean VLBI Network), the 2 Gbps data were digitally filtered and compared with the 1 Gbps data, yielding similar values for the cross power spectrum, phase, and SNR (signal-to-noise ratio). These results confirm the effectiveness of the GPU-based SW digital filter developed in this research for data processing and analysis. In the future, observation data are expected to be filterable in real time once the source code is optimized for distributed processing across multiple GPU boards.