• Title/Summary/Keyword: GPU model

Geographic information 3D Synthetic Model based on Regular Mesh (Regular Mesh 기반 지리정보 3D 합성모델)

  • Jung, Ji-Hwan;Hwang, Sun-Myung;Kim, Sung-Ho
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.4
    • /
    • pp.616-625
    • /
    • 2011
  • There are two representative geometry rendering methods: Geometry Clipmaps and ROAM 2.0. We propose an extended Geometry Clipmaps algorithm that relies on the GPU rather than the CPU to render a wider visible area faster. The extended algorithm covers how the mesh is configured at each level of detail (LOD), how the mesh network between levels is configured, a mesh-block method that optimizes rendering through view frustum culling (VFC), and an image-mapping method that supports resolutions up to 1 m.

A Performance Analysis of Model Training Due to Different Batch Sizes in Synchronous Distributed Deep Learning Environments (동기식 분산 딥러닝 환경에서 배치 사이즈 변화에 따른 모델 학습 성능 분석)

  • Yerang Kim;HyungJun Kim;Heonchang Yu
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.79-80
    • /
    • 2023
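  • Synchronous distributed deep learning shortens model training by having multiple workers split the gradient computation and process it in parallel. The batch size is the number of data samples processed per iteration, and it is an important factor affecting both training speed and the quality of the trained model. For distributed training in a multi-GPU environment, the upper bound on the selectable batch size rises as the available GPU memory capacity grows. However, the effect of batch size on training speed and model quality depends on many variables, such as GPU utilization, the total number of epochs, and the number of model parameters, so the optimal value is not easy to find. This study experimentally analyzes the main factors that influence the choice of the optimal batch size in a synchronous distributed deep learning environment.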
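
The synchronous scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the one-parameter model `y = w * x`, the two-worker split, and the names `grad_mse` and `sync_step` are all assumptions made for illustration. With equally sized shards, averaging the per-worker gradients yields the same update as a single step over the global batch (per-worker batch size times the number of workers).

```python
# Sketch of one synchronous data-parallel SGD step (hypothetical names).

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def sync_step(w, shards, lr):
    """Each worker computes a gradient on its shard; the gradients are averaged."""
    grads = [grad_mse(w, xs, ys) for xs, ys in shards]  # one gradient per worker
    avg = sum(grads) / len(grads)                       # stands in for an all-reduce mean
    return w - lr * avg

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Two workers, per-worker batch size 2, so the global batch size is 4.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
global_batch = sum(len(shard_xs) for shard_xs, _ in shards)

w_parallel = sync_step(0.0, shards, lr=0.01)
w_single = 0.0 - 0.01 * grad_mse(0.0, xs, ys)  # same step on one device
```

Adding workers at a fixed per-worker batch size therefore enlarges the global batch, which is one reason the batch size cannot be chosen independently of the number of GPUs.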

Interactive Hair Styling Interface (인터랙티브 헤어 스타일링 인터페이스)

  • Cho, Jung-Hyun;Ko, Hyeong-Seok
    • HCI Society of Korea: Conference Proceedings (한국HCI학회:학술대회논문집)
    • /
    • 2009.02a
    • /
    • pp.455-458
    • /
    • 2009
  • The statistical wisp model for hairstyle generation was introduced in [1]. It provided a program to load human models, set parameters, generate wisps and strands, and define constraints. However, the program used hard-coded human models and prescribed constraints, which made it hard to switch models or manipulate the constraints. We therefore provide a simple interface in which the user draws maps and constraints. GPU acceleration also speeds up the computation.

Power Modeling Approach for GPU Source Program

  • Li, Junke;Guo, Bing;Shen, Yan;Li, Deguang;Huang, Yanhui
    • Journal of Electrical Engineering and Technology
    • /
    • v.13 no.1
    • /
    • pp.181-191
    • /
    • 2018
  • The rapid development of information technology is making our environment smarter, and massive high-performance computers provide the computing power behind it. The Graphics Processing Unit (GPU), a typical high-performance component, is widely used for both graphics and general-purpose applications. Although it can greatly improve computing power, it also consumes significant power and needs a sufficient power supply. To make high-performance computing more sustainable, an important step is to measure its power consumption. Current GPU power-modeling techniques have drawbacks; for example, they cannot be applied to power estimation at an early stage. In this article, we present a novel power model that correlates power consumption with program characteristics visible from the programmer's perspective and then estimates the power consumption of a source program without running it first. We conduct experiments on Nvidia's GT740 platform; the results show that our power model is more accurate than a regression model, with an average error of 2.34% and a maximum error of 9.65%.

Refinement of protein NMR structures using atomistic force field and implicit solvent model: Comparison of the accuracies of NMR structures with Rosetta refinement

  • Jee, Jun-Goo
    • Journal of the Korean Magnetic Resonance Society
    • /
    • v.26 no.1
    • /
    • pp.1-9
    • /
    • 2022
  • There are two distinct approaches to improving the quality of protein NMR structures during refinement: all-atom force fields and accumulated knowledge-assisted methods that include Rosetta. Mao et al. reported that, for 40 proteins, Rosetta increased the accuracies of their NMR-determined structures with respect to the X-ray crystal structures (Mao et al., J. Am. Chem. Soc. 136, 1893 (2014)). In this study, we calculated 32 of the structures studied by Mao et al. using an all-atom force field and an implicit solvent model, and we compared the results with those obtained from Rosetta. For each protein, using only the experimental NOE-derived distance restraints and backbone torsion angle restraints, the 20 lowest-energy structures were extracted as an ensemble from 100 generated structures. Restrained simulated annealing by molecular dynamics simulation searched the conformational space for a total of 1 ns. The use of GPU-accelerated AMBER code allowed the calculations to be completed in hours on a single-GPU computer, even for proteins larger than 20 kDa. Remarkably, statistical analyses indicated that the structures determined in this way were overall more accurate with respect to their X-ray structures than those refined by Rosetta (p-value < 0.01). Our data demonstrate the capability of sophisticated atomistic force fields in refining NMR structures, particularly when they are coupled with the latest GPU-based calculations. The straightforwardness of the protocol allows its use to be extended to all NMR structures.

YOLOv7 Model Inference Time Complexity Analysis in Different Computing Environments (다양한 컴퓨팅 환경에서 YOLOv7 모델의 추론 시간 복잡도 분석)

  • Park, Chun-Su
    • Journal of the Semiconductor & Display Technology
    • /
    • v.21 no.3
    • /
    • pp.7-11
    • /
    • 2022
  • Object detection technology is one of the main research topics in computer vision and has established itself as an essential base technology for implementing various vision systems. Recent DNN (Deep Neural Network)-based algorithms achieve much higher recognition accuracy than traditional algorithms. However, it is well known that DNN model inference requires relatively high computational power. In this paper, we analyze the inference time complexity of the state-of-the-art object detection architecture YOLOv7 in various environments. Specifically, we compare and analyze the time complexity of four variants of the YOLOv7 model (YOLOv7-tiny, YOLOv7, YOLOv7-X, and YOLOv7-E6) when performing inference on a CPU and on a GPU. Furthermore, we analyze how the time complexity varies when the same models are run with the PyTorch framework and with the ONNX Runtime engine.
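
A comparison of this kind rests on a repeatable latency measurement. The sketch below is one common way to take it and is not the paper's benchmark code: `measure_latency` and `fake_inference` are hypothetical names, and the stand-in workload replaces an actual YOLOv7 forward pass. When timing real GPU inference, the device work is asynchronous, so the device has to be synchronized (for example with `torch.cuda.synchronize()` in PyTorch) before each clock reading.

```python
import statistics
import time

def measure_latency(fn, warmup=3, runs=10):
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup):          # warm-up calls are not timed
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

call_count = 0

def fake_inference():
    """Stand-in workload (assumption): replace with a real model call."""
    global call_count
    call_count += 1
    return sum(i * i for i in range(10_000))

latency = measure_latency(fake_inference)
print(f"median latency: {latency * 1e3:.3f} ms")
```

The median is used here rather than the mean so that a few slow outlier runs do not dominate the reported figure.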

A Study on Scenario-based Urban Flood Prediction using G2D Flood Analysis Model (G2D 침수해석 모형을 이용한 시나리오 기반 도시 침수예측 연구)

  • Hui-Seong Noh;Ki-Hong Park
    • Journal of Advanced Navigation Technology
    • /
    • v.27 no.4
    • /
    • pp.488-494
    • /
    • 2023
  • In this paper, scenario-based urban flood prediction was performed for the entire city of Jinju, and a simulation domain was constructed using G2D, a two-dimensional urban flood analysis model. The domain is built from a DEM, and a land cover map is used to set the roughness coefficient for each grid cell. The input data of the model are water level, water depth, and flow rate. In the simulation of the constructed G2D model, virtual rainfall (3 mm per 10 min applied to all grid cells for 5 hours) and virtual flow were applied. A GPU acceleration technique was also applied to determine whether the flood analysis model could be run for the target area. The simulation confirmed that the high-resolution flood analysis time was significantly shortened and that flood depths for visual flood assessment could be produced for each simulation time.

GPU-accelerated Reliability Analysis Method using Dynamic Reliability Block Diagram based on DEVS Formalism (DEVS 형식론 기반의 Dynamic Reliability Block Diagram과 GPU 가속 기술을 이용한 신뢰도 분석 방법)

  • Ha, Sol;Ku, Namkug;Roh, Myung-Il
    • Journal of the Korea Society for Simulation
    • /
    • v.22 no.4
    • /
    • pp.109-118
    • /
    • 2013
  • Instead of building a fault tree (FT), the traditional method for analyzing the reliability of a system, this paper assesses reliability from the system configuration using the reliability block diagram (RBD) method. An RBD is a graphical representation of a system that connects its subsystems or components according to their functional or reliability relationships. The equipment model for the reliability simulation is based on the discrete event system specification (DEVS) formalism. To generate various alternatives of the target system, this paper also adopts the system entity structure (SES), an ontological framework that hierarchically represents the elements of a system and their relationships. To reduce the calculation time of the reliability analysis, GPU-based acceleration is applied to the reliability simulation.
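
As background for the RBD method mentioned in this abstract, the sketch below evaluates a static block diagram with the textbook series and parallel formulas. It is not the paper's DEVS-based dynamic simulation, and the example system (a pump in series with two redundant valves) and its reliability values are made up for illustration.

```python
from math import prod

def series(reliabilities):
    """A series arrangement works only if every block works."""
    return prod(reliabilities)

def parallel(reliabilities):
    """A parallel (redundant) arrangement fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical system: one pump (0.9) in series with two redundant valves (0.8 each).
valves = parallel([0.8, 0.8])    # 1 - 0.2 * 0.2 = 0.96
system = series([0.9, valves])   # 0.9 * 0.96 = 0.864
print(f"system reliability: {system:.3f}")
```

Because each block or sampled scenario can be evaluated independently, this kind of calculation parallelizes naturally, which is the general motivation for moving reliability simulation to a GPU.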

Bit Operation Optimization and DNN Application using GPU Acceleration (GPU 가속기를 통한 비트 연산 최적화 및 DNN 응용)

  • Kim, Sang Hyeok;Lee, Jae Heung
    • Journal of IKEEE
    • /
    • v.23 no.4
    • /
    • pp.1314-1320
    • /
    • 2019
  • In this paper, we propose a new method for optimizing bit operations and applying them to a DNN (Deep Neural Network) in a software environment. The method consists of a packing function for bitwise optimization and a masking matrix multiplication operation for applying it to the DNN. The packing function converts a 32-bit real value to a 2-bit quantized value through threshold comparison. After this step, four 32-bit real values have been reduced to one 8-bit value. The masking matrix multiplication is a special operation that multiplies the packed weight values by the normal input values. Each operation was then processed in parallel on a GPU accelerator. In the experiment, memory usage was about 16 times lower than that of the 32-bit DNN model, while the accuracy remained within 1% of the 32-bit model.
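
The packing step described in this abstract can be sketched as follows. The thresholds, the bit order, and the names `quantize_2bit`, `pack4`, and `unpack4` are assumptions for illustration; the abstract does not specify them, and the masking matrix multiplication is not covered here. The sketch also shows where the roughly 16-fold memory saving comes from: four 32-bit values (128 bits) become one 8-bit value.

```python
def quantize_2bit(x, thresholds=(-0.5, 0.0, 0.5)):
    """Map a real value to a 2-bit code (0..3) by threshold comparison.

    The thresholds are hypothetical; the code is the number of thresholds x reaches.
    """
    return sum(x >= t for t in thresholds)

def pack4(values):
    """Pack four real values into one 8-bit integer, first value in the low bits."""
    assert len(values) == 4
    byte = 0
    for i, v in enumerate(values):
        byte |= quantize_2bit(v) << (2 * i)
    return byte

def unpack4(byte):
    """Recover the four 2-bit codes from a packed byte."""
    return [(byte >> (2 * i)) & 0b11 for i in range(4)]

packed = pack4([-0.9, -0.1, 0.2, 0.7])
codes = unpack4(packed)
ratio = (4 * 32) // 8   # bits before packing divided by bits after packing

print(packed, codes, ratio)  # prints: 228 [0, 1, 2, 3] 16
```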

CUDA-based Object Oriented Programming Techniques for Efficient Parallel Visualization of 3D Content (3차원 콘텐츠의 효율적인 병렬 시각화를 위한 CUDA 환경 기반 객체 지향 프로그래밍 기법)

  • Park, Tae-Jung
    • Journal of Digital Contents Society
    • /
    • v.13 no.2
    • /
    • pp.169-176
    • /
    • 2012
  • This paper presents a parallel object-oriented programming (OOP) platform for efficient visualization of three-dimensional content in CUDA environments. To this end, it discusses the features and limitations of implementing C++ object-oriented code with CUDA and proposes solutions. It also shows how to implement a 3D parallel visualization platform based on the MVC (Model/View/Controller) design pattern, and it provides sample implementations for integral MLS (iMLS) and signed distance fields (SDFs) based on Marching Cubes and ray tracing. The proposed approach enables GPU parallel processing simply by implementing a few interfaces. Developers can therefore expect the benefits common to OOP techniques, including abstraction and inheritance. Though I implemented only two specific samples in this paper, I expect my approach can be widely applied to general computer graphics problems.