Search | Korea Science

A Study of the Performance Prediction Models of Mobile Graphics Processing Units

Kim, Cheong Ghil
- Journal of the Semiconductor & Display Technology
- v.18 no.1
- pp.123-128
- 2019
Mobile services are on the verge of full commercialization ahead of 5G mobile communication. An early goal is to preempt the 5G market with realistic media services built on VR (Virtual Reality) and AR (Augmented Reality), the technologies that users can experience most easily. This movement rests on advances in smart devices and on the high-quality graphics processing power of mobile application processors. Accordingly, mobile GPUs are growing in importance, and the central concern has become a model that predicts power and performance so that high-quality mobile content runs smoothly. In many cases, mobile GPU performance has been described in terms of power consumption, because mobile GPUs rely on dynamic voltage and frequency scaling and on throttling to manage power and heat. This paper introduces several studies of mobile GPU performance prediction models that take user-friendly approaches, unlike conventional power-centric prediction models.

Parallel Range Query processing on R-tree with Graphics Processing Units (GPU를 이용한 R-tree에서의 범위 질의의 병렬 처리)

Yu, Bo-Seon;Kim, Hyun-Duk;Choi, Won-Ik;Kwon, Dong-Seop
- Journal of Korea Multimedia Society
- v.14 no.5
- pp.669-680
- 2011
R-trees are widely used in areas such as geographical information systems, CAD systems, and spatial databases to index multi-dimensional data efficiently. As the data sets in these areas grow in size and complexity, however, range queries on R-trees must become faster to meet area-specific constraints. To address this problem, various research efforts have sought to accelerate query processing on R-trees, either with a buffer mechanism or by parallelizing query processing across multiple disks and processors. As part of these strategies, approaches that parallelize R-tree query processing on Graphics Processing Units (GPUs) have been explored. GPUs can improve performance through faster calculation and fewer disk accesses, but they can also add overhead from high memory access latency and the low data exchange rate between the GPU and the CPU. In this paper, to address this overhead and use GPUs efficiently, we propose a novel approach that uses a GPU as a buffer to parallelize query processing on R-trees. The buffer algorithm improves performance by reducing the number of disk accesses and by maximizing coalesced memory access, which minimizes GPU memory access latency. In extensive performance studies, the proposed approach achieved up to 5 times higher query performance than the original CPU-based R-trees.
https://doi.org/10.9717/kmms.2011.14.5.669
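
As a rough illustration of the range query that the abstract above sets out to accelerate, the following is a minimal CPU-only sketch of a range search over a hand-built two-level R-tree. The `Node` class, the `range_query` function, and the sample rectangles are hypothetical names and data chosen for this example; the paper's GPU buffer scheme is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Axis-aligned rectangle: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True when two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect                                            # minimum bounding rectangle
    children: List["Node"] = field(default_factory=list)  # used by internal nodes
    entries: List[Tuple[Rect, str]] = field(default_factory=list)  # used by leaves


def range_query(node: Node, query: Rect) -> List[str]:
    """Collect ids of all objects whose rectangle intersects the query window."""
    if not intersects(node.mbr, query):
        return []  # prune the whole subtree
    hits = [oid for rect, oid in node.entries if intersects(rect, query)]
    for child in node.children:
        hits.extend(range_query(child, query))
    return hits


# Hand-built example tree (illustrative data only).
leaf_a = Node(mbr=(0, 0, 4, 4), entries=[((0, 0, 1, 1), "a1"), ((3, 3, 4, 4), "a2")])
leaf_b = Node(mbr=(6, 6, 9, 9), entries=[((6, 6, 7, 7), "b1"), ((8, 8, 9, 9), "b2")])
root = Node(mbr=(0, 0, 9, 9), children=[leaf_a, leaf_b])

print(range_query(root, (2.5, 2.5, 6.5, 6.5)))
```

Each subtree whose bounding rectangle misses the query window is skipped entirely, and the per-entry intersection tests are independent of one another, which is the kind of work a parallel implementation can spread across threads.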

Accelerating Numerical Analysis of Reynolds Equation Using Graphic Processing Units (그래픽처리장치를 이용한 레이놀즈 방정식의 수치 해석 가속화)

Myung, Hun-Joo;Kang, Ji-Hoon;Oh, Kwang-Jin
- Tribology and Lubricants
- v.28 no.4
- pp.160-166
- 2012
This paper presents a Reynolds equation solver for hydrostatic gas bearings, implemented to run on graphics processing units (GPUs). The original analysis code for the central processing unit (CPU) was ported to the GPU using the compute unified device architecture (CUDA). The red-black Gauss-Seidel (RBGS) algorithm was used for the iterative pressure solver instead of the original Gauss-Seidel algorithm, because the latter has a data dependency between neighboring nodes. The GPU program was tested on an NVIDIA GTX580 system and compared with the original CPU program on an AMD Llano system. In the iterative pressure calculation, the GPU program ran 20-100 times faster than the original CPU code. In a comparison of wall-clock times that includes all pre- and post-processing code, the GPU code was still 4-12 times faster than the CPU code for our target problem.
https://doi.org/10.9725/kstle.2012.28.4.160
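
To show why red-black ordering removes the neighbor dependency mentioned in the abstract above, here is a minimal sketch of red-black Gauss-Seidel sweeps. It assumes the 2D Laplace equation on a square grid with fixed boundary values as a stand-in, since the abstract does not give the discretized Reynolds equation; `rbgs_laplace` and `make_grid` are hypothetical names.

```python
def make_grid(n, boundary):
    """n x n grid with zero interior and a constant boundary value."""
    grid = [[0.0] * n for _ in range(n)]
    for k in range(n):
        grid[0][k] = grid[n - 1][k] = grid[k][0] = grid[k][n - 1] = boundary
    return grid


def rbgs_laplace(grid, sweeps):
    """Run in-place red-black Gauss-Seidel sweeps on the interior nodes.

    Nodes are colored like a checkerboard by the parity of (i + j).
    Every neighbor of a red node is black and vice versa, so all nodes of
    one color can be updated independently of each other.
    """
    rows, cols = len(grid), len(grid[0])
    for _ in range(sweeps):
        for color in (0, 1):  # 0 = red half-sweep, 1 = black half-sweep
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 == color:
                        grid[i][j] = 0.25 * (
                            grid[i - 1][j] + grid[i + 1][j]
                            + grid[i][j - 1] + grid[i][j + 1]
                        )
    return grid


solution = rbgs_laplace(make_grid(8, 1.0), sweeps=200)
print(round(solution[4][4], 6))
```

In this serial sketch the two inner loops simply visit one color at a time. On a GPU, each half-sweep could instead be launched as one kernel with one thread per node of that color, because no node in the half-sweep reads a value written in the same half-sweep.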

Implementation of handwritten digit recognition CNN structure using GPGPU and Combined Layer (GPGPU와 Combined Layer를 이용한 필기체 숫자인식 CNN구조 구현)

Lee, Sangil;Nam, Kihun;Jung, Jun Mo
- The Journal of the Convergence on Culture Technology
- v.3 no.4
- pp.165-169
- 2017
The CNN (Convolutional Neural Network) is one of the machine learning algorithms that perform best in image recognition and classification. A CNN is simple in structure, but it requires a large amount of computation and takes a long time to run. In this paper, we therefore parallelize the convolution layer, the pooling layer, and the fully connected layer, which account for much of the processing time in a CNN, using the SIMT (Single Instruction Multiple Thread) structure of GPGPU (General-Purpose computing on Graphics Processing Units). We also aim to improve performance by reducing the number of memory accesses: the output of the convolution layer is passed directly to the pooling layer rather than being stored first. We verify the approach on the MNIST dataset and confirm that the proposed CNN structure is 12.38% better than the existing structure.
https://doi.org/10.17703/JCCT.2017.3.4.165
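
One way to read the combined-layer idea in the abstract above is that pooling consumes convolution outputs as they are produced, so the intermediate feature map is never written out. The sketch below illustrates that reading in plain Python, assuming a single channel, a valid (no padding) convolution, and 2x2 max pooling with stride 2; the function names and the sample data are hypothetical, and the paper's actual GPGPU kernel may differ.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation form) that stores the full feature map."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
         for c in range(ow)]
        for r in range(oh)
    ]


def maxpool2x2(fmap):
    """2x2 max pooling with stride 2 over a stored feature map."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


def conv_pool_fused(image, kernel):
    """Compute each pooled output directly from four convolution results,
    without materializing the intermediate feature map."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1

    def conv_at(r, c):
        return sum(image[r + i][c + j] * kernel[i][j]
                   for i in range(kh) for j in range(kw))

    return [
        [max(conv_at(r + dr, c + dc) for dr in (0, 1) for dc in (0, 1))
         for c in range(0, ow - 1, 2)]
        for r in range(0, oh - 1, 2)
    ]


image = [[(3 * r + 5 * c) % 7 for c in range(5)] for r in range(5)]
kernel = [[1, -1], [2, 0]]
print(conv_pool_fused(image, kernel) == maxpool2x2(conv2d(image, kernel)))
```

Both paths produce the same pooled result; the fused version trades the stored feature map for values held only while each pooling window is evaluated.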

Real-World Physical Length Comparison in Virtual Environments (가상환경에서의 실세계 물리적 길이 비교)

Jung, Chul-Hee;Im, Chang-Hyuck;Lee, Min-Geun;Lee, Myeong-Won
- Journal of the Korea Computer Graphics Society
- v.13 no.3
- pp.19-24
- 2007
In this paper, we describe a method of defining an object's real length so that the lengths of objects can be compared precisely using any real-world length unit. When the browser in our study displays an object, it represents the object's length by referring to the physical length property defined at modeling time. Because object lengths are scaled according to these units, objects can be compared precisely and visually in size using real-world length units. The concept of defining the real length unit is extended to the X3D specification. The units range from $10^{-24}$ (yocto) to $10^{24}$ (yotta). In addition, we explain the method for processing LOD (Levels Of Detail) and for applying the LOLD (Levels of Length Detail) property when objects with different LOLD are read into the browser.
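
The unit-aware comparison described above can be sketched as a conversion to a common base unit before comparing. The example below assumes the standard SI prefixes, in which yocto is $10^{-24}$ and yotta is $10^{24}$, and uses exact rational arithmetic; `to_metres` and `scale_ratio` are hypothetical helpers, not part of X3D or of the paper's browser.

```python
from fractions import Fraction

# Standard SI prefixes and their powers of ten ("" is the unprefixed metre).
SI_EXPONENTS = {
    "yocto": -24, "zepto": -21, "atto": -18, "femto": -15, "pico": -12,
    "nano": -9, "micro": -6, "milli": -3, "": 0, "kilo": 3, "mega": 6,
    "giga": 9, "tera": 12, "peta": 15, "exa": 18, "zetta": 21, "yotta": 24,
}


def to_metres(value, prefix):
    """Convert a length given in a prefixed metre unit to metres, exactly."""
    return Fraction(value) * Fraction(10) ** SI_EXPONENTS[prefix]


def scale_ratio(len_a, prefix_a, len_b, prefix_b):
    """How many times longer object A is than object B in real-world length."""
    return to_metres(len_a, prefix_a) / to_metres(len_b, prefix_b)


# A 2 km object next to a 500 m object: A should be drawn 4 times as long.
print(scale_ratio(2, "kilo", 500, ""))
```

Using `Fraction` keeps the ratio exact across the full 48 orders of magnitude between the smallest and largest prefix, where floating-point conversion could lose precision.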

Computer Generated Hologram : Recoding and Reconstruction (컴퓨터 홀로그램의 생성 및 복원)

Yang, Yun-Mo;Oh, Byung Tae
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 2014.11a
- pp.261-263
- 2014
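Recently, led by the film Avatar, 3D imaging technology has been applied widely across visual media such as film and broadcasting. Among the various 3D imaging technologies, this paper deals with holography, the one that offers the highest sense of realism. We first briefly introduce holography, then describe the principles of hologram recording and reconstruction and the computer-generated hologram, in which holographic images are produced by computer. We then carry out experiments that record and reconstruct computer-generated hologram patterns on a general-purpose computer and on GPUs (graphics processing units), and we measure and compare the time complexity.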

Development of Control Algorithm and Real Time Numerical Simulation Program for Adaptive Cruise Control Vehicles (적응순향 제어(ACC) 차량의 제어 알고리즘 및 실시간 수치실험 프로그램 개발)

원문철;강연준;강병배
- Transactions of the Korean Society of Automotive Engineers
- v.7 no.7
- pp.202-213
- 1999
Adaptive Cruise Control (ACC) is one of the key features of the Intelligent Transportation System (ITS). In ACC, steering is done by the driver, but the engine throttle valve and the brake are controlled electronically. The relative velocity and distance to the preceding vehicle are measured by radar or image processing units, and the ACC control system maintains an appropriate vehicle spacing. In this study, vehicle longitudinal dynamics are modeled in order to simulate longitudinal maneuvers and to design longitudinal controllers for ACC vehicles. The control algorithm is designed from the modeled longitudinal dynamics using a non-linear sliding mode control method. To verify the performance of the control algorithm, a real-time numerical simulation program is developed in C on a Silicon Graphics workstation. A real-time graphics program is also developed and integrated with the numerical simulation program.
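
As a rough sketch of how a sliding mode controller can regulate vehicle spacing, the example below simulates a follower behind a lead vehicle travelling at constant speed. It is heavily simplified and rests on assumptions not found in the abstract: the follower's acceleration is commanded directly (no engine or brake dynamics), the desired gap is constant, and the gains `lam`, `k`, and `phi` are arbitrary illustrative values. The function names are hypothetical.

```python
def sat(x):
    """Saturation to [-1, 1]; replaces sign() to limit chattering."""
    return max(-1.0, min(1.0, x))


def simulate_acc(gap_error0, lam=1.0, k=2.0, phi=0.5, dt=0.01, steps=3000):
    """Drive the spacing error e = gap - desired_gap to zero.

    With a constant-speed lead vehicle, e_dot = v_lead - v_follower and
    e_ddot = -u, where u is the follower's commanded acceleration.
    Sliding surface: s = e_dot + lam * e.
    Choosing u = lam * e_dot + k * sat(s / phi) gives s_dot = -k * sat(s / phi),
    so s is driven toward zero and e then decays at rate lam.
    """
    e, e_dot = gap_error0, 0.0
    for _ in range(steps):
        s = e_dot + lam * e
        u = lam * e_dot + k * sat(s / phi)  # follower acceleration command
        e_dot += -u * dt                    # semi-implicit Euler integration
        e += e_dot * dt
    return e, e_dot


final_error, final_rate = simulate_acc(gap_error0=10.0)
print(abs(final_error) < 1e-3, abs(final_rate) < 1e-3)
```

Starting 10 m beyond the desired gap, the follower first accelerates until the state reaches the sliding surface and then closes the remaining error exponentially; a negative initial error produces the mirrored braking response.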

GPU-based Monte Carlo Photon Migration Algorithm with Path-partition Load Balancing

Jeon, Youngjin;Park, Jongha;Hahn, Joonku;Kim, Hwi
- Current Optics and Photonics
- v.5 no.6
- pp.617-626
- 2021
A parallel Monte Carlo photon migration algorithm for graphics processing units that implements an improved load-balancing strategy is presented. Conventional parallel Monte Carlo photon migration algorithms suffer from a computational bottleneck due to their reliance on a simple load-balancing strategy that does not take into account the different length of the mean free paths of the photons. In this paper, path-partition load balancing is proposed to eliminate this computational bottleneck based on a mathematical formula that parallelizes the photon path tracing process, which has previously been considered non-parallelizable. The performance of the proposed algorithm is tested using three-dimensional photon migration simulations of a human skin model.
https://doi.org/10.3807/COPP.2021.5.6.617
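
To illustrate why a one-photon-per-thread scheme can be poorly balanced, the sketch below samples photon free paths and counts interaction steps per photon using the textbook Monte Carlo photon transport rules: a free path $s = -\ln(\xi)/\mu_t$ and absorption with probability $\mu_a/\mu_t$ at each interaction. These rules and the coefficient values are assumptions for illustration; the paper's path-partition formula is not given in the abstract and is not reproduced here.

```python
import math
import random


def sample_free_path(mu_t, rng):
    """Sample a free path length from an exponential law with mean 1 / mu_t."""
    xi = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return -math.log(xi) / mu_t


def steps_until_absorbed(mu_a, mu_s, rng):
    """Count interaction events before a photon is absorbed.

    At each interaction the photon is absorbed with probability mu_a / mu_t,
    otherwise it scatters and continues.
    """
    mu_t = mu_a + mu_s
    steps = 1
    while rng.random() >= mu_a / mu_t:
        steps += 1
    return steps


rng = random.Random(0)
paths = [sample_free_path(10.0, rng) for _ in range(100_000)]
steps = [steps_until_absorbed(1.0, 9.0, rng) for _ in range(20_000)]
print(sum(paths) / len(paths), min(steps), max(steps))
```

The spread between the shortest and longest photon histories is what leaves some threads idle while others are still tracing, which is the bottleneck the abstract attributes to simple load-balancing strategies.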

BCDR algorithm for network estimation based on pseudo-likelihood with parallelization using GPU (유사가능도 기반의 네트워크 추정 모형에 대한 GPU 병렬화 BCDR 알고리즘)

Kim, Byungsoo;Yu, Donghyeon
- Journal of the Korean Data and Information Science Society
- v.27 no.2
- pp.381-394
- 2016
A graphical model represents conditional dependencies between variables as a graph with nodes and edges. It is widely used in fields including physics, economics, and biology to describe complex associations. Conditional dependencies can be estimated from an inverse covariance matrix, in which zero off-diagonal elements denote conditional independence of the corresponding variables. This paper proposes an efficient BCDR (block coordinate descent with random permutation) algorithm for CONCORD (convex correlation selection method), which estimates an inverse covariance matrix based on pseudo-likelihood. The proposed algorithm builds on the BCD (block coordinate descent) algorithm and uses graphics processing units and random permutation. We conduct numerical studies on two network structures to demonstrate the efficiency of the proposed algorithm for CONCORD in terms of computation time.
https://doi.org/10.7465/jkdi.2016.27.2.381
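
The link between zeros in the inverse covariance matrix and missing edges can be made concrete with a small example. The sketch below takes a known 3x3 inverse covariance (precision) matrix for a chain X0 - X1 - X2, reads off the graph edges, and computes partial correlations with the standard formula $\rho_{ij} = -\omega_{ij}/\sqrt{\omega_{ii}\,\omega_{jj}}$. The matrix and function names are illustrative assumptions; estimating the matrix from data, which is what CONCORD and BCDR do, is not shown.

```python
import math


def partial_correlations(omega):
    """Partial correlation of each pair given all other variables,
    computed from a precision (inverse covariance) matrix."""
    p = len(omega)
    return [
        [1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
         for j in range(p)]
        for i in range(p)
    ]


def edges(omega, tol=1e-12):
    """Pairs (i, j) with a nonzero off-diagonal entry, i.e. conditionally dependent."""
    p = len(omega)
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(omega[i][j]) > tol]


# Precision matrix of a chain: X0 and X2 interact only through X1.
omega_chain = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

print(edges(omega_chain))
print(partial_correlations(omega_chain)[0][1])
```

The zero in position (0, 2) means X0 and X2 are conditionally independent given X1, so the estimated network has no edge between them even though the two variables are marginally correlated.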

Trends in AI Processor Technology (인공지능프로세서 기술 동향)

Lee, M.Y.;Chung, J.;Lee, J.H.;Han, J.H.;Kwon, Y.S.
- Electronics and Telecommunications Trends
- v.35 no.3
- pp.66-75
- 2020
As rising expectations for practical AI (Artificial Intelligence) services make AI algorithms more complicated, an efficient processor for running those algorithms is required. To meet this requirement, processors optimized for parallel processing, such as GPUs (Graphics Processing Units), have been widely employed. However, the GPU has a generalized structure meant for various applications, so it is not optimized for AI algorithms. Research on AI processors optimized for AI algorithm processing has therefore been actively conducted. This paper briefly introduces AI processors, especially those for inference acceleration, developed by the Electronics and Telecommunications Research Institute (South Korea) and by other global vendors for mobile and server platforms.
https://doi.org/10.22648/ETRI.2020.J.350307
