• Title/Summary/Keyword: 그래픽 처리장치 병렬컴퓨팅

Search Result 10, Processing Time 0.026 seconds

Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units (범용 그래픽 처리 장치의 메모리 설계를 위한 그래픽 처리 장치의 메모리 특성 분석)

  • Choi, Hongjun;Kim, Cheolhong
    • Smart Media Journal
    • /
    • v.3 no.1
    • /
    • pp.33-38
    • /
    • 2014
  • Even though the performance of microprocessor is improved continuously, the performance improvement of computing system becomes hard to increase, in order to some drawbacks including increased power consumption. To solve the problem, general-purpose computing on graphics processing units(GPGPUs), which execute general-purpose applications by using specialized parallel-processing device representing graphics processing units(GPUs), have been focused. However, the characteristics of applications related with graphics is substantially different from the characteristics of general-purpose applications. Therefore, GPUs cannot exploit the outstanding computational resources sufficiently due to various constraints, when they execute general-purpose applications. When designing GPUs for GPGPU, memory system is important to effectively exploit the GPUs since typically general-purpose applications requires more memory accesses than graphics applications. Especially, external memory access requiring long latency impose a big overhead on the performance of GPUs. Therefore, the GPU performance must be improved if hierarchical memory architecture which can reduce the number of external memory access is applied. For this reason, we will investigate the analysis of GPU performance according to hierarchical cache architectures in executing various benchmarks.

Research of accelerating method of video quality measurement program using GPGPU (GPGPU를 이용한 영상 품질 측정 프로그램의 가속화 연구)

  • Lee, Seonguk;Byeon, Gibeom;Kim, Kisu;Hong, Jiman
    • Smart Media Journal
    • /
    • v.5 no.4
    • /
    • pp.69-74
    • /
    • 2016
  • Recently, parallel computing using GPGPU(General-Purpose computing on Graphics Processing Units) according to the development of the graphics processing unit is expanding. This can be achieved through the processing speeds faster than traditional computing environments across many fields, including science, medicine, engineering, and analysis. However, in using the GPU technology to implement the a parallel program there are many constraints. In this paper, we port a CPU-based program(Video Quality Measurement Program) to use technology. The program ported to GPU-based show about 1.83 times the execution speed than CPU-based program. We study on the acceleration of the GPU-based program. Also we discuss the technical constraints and problems that occur when you modify the CPU to the GPU-based programs.

Analyzing Fine-Grained Resource Utilization for Efficient GPU Workload Allocation (GPU 작업 배치의 효율화를 위한 자원 이용률 상세 분석)

  • Park, Yunjoo;Shin, Donghee;Cho, Kyungwoon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.1
    • /
    • pp.111-116
    • /
    • 2019
  • Recently, GPU expands application domains from graphic processing to various kinds of parallel workloads. However, current GPU systems focus on the maximization of each workload's parallelism through simplified control rather than considering various workload characteristics. This paper classifies the resource usage characteristics of GPU workloads into computing-bound, memory-bound, and dependency-latency-bound, and quantifies the fine-grained bottleneck for efficient workload allocation. For example, we identify the exact bottleneck resources such as single function unit, double function unit, or special function unit even for the same computing-bound workloads. Our analysis implies that workloads can be allocated together if fine-grained bottleneck resources are different even for the same computing-bound workloads, which can eventually contribute to efficient workload allocation in GPU.

Introduction to general purpose GPU computing (GPU를 이용한 범용 계산의 소개)

  • Yu, Donghyeon;Lim, Johan
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.5
    • /
    • pp.1043-1061
    • /
    • 2013
  • Recent advances in computer technology introduce massive data and their analysis becomes important. The high performance computing is one of the most essential part in analysis of massive data. In this paper, we review the general purpose of the graphics processing unit and its application to parallel computing, which has been of great interest in statistics communities.

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases
    • /
    • v.37 no.5
    • /
    • pp.275-284
    • /
    • 2010
  • GPUs are stream processors based on multi-cores, which can process large data with a high speed and a large memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, usage of GPUs in general purpose computing has been wide spread. The CUDA architecture from Nvidia is one of efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm which uses a simple nested loop structure. In order to employ the CUDA programming model, we apply our optimization techniques to make our skyline algorithm fit into the performance restrictions of the CUDA architecture. According to our experimental results, we improve the original skyline algorithm by 80% with our optimization techniques.

Analysis of GPU Performance and Memory Efficiency according to Task Processing Units (작업 처리 단위 변화에 따른 GPU 성능과 메모리 접근 시간의 관계 분석)

  • Son, Dong Oh;Sim, Gyu Yeon;Kim, Cheol Hong
    • Smart Media Journal
    • /
    • v.4 no.4
    • /
    • pp.56-63
    • /
    • 2015
  • Modern GPU can execute mass parallel computation by exploiting many GPU core. GPGPU architecture, which is one of approaches exploiting outstanding computational resources on GPU, executes general-purpose applications as well as graphics applications, effectively. In this paper, we investigate the impact of memory-efficiency and performance according to number of CTAs(Cooperative Thread Array) on a SM(Streaming Multiprocessors), since the analysis of relation between number of CTA on a SM and them provides inspiration for researchers who study the GPU to improve the performance. Our simulation results show that almost benchmarks increasing the number of CTAs on a SM improve the performance. On the other hand, some benchmarks cannot provide performance improvement. This is because the number of CTAs generated from same kernel is a little or the number of CTAs executed simultaneously is not enough. To precisely classify the analysis of performance according to number of CTA on a SM, we also analyze the relations between performance and memory stall, dram stall due to the interconnect congestion, pipeline stall at the memory stage. We expect that our analysis results help the study to improve the parallelism and memory-efficiency on GPGPU architecture.

Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU (GPU 성능 저하 해결을 위한 내부 자원 활용/비활용 상태 분석)

  • Choi, Hongjun;Son, Dongoh;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association
    • /
    • v.15 no.7
    • /
    • pp.1-11
    • /
    • 2015
  • In recent high performance computing system, GPGPU has been widely used to process general-purpose applications as well as graphics applications, since GPU can provide optimized computational resources for massive parallel processing. Unfortunately, GPGPU doesn't exploit computational resources on GPU in executing general-purpose applications fully, because the applications cannot be optimized to GPU architecture. Therefore, we provide GPU research guideline to improve the performance of computing systems using GPGPU. To accomplish this, we analyze the negative factors on GPU performance. In this paper, in order to clearly classify the cause of the negative factors on GPU performance, GPU core status are defined into 5 status: fully active status, partial active status, idle status, memory stall status and GPU core stall status. All status except fully active status cause performance degradation. We evaluate the ratio of each GPU core status depending on the characteristics of benchmarks to find specific reasons which degrade the performance of GPU. According to our simulation results, partial active status, idle status, memory stall status and GPU core stall status are induced by computational resource underutilization problem, low parallelism, high memory requests, and structural hazard, respectively.

Calculation Effect of GPU Parallel Programing for Planar Multibody System Dynamics (평면 다물체 동역학 해석에서 GPU 병렬 프로그래밍의 계산효과)

  • Jun, C.W.;Sohn, J.H.
    • Journal of Power System Engineering
    • /
    • v.16 no.4
    • /
    • pp.12-16
    • /
    • 2012
  • In this paper, the equations of motions for planar multibody dynamics are established for considering the parallel programming based on GPU. Cartesian coordinates are used to formulate the equations of motion and implicit integration method called HHT-alpha is employed. Open chain multibody system is considered for computer simulation. CUDA toolkit is employed for establishing the GPU parallel programming. The exactness of the analysis is verified from the comparison with ADAMS. The results from parallel computing based on GPU are compared with the results from the sequential programming based on CPU in terms of calculation time. The multiple pendulum with bodies and joints is employed for the computer simulation. In the pendulum system that has 290 bodies, the parallel program indicates an improved efficiency of about 25.5 second(15.5% improvement). It is noted that the larger the size of system is, the time efficiency is better.

Development of GPU-accelerated kinematic wave model using CUDA fortran (CUDA fortran을 이용한 GPU 가속 운동파모형 개발)

  • Kim, Boram;Park, Seonryang;Kim, Dae-Hong
    • Journal of Korea Water Resources Association
    • /
    • v.52 no.11
    • /
    • pp.887-894
    • /
    • 2019
  • We proposed a GPU (Grapic Processing Unit) accelerated kinematic wave model for rainfall runoff simulation and tested the accuracy and speed up performance of the proposed model. The governing equations are the kinematic wave equation for surface flow and the Green-Ampt model for infiltration. The kinematic wave equations were discretized using a finite volume method and CUDA fortran was used to implement the rainfall runoff model. Several numerical tests were conducted. The computed results of the GPU accelerated kinematic wave model were compared with several measured and other numerical results and reasonable agreements were observed from the comparisons. The speed up performance of the GPU accelerated model increased as the number of grids increased, achieving a maximum speed up of approximately 450 times compared to a CPU (Central Processing Unit) version, at least for the tested computing resources.

A new warp scheduling technique for improving the performance of GPUs by utilizing MSHR information (GPU 성능 향상을 위한 MSHR 정보 기반 워프 스케줄링 기법)

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.13 no.3
    • /
    • pp.72-83
    • /
    • 2017
  • GPUs can provide high throughput with latency hiding by executing many warps in parallel. MSHR(Miss Status Holding Registers) for L1 data cache tracks cache miss requests until required data is serviced from lower level memory. In recent GPUs, excessive requests for cache resources cause underutilization problem of GPU resources due to cache resource reservation fails. In this paper, we propose a new warp scheduling technique to reduce stall cycles under MSHR resource shortage. Cache miss rates for each warp is predicted based on the observation that each warp shows similar cache miss rates for long period. The warps showing low miss rates or computation-intensive warps are given high priority to be issued when MSHR is full status. Our proposal improves GPU performance by utilizing cache resource more efficiently based on cache miss rate prediction and monitoring the MSHR entries. According to our experimental results, reservation fail cycles can be reduced by 25.7% and IPC is increased by 6.2% with the proposed scheduling technique compared to loose round robin scheduler.