• Title/Summary/Keyword: GPGPU computing


Performance Improvement in Observation Probability Computation of Gaussian Mixture Models Using GPGPU (GPGPU를 이용한 가우시안 혼합 모델의 관측확률 계산 성능 향상)

  • Kim, Hyeong-Ju;Kim, Seung-Hi;Kim, Sanghun;Jang, Gil-Jin
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.148-151 / 2012
  • General-purpose computing on graphics processing units (GPGPU) is a parallel computing approach that applies GPUs to general-purpose computation, and it is used to improve application performance in many fields such as scientific computing. In this study, we implemented a GPGPU algorithm to accelerate the observation probability computation, which dominates the computation time of the Gaussian mixture models (GMMs) widely used in speech recognizers, and reduced the computation time by a factor of about 13 compared with the existing CPU-based algorithm.
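
The abstract gives no implementation details, so the following is only a minimal CUDA sketch of the kind of kernel such a speedup suggests: one thread evaluates the log observation probability of a diagonal-covariance GMM for one feature frame. All names and sizes (NUM_MIX, DIM, the parameter layout) are illustrative assumptions, not details from the paper.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int NUM_MIX = 16;  // mixtures per GMM (assumed)
constexpr int DIM     = 39;  // feature dimension, e.g. MFCC + deltas (assumed)

// One thread computes log b(o_t) for one frame t via a numerically
// stable log-sum-exp over the mixture components (diagonal covariances).
__global__ void gmmLogProb(const float* feats,  // [numFrames x DIM]
                           const float* mean,   // [NUM_MIX x DIM]
                           const float* invVar, // [NUM_MIX x DIM]
                           const float* logW,   // log weight + Gaussian constant
                           float* logProb, int numFrames) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numFrames) return;
    const float* o = feats + t * DIM;
    float lp[NUM_MIX], best = -INFINITY;
    for (int m = 0; m < NUM_MIX; ++m) {
        float e = 0.0f;
        for (int d = 0; d < DIM; ++d) {
            float diff = o[d] - mean[m * DIM + d];
            e += diff * diff * invVar[m * DIM + d];  // Mahalanobis term
        }
        lp[m] = logW[m] - 0.5f * e;
        best = fmaxf(best, lp[m]);
    }
    float sum = 0.0f;
    for (int m = 0; m < NUM_MIX; ++m) sum += expf(lp[m] - best);
    logProb[t] = best + logf(sum);
}

int main() {
    const int numFrames = 1024;
    std::vector<float> h(numFrames * DIM, 0.1f);  // dummy features
    float *feats, *mean, *invVar, *logW, *logProb;
    cudaMalloc(&feats, numFrames * DIM * sizeof(float));
    cudaMalloc(&mean, NUM_MIX * DIM * sizeof(float));
    cudaMalloc(&invVar, NUM_MIX * DIM * sizeof(float));
    cudaMalloc(&logW, NUM_MIX * sizeof(float));
    cudaMalloc(&logProb, numFrames * sizeof(float));
    cudaMemcpy(feats, h.data(), numFrames * DIM * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(mean, 0, NUM_MIX * DIM * sizeof(float));    // placeholders for
    cudaMemset(invVar, 0, NUM_MIX * DIM * sizeof(float));  // trained parameters
    cudaMemset(logW, 0, NUM_MIX * sizeof(float));
    gmmLogProb<<<(numFrames + 255) / 256, 256>>>(feats, mean, invVar, logW, logProb, numFrames);
    cudaDeviceSynchronize();
    return 0;
}
```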

Performance Improvement of Web Service Based on GPGPU and Task Queue

  • Kim, Changsu;Kim, Kyunghwan;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.19 no.4 / pp.257-262 / 2021
  • Providing web services to users has become expensive in recent times. To deliver better web services, web servers are built with high-performance technology, and tools such as general-purpose graphics processing units (GPGPUs), artificial intelligence, high-performance computing, and three-dimensional simulation are widely used. However, graphics processing units (GPUs) are designed for high-speed computation and have had limited general-purpose application. In this study, we developed a task queue on the GPU to improve web service performance with its many processors, and studied how to receive and process user requests in bulk. We propose a GPGPU-based task queue that handles more user requests than a CPU-thread-based approach, processing about 136% to 233% more requests with GPU threads, and show that the proposed method is effective for web services.
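
As a rough illustration of what a GPU-side task queue can look like (the paper's actual design is not given here), the sketch below drains a bulk batch of requests with a kernel in which each thread claims work through an atomic counter. The Request structure and the doubling "handler" are placeholders.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical request record; the paper's actual structure is not given.
struct Request { int id; int payload; };

// Each thread repeatedly claims the next pending request with an
// atomic counter, so a fixed pool of threads drains a bulk batch.
__global__ void drainQueue(Request* queue, int* results, int* head, int total) {
    while (true) {
        int i = atomicAdd(head, 1);    // claim one task
        if (i >= total) return;        // queue drained
        Request r = queue[i];
        results[r.id] = r.payload * 2; // stand-in for real request handling
    }
}

int main() {
    const int N = 4096;
    Request* q; int* res; int* head;
    cudaMallocManaged(&q, N * sizeof(Request));
    cudaMallocManaged(&res, N * sizeof(int));
    cudaMallocManaged(&head, sizeof(int));
    for (int i = 0; i < N; ++i) q[i] = {i, i};
    *head = 0;
    drainQueue<<<4, 256>>>(q, res, head, N);  // 1024 threads, 4096 tasks
    cudaDeviceSynchronize();
    printf("result[42] = %d\n", res[42]);
    return 0;
}
```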

Implementation of handwritten digit recognition CNN structure using GPGPU and Combined Layer (GPGPU와 Combined Layer를 이용한 필기체 숫자인식 CNN구조 구현)

  • Lee, Sangil;Nam, Kihun;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology / v.3 no.4 / pp.165-169 / 2017
  • CNN (Convolutional Neural Network) is one of the machine learning algorithms that shows superior performance in image recognition and classification. CNN is structurally simple, but it involves a large amount of computation and takes a long time to run. Consequently, in this paper we parallelize the convolution layer, the pooling layer, and the fully connected layer, which consume most of the processing time in a CNN, using the SIMT (Single Instruction Multiple Thread) structure of GPGPU (General-Purpose computing on Graphics Processing Units). We also improve performance by reducing the number of memory accesses: the output of the convolution layer is used directly by the pooling layer instead of being stored first. We use the MNIST dataset to verify this experiment and confirm that the proposed CNN structure performs 12.38% better than the existing structure.
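
The "combined layer" idea, as described, avoids a round trip to memory between convolution and pooling. A minimal CUDA sketch of such a fusion follows: each thread produces one pooled output and keeps the four underlying convolution values in registers. Sizes match MNIST-like input, but the kernel shape, activation, and layout are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int IN = 28, K = 5, CONV = IN - K + 1, POOL = CONV / 2;  // 28 -> 24 -> 12

// Fused convolution + 2x2 max pooling ("combined layer"): each thread
// produces one pooled output, computing the four underlying convolution
// values in registers so the convolution map never touches global memory.
__global__ void convPoolFused(const float* img, const float* w, float bias,
                              float* out) {
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= POOL || py >= POOL) return;
    float best = -1e30f;
    for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx) {
            int cx = px * 2 + dx, cy = py * 2 + dy;
            float acc = bias;
            for (int ky = 0; ky < K; ++ky)
                for (int kx = 0; kx < K; ++kx)
                    acc += img[(cy + ky) * IN + (cx + kx)] * w[ky * K + kx];
            best = fmaxf(best, fmaxf(acc, 0.0f));  // ReLU + pooling in registers
        }
    out[py * POOL + px] = best;
}

int main() {
    float *img, *w, *out;
    cudaMallocManaged(&img, IN * IN * sizeof(float));
    cudaMallocManaged(&w, K * K * sizeof(float));
    cudaMallocManaged(&out, POOL * POOL * sizeof(float));
    for (int i = 0; i < IN * IN; ++i) img[i] = 1.0f;
    for (int i = 0; i < K * K; ++i) w[i] = 0.01f;
    dim3 blk(8, 8), grd((POOL + 7) / 8, (POOL + 7) / 8);
    convPoolFused<<<grd, blk>>>(img, w, 0.0f, out);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);  // 25 * 0.01 = 0.25
    return 0;
}
```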

A Method for Group Mobility Model Construction and Model Representation from Positioning Data Set Using GPGPU (GPGPU에 기반하는 위치 정보 집합에서 집단 이동성 모델의 도출 기법과 그 표현 기법)

  • Song, Ha Yoon;Kim, Dong Yup
    • KIPS Transactions on Software and Data Engineering / v.6 no.3 / pp.141-148 / 2017
  • The current advancement of mobile devices enables users to collect sequences of positions using positioning technology, and research on positioning and location information is growing accordingly. An individual mobility model based on positioning and time data has already been established, while a group mobility model has not. In this research, we study a group mobility model, an extension of the individual mobility model, and the process of establishing it. Building on previous research that derived a group mobility model from two individual mobility models, we establish a group mobility model from more than two individual models and represent its transition patterns as a Markov chain. With real applications in mind, the computing time needed to derive a group mobility model from a huge positioning data set is drastically reduced by using GPGPU instead of traditional multicore systems.
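
The Markov-chain representation suggests a natural data-parallel step: counting state-to-state transitions over a discretized trajectory. The sketch below is a hypothetical minimal version (the state space size S and the discretization are assumptions); the paper's actual model construction is more involved.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int S = 4;  // number of discrete regions/states (assumed)

// Each thread handles one consecutive pair of discretized positions and
// atomically bumps the corresponding Markov transition count.
__global__ void countTransitions(const int* states, int n, unsigned* counts) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n - 1) return;
    atomicAdd(&counts[states[t] * S + states[t + 1]], 1u);
}

int main() {
    std::vector<int> h = {0, 1, 1, 2, 0, 1, 3, 3, 2, 0};  // toy trajectory
    int n = (int)h.size();
    int* d_states; unsigned* d_counts;
    cudaMalloc(&d_states, n * sizeof(int));
    cudaMalloc(&d_counts, S * S * sizeof(unsigned));
    cudaMemset(d_counts, 0, S * S * sizeof(unsigned));
    cudaMemcpy(d_states, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    countTransitions<<<(n + 255) / 256, 256>>>(d_states, n, d_counts);
    std::vector<unsigned> c(S * S);
    cudaMemcpy(c.data(), d_counts, S * S * sizeof(unsigned), cudaMemcpyDeviceToHost);
    for (int i = 0; i < S; ++i) {  // normalize rows into transition probabilities
        unsigned row = 0;
        for (int j = 0; j < S; ++j) row += c[i * S + j];
        for (int j = 0; j < S; ++j)
            printf("P(%d->%d) = %.2f  ", i, j, row ? (float)c[i * S + j] / row : 0.0f);
        printf("\n");
    }
    return 0;
}
```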

A Study of How to Improve Execution Speed of Grabcut Using GPGPU (GPGPU를 이용한 Grabcut의 수행 속도 개선 방법에 관한 연구)

  • Kim, Ji-Hoon;Park, Young-Soo;Lee, Sang-Hun
    • Journal of Digital Convergence / v.12 no.11 / pp.379-386 / 2014
  • In this paper, we propose a method to efficiently improve the processing speed of the GrabCut algorithm by processing its data on the GPU (Graphics Processing Unit). GrabCut is an object detection algorithm with excellent performance. The existing algorithm splits the image into foreground and background regions and assigns each region to K clusters, then repeats this process, gradually refining the result. The drawback of GrabCut is the time consumed by this repeated clustering. We therefore improve the processing speed by executing the repeated operations in parallel with GPGPU (General-Purpose computing on Graphics Processing Units). The proposed method reduced the execution time of the algorithm by about 95.58% on average.
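
The repeated clustering that dominates GrabCut's runtime includes an assignment step that parallelizes naturally: each pixel independently finds its nearest cluster center. The sketch below shows only that step in CUDA, with the center update and the foreground/background handling left to the host; K and the RGB distance are assumptions consistent with common GrabCut implementations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int K = 5;  // clusters per region, as in typical GrabCut setups

// Assignment step of the iterated clustering: each thread labels one
// pixel with the nearest RGB cluster center. The update step (recomputing
// centers) and the foreground/background split are left to the host here.
__global__ void assignClusters(const float3* px, const float3* centers,
                               int* label, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float bestD = 1e30f; int bestK = 0;
    for (int k = 0; k < K; ++k) {
        float dx = px[i].x - centers[k].x;
        float dy = px[i].y - centers[k].y;
        float dz = px[i].z - centers[k].z;
        float d = dx * dx + dy * dy + dz * dz;
        if (d < bestD) { bestD = d; bestK = k; }
    }
    label[i] = bestK;
}

int main() {
    const int n = 640 * 480;
    float3 *px, *centers; int* label;
    cudaMallocManaged(&px, n * sizeof(float3));
    cudaMallocManaged(&centers, K * sizeof(float3));
    cudaMallocManaged(&label, n * sizeof(int));
    for (int i = 0; i < n; ++i) px[i] = make_float3(i % 255, (i / 255) % 255, 128);
    for (int k = 0; k < K; ++k) centers[k] = make_float3(k * 50.f, k * 50.f, 128);
    assignClusters<<<(n + 255) / 256, 256>>>(px, centers, label, n);
    cudaDeviceSynchronize();
    printf("label[0] = %d\n", label[0]);
    return 0;
}
```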

Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol;Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.10 / pp.935-943 / 2017
  • CNN (Convolutional Neural Network), which is used for image classification and speech recognition among neural-network learning algorithms, has been continuously developed into high-performance structures. This makes it difficult to use in embedded systems with limited resources, and even when pre-trained weights are used, limitations remain; we therefore use GPGPU (General-Purpose Computing on Graphics Processing Units) to address the problem. Since CNN performs simple, repetitive operations, its computation speed varies greatly with how threads are allocated and utilized on SIMT (Single Instruction Multiple Thread) based GPGPU. During convolution and pooling, some threads become idle; we increase computation speed by reusing these remaining threads for the computations of subsequent feature maps and kernels.
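
One simple way to keep threads from idling between feature maps, in the spirit of the abstract (though not necessarily the authors' exact scheme), is a grid-stride loop over all (feature map, y, x) outputs, so a thread that finishes one output immediately takes work from the next feature map. All sizes are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int MAPS = 8, IN = 14, K = 3, OUT = IN - K + 1;  // assumed sizes

// Grid-stride loop over every (map, y, x) output: threads that would
// otherwise go idle after their own output keep pulling work from the
// remaining feature maps and kernels.
__global__ void convAllMaps(const float* in, const float* w, float* out) {
    int total = MAPS * OUT * OUT;
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < total;
         idx += gridDim.x * blockDim.x) {
        int m = idx / (OUT * OUT);
        int y = (idx / OUT) % OUT, x = idx % OUT;
        float acc = 0.0f;
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx)
                acc += in[(y + ky) * IN + (x + kx)] * w[(m * K + ky) * K + kx];
        out[idx] = fmaxf(acc, 0.0f);  // ReLU
    }
}

int main() {
    float *in, *w, *out;
    cudaMallocManaged(&in, IN * IN * sizeof(float));
    cudaMallocManaged(&w, MAPS * K * K * sizeof(float));
    cudaMallocManaged(&out, MAPS * OUT * OUT * sizeof(float));
    for (int i = 0; i < IN * IN; ++i) in[i] = 1.0f;
    for (int i = 0; i < MAPS * K * K; ++i) w[i] = 0.1f;
    convAllMaps<<<4, 128>>>(in, w, out);  // fewer threads than outputs
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);  // 9 taps * 0.1 = 0.9
    return 0;
}
```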

Scalable Ontology Reasoning Using GPU Cluster Approach (GPU 클러스터 기반 대용량 온톨로지 추론)

  • Hong, JinYung;Jeon, MyungJoong;Park, YoungTack
    • Journal of KIISE / v.43 no.1 / pp.61-70 / 2016
  • In recent years, there has been a need for techniques for large-scale ontology inference in order to derive new knowledge from existing knowledge at high speed, and for a diversity of semantic services. With the recent advances in distributed computing, ontology inference engines have mostly been developed on Hadoop or Spark frameworks running on large clusters. Parallel programming techniques using GPGPU, which offers many more cores than a CPU, are also used for ontology inference. In this paper, by combining the advantages of both techniques, we propose a new method that reasons over large RDFS ontology data using the Spark in-memory framework and infers over the distributed data at high speed using GPGPU. With GPGPU, ontology reasoning over high-capacity data can be performed at low cost and with higher efficiency than conventional inference methods. In addition, we show that the Spark cluster can reduce the data workload on each GPGPU node. To evaluate our approach, we used LUBM benchmarks ranging from 10 to 120. Our experimental results show that the proposed reasoning engine performs 7 times faster than a conventional approach based on a Spark in-memory inference engine.
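
For flavor, here is a hypothetical CUDA kernel for a single RDFS rule (rdfs9: if s rdf:type C and C rdfs:subClassOf D, then s rdf:type D), with each thread owning one type triple and scanning a subclass table. Real engines, presumably including the paper's, use sorted or hashed joins and iterate to a fixpoint; the integer encoding below is purely illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Encoded facts: (subject, class) pairs for rdf:type and
// (sub, super) pairs for rdfs:subClassOf (integer IDs are assumptions).
struct Pair { int a, b; };

// Rule rdfs9: (s rdf:type C) + (C rdfs:subClassOf D) => (s rdf:type D).
// Each thread owns one rdf:type fact, scans the subclass table, and
// emits derived facts through an atomic cursor. One pass only; a full
// closure would repeat until no new facts appear.
__global__ void rdfs9(const Pair* types, int nTypes,
                      const Pair* subClass, int nSub,
                      Pair* derived, int* cursor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nTypes) return;
    for (int j = 0; j < nSub; ++j) {
        if (types[i].b == subClass[j].a) {
            int k = atomicAdd(cursor, 1);
            derived[k] = {types[i].a, subClass[j].b};
        }
    }
}

int main() {
    Pair *types, *sub, *derived; int* cursor;
    cudaMallocManaged(&types, 2 * sizeof(Pair));
    cudaMallocManaged(&sub, 2 * sizeof(Pair));
    cudaMallocManaged(&derived, 16 * sizeof(Pair));
    cudaMallocManaged(&cursor, sizeof(int));
    types[0] = {100, 1}; types[1] = {101, 2};  // e.g., :alice rdf:type :Student
    sub[0] = {1, 2}; sub[1] = {2, 3};          // :Student < :Person < :Agent
    *cursor = 0;
    rdfs9<<<1, 32>>>(types, 2, sub, 2, derived, cursor);
    cudaDeviceSynchronize();
    printf("derived %d new rdf:type triples\n", *cursor);
    return 0;
}
```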

Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU (GPU 성능 저하 해결을 위한 내부 자원 활용/비활용 상태 분석)

  • Choi, Hongjun;Son, Dongoh;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association / v.15 no.7 / pp.1-11 / 2015
  • In recent high-performance computing systems, GPGPU has been widely used to process general-purpose applications as well as graphics applications, since GPUs provide computational resources optimized for massive parallel processing. Unfortunately, general-purpose applications do not fully exploit the computational resources of the GPU, because they cannot always be optimized for the GPU architecture. We therefore provide a GPU research guideline for improving the performance of computing systems that use GPGPU. To accomplish this, we analyze the factors that degrade GPU performance. In this paper, in order to classify their causes clearly, GPU core status is divided into five states: fully active, partially active, idle, memory stall, and GPU core stall. All states except fully active cause performance degradation. We evaluate the ratio of each GPU core state across benchmarks with different characteristics to find the specific reasons that degrade GPU performance. According to our simulation results, the partially active, idle, memory stall, and GPU core stall states are induced by computational resource underutilization, low parallelism, high memory demand, and structural hazards, respectively.
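
The five-way status taxonomy can be made concrete with a small host-side classifier over per-cycle core samples. The sample fields below are assumptions about what a GPU simulator might expose, not the paper's trace format; the point is only the precedence among the states.

```cuda
#include <cstdio>

// Per-cycle sample of one GPU core, as a simulator might report it
// (fields are illustrative assumptions, not the paper's trace format).
struct CoreSample {
    int activeLanes;   // SIMD lanes doing useful work this cycle
    int totalLanes;    // SIMD width
    bool hasWarp;      // any warp assigned to this core?
    bool memStall;     // waiting on memory?
    bool structStall;  // structural hazard (e.g., issue port busy)?
};

enum Status { FULLY_ACTIVE, PARTIAL_ACTIVE, IDLE, MEM_STALL, CORE_STALL };

// Mirrors the paper's five-way classification: every state except
// FULLY_ACTIVE represents lost performance.
Status classify(const CoreSample& s) {
    if (!s.hasWarp)                    return IDLE;        // low parallelism
    if (s.memStall)                    return MEM_STALL;   // high memory demand
    if (s.structStall)                 return CORE_STALL;  // structural hazard
    if (s.activeLanes == s.totalLanes) return FULLY_ACTIVE;
    return PARTIAL_ACTIVE;             // resource underutilization
}

int main() {
    CoreSample s{16, 32, true, false, false};
    printf("status = %d (1 = PARTIAL_ACTIVE)\n", classify(s));
    return 0;
}
```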

Algorithmic GPGPU Memory Optimization

  • Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
    • JSTS:Journal of Semiconductor Technology and Science / v.14 no.4 / pp.391-406 / 2014
  • The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support for handling irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of the memory accesses present in a serial loop nest to the underlying data-parallel architecture, based on a comprehensive static memory access pattern analysis. To that end we present a simple, yet powerful, mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search a large design space for an appropriate thread mapping and work-group size. To evaluate the effectiveness of our methodology, we report execution speedups for selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry-standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.
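
The paper's evaluation uses OpenCL; for consistency with the other sketches here, the following CUDA fragment illustrates the core phenomenon the methodology reasons about: the same computation mapped so that adjacent threads touch adjacent addresses (coalesced) versus mapped with a large stride. An analysis like the one described would pick the first mapping, or a transformation that produces it.

```cuda
#include <cstdio>
#include <initializer_list>
#include <cuda_runtime.h>

constexpr int W = 1024, H = 1024;

// Coalesced mapping: adjacent threads read adjacent addresses, so each
// warp's loads merge into a few wide memory transactions.
__global__ void scaleCoalesced(const float* in, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < W * H) out[i] = 2.0f * in[i];
}

// Strided mapping of the same computation: adjacent threads are W floats
// apart, so accesses cannot coalesce -- the pattern a static access
// analysis would steer away from.
__global__ void scaleStrided(const float* in, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= W * H) return;
    int col = i / H, row = i % H;  // column-major walk
    out[row * W + col] = 2.0f * in[row * W + col];
}

int main() {
    float *in, *out;
    cudaMalloc(&in, W * H * sizeof(float));
    cudaMalloc(&out, W * H * sizeof(float));
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    for (auto kern : {scaleCoalesced, scaleStrided}) {
        cudaEventRecord(t0);
        kern<<<(W * H + 255) / 256, 256>>>(in, out);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("%.3f ms\n", ms);  // the strided version is markedly slower
    }
    return 0;
}
```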

Performance Enhancement of Scaling Filter and Transcoder using CUDA (CUDA를 활용한 스케일링 필터 및 트랜스코더의 성능향상)

  • Han, Jae-Geun;Ko, Young-Sub;Suh, Sung-Han;Ha, Soon-Hoi
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.507-511 / 2010
  • In this paper, we propose to enhance the performance of a software transcoder by using a GPGPU for its scaling filters. Video transcoding translates a video file into another video file with a different coding algorithm and/or a different frame size, and demand for it increases as multimedia devices with different specifications coexist in daily life. Since transcoding is computationally intensive, a software transcoder that runs on a CPU takes a long time. In this paper, we achieve significant speedup by parallelizing the scaling filter on a GPGPU, which provides much greater computational power. Through extensive experiments with various videos of different sizes and various scaling filter options, we verify that the enhanced transcoder achieves a 36% performance improvement with the default option, and up to 101% with a certain option.
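
A scaling filter maps naturally onto one thread per destination pixel. The sketch below is a generic bilinear resampler for a single grayscale plane, assumed for illustration; it is not the transcoder's actual filter code, which would handle multiple planes and filter options.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per destination pixel: bilinear scaling of a grayscale
// plane. Per-pixel independence is what makes scaling filters such a
// good fit for the GPU.
__global__ void bilinearScale(const unsigned char* src, int sw, int sh,
                              unsigned char* dst, int dw, int dh) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dw || y >= dh) return;
    // Map destination coordinates back into the source image.
    float fx = fmaxf(0.0f, (x + 0.5f) * sw / dw - 0.5f);
    float fy = fmaxf(0.0f, (y + 0.5f) * sh / dh - 0.5f);
    int x0 = (int)fx, y0 = (int)fy;
    int x1 = min(sw - 1, x0 + 1), y1 = min(sh - 1, y0 + 1);
    float ax = fx - x0, ay = fy - y0;
    float top = src[y0 * sw + x0] * (1 - ax) + src[y0 * sw + x1] * ax;
    float bot = src[y1 * sw + x0] * (1 - ax) + src[y1 * sw + x1] * ax;
    dst[y * dw + x] = (unsigned char)(top * (1 - ay) + bot * ay + 0.5f);
}

int main() {
    const int sw = 640, sh = 360, dw = 1280, dh = 720;  // 2x upscale
    unsigned char *src, *dst;
    cudaMallocManaged(&src, sw * sh);
    cudaMallocManaged(&dst, dw * dh);
    for (int i = 0; i < sw * sh; ++i) src[i] = (unsigned char)(i % 256);
    dim3 blk(16, 16), grd((dw + 15) / 16, (dh + 15) / 16);
    bilinearScale<<<grd, blk>>>(src, sw, sh, dst, dw, dh);
    cudaDeviceSynchronize();
    printf("dst[0] = %d\n", dst[0]);
    return 0;
}
```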