• Title/Summary/Keyword: Parallel GPU

Search Results: 284

An Optimized Iterative Semantic Compression Algorithm And Parallel Processing for Large Scale Data

  • Jin, Ran; Chen, Gang; Tung, Anthony K.H.; Shou, Lidan; Ooi, Beng Chin
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.6, pp.2761-2781, 2018
  • With the continuous growth of data volumes and the wide use of compression technology, data reduction has great research value and practical significance. To address the shortcomings of existing semantic compression algorithms, this paper builds on an analysis of the ItCompress algorithm and designs a bidirectional order-selection method based on interval partitioning, named the Optimized Iterative Semantic Compression Algorithm (Optimized ItCompress). To further improve speed, we propose a parallel optimized iterative semantic compression algorithm using the GPU (POICAG) and an optimized iterative semantic compression algorithm using Spark (DOICAS). Extensive experiments on four kinds of datasets verify the efficiency of the proposed algorithms.
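
As a rough illustration of the step in the abstract above that parallelizes naturally on a GPU, the sketch below assigns each data row to its best-matching representative row, which is the core of ItCompress-style semantic compression. The kernel name, data layout, and equality-based match score are assumptions for illustration, not the paper's actual POICAG code.

```cuda
// Hypothetical sketch: one thread scores one data row against every
// representative row; mismatching columns would be stored as outliers.
__global__ void assignRepresentatives(const int* rows, const int* reps,
                                      int numRows, int numReps, int numCols,
                                      int* bestRep) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per data row
    if (r >= numRows) return;
    int best = 0, bestScore = -1;
    for (int k = 0; k < numReps; ++k) {             // score each representative
        int score = 0;
        for (int c = 0; c < numCols; ++c)
            score += (rows[r * numCols + c] == reps[k * numCols + c]);
        if (score > bestScore) { bestScore = score; best = k; }
    }
    bestRep[r] = best;  // the row is encoded as this representative's ID
}
```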

An Analysis of Existing Studies on Parallel and Distributed Processing of the Rete Algorithm

  • Kim, Jaehoon
    • The Journal of Korean Institute of Information Technology, v.17 no.7, pp.31-45, 2019
  • The core technologies for today's intelligent services are deep learning, that is, neural networks, together with parallel and distributed processing technologies such as GPU parallel computing and big data. However, for future intelligent services and for knowledge-sharing services based on globally shared ontologies, there is a technology better suited than neural networks for representing and reasoning over knowledge: the IF-THEN knowledge representation of RIF and SWRL, the standard rule languages of the Semantic Web, over which inference can be performed efficiently by the Rete algorithm. However, when the Rete algorithm runs on a single computer with about 100,000 rules, performance degrades severely, taking several tens of minutes, which is an obvious limitation. Therefore, this paper analyzes past and current studies on parallel and distributed processing of the Rete algorithm and examines which aspects should be considered to implement an efficient Rete algorithm.
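
The Rete algorithm avoids re-matching by sharing partial matches through an alpha/beta network, which is hard to show briefly; the naive baseline that parallel Rete studies typically improve on is brute-force matching of every rule against every working-memory fact. A minimal CUDA sketch of that baseline follows (single-condition rules, all names hypothetical, and explicitly not Rete itself):

```cuda
// Naive parallel rule matching, shown only to illustrate the workload that
// Rete's shared match network is designed to avoid.
struct Fact { int attr; int value; };
struct Rule { int attr; int value; };  // IF (attr == value) THEN fire

__global__ void matchRules(const Rule* rules, int numRules,
                           const Fact* facts, int numFacts, int* fired) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per rule
    if (i >= numRules) return;
    int hit = 0;
    for (int f = 0; f < numFacts; ++f)              // scan working memory
        hit |= (facts[f].attr == rules[i].attr &&
                facts[f].value == rules[i].value);
    fired[i] = hit;
}
```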

Implementation of Neural Networks using GPU

  • Oh, Kyoung-su; Jung, Keechul
    • The KIPS Transactions: Part B, v.11B no.6, pp.735-742, 2004
  • We present a new use of commodity graphics hardware to run artificial neural networks faster, and we examine how the GPU improves the time performance of an image processing system based on a neural network. When multiple input sets are computed in parallel, the vector-matrix products become matrix-matrix multiplications, which lets us fully exploit the parallelism of the GPU. The sigmoid operation and the bias-term addition are also implemented with pixel shaders on the GPU. Our preliminary results show a performance improvement of about thirty times on an ATI RADEON 9800 XT board.
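
The batching idea in the abstract above, stacking N input vectors so the vector-matrix product becomes a matrix-matrix product, maps directly onto today's GPUs. The sketch below is a modern CUDA rendering of that layer computation rather than the paper's 2004-era pixel-shader implementation; the names and data layout are assumptions.

```cuda
#include <math.h>

// out[n][j] = sigmoid( sum_i in[n][i] * W[i][j] + bias[j] )
__global__ void denseSigmoid(const float* in, const float* W, const float* bias,
                             float* out, int N, int inDim, int outDim) {
    int n = blockIdx.y * blockDim.y + threadIdx.y;  // batch (input set) index
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // output neuron index
    if (n >= N || j >= outDim) return;
    float acc = bias[j];                            // bias-term addition
    for (int i = 0; i < inDim; ++i)                 // one cell of the matrix-matrix product
        acc += in[n * inDim + i] * W[i * outDim + j];
    out[n * outDim + j] = 1.0f / (1.0f + expf(-acc));  // sigmoid on the GPU
}
```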

GPU-based Stereo Matching Algorithm with the Strategy of Population-based Incremental Learning

  • Nie, Dong-Hu; Han, Kyu-Phil; Lee, Heng-Suk
    • Journal of Information Processing Systems, v.5 no.2, pp.105-116, 2009
  • To solve the general problems surrounding the application of genetic algorithms to stereo matching, two measures are proposed. First, the strategy of simplified population-based incremental learning (PBIL) is adopted to reduce memory consumption and search inefficiency, and a scheme for controlling the distance of neighbors for disparity smoothness is added to obtain wide-area consistency of disparities. In addition, an alternative version of the proposed algorithm, without the probability vector, is presented for simpler set-ups. Second, programmable graphics hardware (the GPU) consists of multiple multiprocessors and offers powerful parallelism at low cost; therefore, to decrease the running time further, a GPU model of the proposed algorithm is presented for the first time. The algorithms are implemented on both the CPU and the GPU and evaluated experimentally. The results show that the proposed algorithm and its modified version outperform traditional BMA methods with deliberate relaxation in both running speed and stability, and that the GPU implementation's speed-up over the CPU grows as the image size increases.
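
The PBIL strategy mentioned above replaces a genetic population with a probability vector that is nudged toward the best individual of each generation, and this per-gene update is embarrassingly parallel. A minimal sketch, assuming a bit-encoded disparity string and a generic PBIL learning rate (not the paper's exact parameters):

```cuda
// One thread per gene: shift each bit probability toward the best sample.
__global__ void pbilUpdate(float* prob, const unsigned char* bestSample,
                           int length, float learnRate) {
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    if (g >= length) return;
    prob[g] = (1.0f - learnRate) * prob[g]        // keep most of the old belief
            + learnRate * (float)bestSample[g];   // move toward the best bits
}
```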

Discolored Metal Pad Image Classification Based on Gabor Texture Features Using GPU

  • Cui, Xue-Nan; Park, Eun-Soo; Kim, Jun-Chul; Kim, Hak-Il
    • Journal of Institute of Control, Robotics and Systems, v.15 no.8, pp.778-785, 2009
  • This paper presents a Gabor texture feature extraction method for the classification of discolored metal pad images using the GPU (Graphics Processing Unit). The proposed algorithm extracts texture information with Gabor filters and constructs a pattern map from the extracted information; the golden pad images are then classified using feature vectors extracted from the constructed pattern map. To evaluate the performance of the GPU-based Gabor texture feature extraction algorithm, sequential processing and OpenMP-based parallel processing on the CPU were also implemented. The proposed algorithm itself was implemented on the GPU using both global memory and shared memory, and the experimental results demonstrate that the shared-memory GPU version provides the best performance. To assess the effectiveness of the extracted Gabor texture features, a validation experiment on a database of 20 metal pad images showed no misclassification.
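
At the heart of the method above is convolving the pad image with a bank of Gabor filters. The sketch below shows one per-pixel Gabor response in CUDA with the coefficients in constant memory; the pattern-map construction and the shared-memory tiling that the paper finds fastest are omitted, and the filter size and names are assumptions.

```cuda
#define K 7                                   // assumed (2K+1)x(2K+1) support
__constant__ float d_gabor[(2*K+1)*(2*K+1)];  // precomputed Gabor coefficients

__global__ void gaborResponse(const float* img, float* resp, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < K || y < K || x >= w - K || y >= h - K) return;  // skip borders
    float acc = 0.0f;
    for (int dy = -K; dy <= K; ++dy)          // convolve with the Gabor mask
        for (int dx = -K; dx <= K; ++dx)
            acc += img[(y + dy) * w + (x + dx)] *
                   d_gabor[(dy + K) * (2*K+1) + (dx + K)];
    resp[y * w + x] = acc;                    // one texture-response channel
}
```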

A Study on High Speed Face Tracking using the GPGPU-based Depth Information

  • Kim, Woo-Youl; Seo, Young-Ho; Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering, v.17 no.5, pp.1119-1128, 2013
  • In this paper, we propose a GPU-based algorithm that detects and tracks the human face at high speed. The detection stage uses the existing Adaboost algorithm, but the search area is dramatically reduced by first detecting motion and skin-color regions. Unlike the detection stage, the tracking stage uses only depth information: it performs template matching, searching for the block that best matches the template. To track the face quickly, the template matching is computed in parallel on the GPU. Experimental results show that the GPU version is up to 49 times faster than the CPU version.
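
The tracking stage described above is a template match over a depth map, which parallelizes with one thread per candidate position. The sketch below uses a sum of absolute differences (SAD) cost; the paper's exact matching cost and search-window strategy are not specified here, so treat the details as assumptions.

```cuda
#include <math.h>

// Each thread scores one candidate top-left position of the template.
__global__ void sadMatch(const unsigned short* depth, int w, int h,
                         const unsigned short* tmpl, int tw, int th,
                         float* cost) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x > w - tw || y > h - th) return;
    float sad = 0.0f;
    for (int ty = 0; ty < th; ++ty)           // accumulate |depth - template|
        for (int tx = 0; tx < tw; ++tx)
            sad += fabsf((float)depth[(y + ty) * w + (x + tx)] -
                         (float)tmpl[ty * tw + tx]);
    cost[y * (w - tw + 1) + x] = sad;         // host picks the minimum-cost block
}
```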

Performance Comparison of Particle Simulation Using GPU Between OpenGL and Unity

  • Kim, Min Sang; Sung, Nak-Jun; Choi, Yoo-Joo; Hong, Min
    • KIPS Transactions on Software and Data Engineering, v.6 no.10, pp.479-486, 2017
  • Recent GPGPU technology has made it possible to overcome the slowdown in the growth of computer performance, so physically based real-time simulations that require high computational complexity can now run on PCs. The physical calculations in a physics simulation can be parallelized, and they can be performed efficiently with the compute shaders recently supported by OpenGL 4.3 and Unity 4.0. In this paper, we measure and compare real-time physics simulation performance between OpenGL, which runs on various platforms, and Unity, a content creation tool that supports various platforms. The particle simulation experiments show that the Unity implementation runs up to 136.04% faster. These results should help in selecting better development tools for future multi-platform support.
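
The particle update being benchmarked above is a per-particle integration step, which is what OpenGL 4.3 and Unity expose through compute shaders. Since the paper's shader source is not given, the sketch below is only a CUDA equivalent of such a step, with assumed names and simple explicit Euler integration.

```cuda
struct Particle { float3 pos; float3 vel; };

__global__ void stepParticles(Particle* p, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per particle
    if (i >= n) return;
    p[i].vel.y += -9.8f * dt;                 // gravity
    p[i].pos.x += p[i].vel.x * dt;            // explicit Euler position update
    p[i].pos.y += p[i].vel.y * dt;
    p[i].pos.z += p[i].vel.z * dt;
    if (p[i].pos.y < 0.0f) {                  // bounce off the ground plane
        p[i].pos.y = 0.0f;
        p[i].vel.y *= -0.5f;
    }
}
```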

Acceleration of ECC Computation for Robust Massive Data Reception under GPU-based Embedded Systems

  • Kwon, Jisu; Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering, v.24 no.7, pp.956-962, 2020
  • Recently, as the size of the data used in embedded systems increases, the need for ECC decoding to receive massive data robustly has been emphasized. In this paper, we propose a method to accelerate the computation that derives the syndrome vectors when ECC decoding with a Hamming code is performed on an embedded system with a built-in GPU. The proposed acceleration method expresses the decoding operation as a matrix-vector multiplication using the CSR format, one of the data structures for representing sparse matrices, and performs it in parallel in a CUDA kernel on the GPU. We evaluated the proposed method on a target embedded board with a GPU; the results show that the execution time is reduced when the ECC decoding operation is accelerated on the GPU rather than performed on the CPU alone.
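
The syndrome computation described above, s = H·r (mod 2) with a sparse parity-check matrix H in CSR format, reduces to one XOR accumulation per parity-check row, since Hamming-code entries are 0 or 1 and only the column indices of the 1s need to be stored. A minimal sketch with one thread per row (names are illustrative, not the paper's code):

```cuda
__global__ void syndromeCSR(const int* rowPtr, const int* colIdx,
                            const unsigned char* received, unsigned char* syn,
                            int numRows) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;  // one parity-check row
    if (row >= numRows) return;
    unsigned char s = 0;
    for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
        s ^= received[colIdx[k]] & 1u;  // mod-2 sum over the row's 1 entries
    syn[row] = s;                       // a nonzero syndrome flags a bit error
}
```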

Development of the sediment transport model using GPU arithmetic

  • Noh, Junsu; Son, Sangyoung
    • Journal of Korea Water Resources Association, v.56 no.7, pp.431-438, 2023
  • Many shorelines are facing beach erosion, and considering climate change and the growth of coastal populations, the erosion problem could accelerate. To address this issue, developing a sediment transport model that rapidly predicts terrain change is crucial. In this study, a sediment transport model based on GPU parallel arithmetic was introduced, with the aim of simulating terrain change well at a higher computing speed than a CPU-based model; we also investigate the model's performance and the GPU's computational efficiency. We applied several dam-break cases to the verified model and found that the simulated results were close to the observed ones. The computational efficiency of the GPU, defined by comparison with the operation time of the CPU-based model, showed that the GPU-based model was more efficient than the CPU-based model.

Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU

  • Choi, Hongjun; Son, Dongoh; Kim, Jongmyon; Kim, Cheolhong
    • The Journal of the Korea Contents Association, v.15 no.7, pp.1-11, 2015
  • In recent high-performance computing systems, GPGPU has been widely used to process general-purpose applications as well as graphics applications, since the GPU provides computational resources optimized for massively parallel processing. Unfortunately, GPGPU does not fully exploit the GPU's computational resources when executing general-purpose applications, because such applications cannot always be optimized for the GPU architecture. We therefore provide a GPU research guideline for improving the performance of computing systems that use GPGPU. To accomplish this, we analyze the factors that degrade GPU performance. To classify the causes clearly, the GPU core status is defined as one of five states: fully active, partially active, idle, memory stall, and GPU core stall; every state except fully active causes performance degradation. We evaluate the ratio of each GPU core state across benchmarks with different characteristics to find the specific reasons for degraded GPU performance. According to our simulation results, the partially active, idle, memory stall, and GPU core stall states are induced by computational resource underutilization, low parallelism, high memory request rates, and structural hazards, respectively.
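
The five-state taxonomy above can be made concrete with a small classifier over per-cycle simulator counters. The encoding below is a toy sketch: only the taxonomy and the cause of each state follow the paper, while the counters and their priority order are assumptions.

```cuda
enum CoreStatus {
    FULLY_ACTIVE,    // all computational resources busy
    PARTIAL_ACTIVE,  // some lanes idle: resource underutilization
    IDLE,            // no warp ready to issue: low parallelism
    MEMORY_STALL,    // waiting on memory: high memory request rate
    CORE_STALL       // pipeline blocked: structural hazard
};

// Classify one sampled cycle from hypothetical simulator counters.
CoreStatus classifyCycle(int activeLanes, int totalLanes,
                         bool readyWarp, bool memPending, bool pipeBlocked) {
    if (memPending)               return MEMORY_STALL;
    if (pipeBlocked)              return CORE_STALL;
    if (!readyWarp)               return IDLE;
    if (activeLanes < totalLanes) return PARTIAL_ACTIVE;
    return FULLY_ACTIVE;
}
```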