• Title/Summary/Keyword: 병렬 GPU (parallel GPU)

A Study on Efficient User Management System of Combat System

  • Hee-Soo Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.7 / pp.191-198 / 2024
  • In this paper, we propose a user management system for the efficient operation of the combat system on naval ships. The performance of naval ships has recently been enhanced through a variety of sensors and functions and through continuous system development. This progress has increased the number of multi-function consoles that can operate the ship's sensors and functions, and with it the number of operators for these consoles. Efficient management of a naval ship therefore requires a user management system that can control and manage multi-function consoles and their operators in real time. This paper proposes a user management system that effectively manages, in real time, the state of users accessing the multi-function consoles. It also proposes a GPU-based parallelization method that reduces the CPU workload of running the combat system's various functions. Compared with the CPU-based method, the proposed user management system reduced response time by approximately 82% and occupancy by approximately 20%.

Analysis tool for the diffusion model using GPU: SNUDM-G (GPU를 이용한 확산모형 분석 도구: SNUDM-G)

  • Lee, Dajung;Lee, Hyosun;Koh, Sungryong
    • Korean Journal of Cognitive Science / v.33 no.3 / pp.155-168 / 2022
  • In this paper, we introduce SNUDM-G, a diffusion model analysis tool with improved computational speed. The diffusion model has been applied to explain various cognitive tasks, but computational difficulties have limited its use. In particular, SNUDM (Koh et al., 2020), one of the existing diffusion model analysis tools, is slow because it generates 20,000 data points sequentially when approximating the diffusion process. To overcome this limitation, we propose using graphics processing units (GPUs) when approximating the diffusion process with a random walk process. Because the 20,000 data points can be generated in parallel on a GPU, estimation is faster than with sequential generation. We analyzed the data of Experiment 1 of Ratcliff et al. (2004) and recovered the parameters with both SNUDM-G (GPU) and SNUDM (CPU). SNUDM-G estimated slightly higher values than SNUDM for certain parameters, but it estimated the parameters much faster. This result shows that the tool enables more efficient diffusion model analysis for various cognitive tasks. It also suggests that GPUs could improve the processing speed of various cognitive models in the future.
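The property SNUDM-G relies on is that each simulated random-walk trial is independent of every other trial. That independence is what allows all 20,000 trials to be assigned to separate GPU threads. The CPU-only Python sketch below keeps it explicit by giving each trial its own random stream.

The following details are illustrative assumptions, not SNUDM-G's actual implementation:
- the parameter names (`drift`, `boundary`, `start`, `dt`, `noise_sd`);
- the boundaries at 0 and `boundary`;
- the Gaussian-increment random-walk step.

```python
import math
import random


def simulate_trial(drift, boundary, start, dt, noise_sd, rng, max_steps=100_000):
    """Simulate one diffusion trial as a discrete random walk (assumed scheme).

    Evidence starts at `start` and moves by drift*dt plus Gaussian noise each
    step until it reaches 0 (lower) or `boundary` (upper). Returns
    (choice, decision_time) with choice 1 = upper, 0 = lower, or
    (None, elapsed_time) if neither boundary is hit within max_steps.
    """
    x = start
    step_sd = noise_sd * math.sqrt(dt)
    for step in range(1, max_steps + 1):
        x += drift * dt + rng.gauss(0.0, step_sd)
        if x >= boundary:
            return 1, step * dt
        if x <= 0.0:
            return 0, step * dt
    return None, max_steps * dt


def simulate_trials(n_trials, drift, boundary, start, dt=0.001, noise_sd=1.0, seed=0):
    """Generate n_trials independent trials.

    Each trial uses its own RNG stream, so no trial depends on another:
    on a GPU this loop would become one thread per trial.
    """
    results = []
    for i in range(n_trials):
        rng = random.Random(seed * 1_000_003 + i)  # independent per-trial stream
        results.append(simulate_trial(drift, boundary, start, dt, noise_sd, rng))
    return results
```

For example, `simulate_trials(20_000, drift=1.0, boundary=1.0, start=0.5)` produces the 20,000 trials mentioned in the abstract. The model's predicted choice proportions and decision-time distribution are then read off these samples.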

Implementation of IQ/IDCT in H.264/AVC Decoder Using GPGPU (GPGPU를 이용한 H.264/AVC 디코더)

  • Kim, Dong-Han;Lee, Kwang-Yeob
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.162-164 / 2010
  • H.264/AVC (Advanced Video Coding) is a video compression standard that provides good video quality at substantially lower bit rates than previous standards. In this paper, we propose an efficient architecture for an H.264/AVC decoder using GPGPU. A GPGPU can process many operations in parallel. IQ/IDCT (inverse quantization and inverse discrete cosine transform) is a stage of the H.264/AVC decoding algorithm that can be processed in parallel.
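IQ/IDCT parallelizes well because each 4x4 residual block is decoded independently of the others. The Python sketch below shows the per-block work that a GPU thread or thread group might perform:
1. flat-scaling dequantization (`level * v << (qP // 6)`);
2. the H.264 4x4 inverse integer transform, applied to rows and then columns;
3. the final `(x + 32) >> 6` rounding.

This is a simplified illustration, not the authors' GPGPU kernel. It assumes flat scaling matrices and omits the separate luma/chroma DC path and the final clipping.

```python
# normAdjust values (v0, v1, v2) for qP % 6 = 0..5 (flat scaling assumed)
V = [
    (10, 16, 13),
    (11, 18, 14),
    (13, 20, 16),
    (14, 23, 18),
    (16, 25, 20),
    (18, 29, 23),
]


def dequantize_4x4(coeffs, qp):
    """Inverse quantization of one 4x4 block of coefficient levels."""
    v = V[qp % 6]
    shift = qp // 6
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i % 2 == 0 and j % 2 == 0:
                scale = v[0]
            elif i % 2 == 1 and j % 2 == 1:
                scale = v[1]
            else:
                scale = v[2]
            out[i][j] = (coeffs[i][j] * scale) << shift
    return out


def _idct_1d(d0, d1, d2, d3):
    """One 4-point butterfly of the H.264 inverse integer transform."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def idct_4x4(d):
    """Transform rows, then columns, then round with (x + 32) >> 6."""
    rows = [_idct_1d(*d[i]) for i in range(4)]
    cols = [_idct_1d(rows[0][j], rows[1][j], rows[2][j], rows[3][j]) for j in range(4)]
    # cols[j][i] is row i of column j; transpose back while rounding.
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


def decode_residual_blocks(blocks, qp):
    """Blocks are independent: on a GPU, one thread (group) per block."""
    return [idct_4x4(dequantize_4x4(b, qp)) for b in blocks]
```

For example, a block whose only nonzero level is a DC value of 16 at qP = 0 decodes to a flat 4x4 residual of 3s.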

Face Detection using Skin Color Information and Parallel Processing Method on Multi-Core (멀티코어에서 피부색상 정보와 병렬처리 방법을 이용한 얼굴 검출)

  • Kim, Hong-Hee;Lee, Jae-Heung
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.219-222 / 2012
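  • Recent research on face detection ranges from hardware design on FPGAs to efficient software design for DSPs, GPUs, and ARM cores. In this study, we propose an effective face detection method for multi-core processors. Face candidates are extracted using skin color, and the remaining background image is removed to speed up computation. The face detection algorithm proposed by Viola and Jones was parallelized using POSIX Threads, and its performance was measured on single-core and multi-core systems. There was no performance improvement on a single core. On multiple cores, speed improved by about 1.8 times, and the detection success rate was unchanged.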
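The abstract combines two ideas: discarding non-skin background before detection, and splitting the image across threads. The Python sketch below combines both and rests on three assumptions:
- The YCbCr skin range (77 ≤ Cb ≤ 127, 133 ≤ Cr ≤ 173) is a commonly used heuristic, not necessarily the authors' threshold.
- The horizontal row-band split stands in for the paper's POSIX Thread partitioning.
- CPython's GIL prevents this pure-Python loop from running truly in parallel, so the sketch shows the work split, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def is_skin(r, g, b):
    """Classify one RGB pixel using an assumed YCbCr skin range."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173


def mask_rows(image, start, stop):
    """Skin mask for rows [start, stop) of an image of (r, g, b) tuples."""
    return [[is_skin(*px) for px in image[y]] for y in range(start, stop)]


def skin_mask(image, n_threads=2):
    """Split the image into horizontal bands, one band per worker thread."""
    height = len(image)
    if height == 0:
        return []
    band = (height + n_threads - 1) // n_threads  # ceil(height / n_threads)
    ranges = [(s, min(s + band, height)) for s in range(0, height, band)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(lambda r: mask_rows(image, *r), ranges))
    # map() preserves submission order, so concatenation restores row order.
    return [row for part in parts for row in part]


def remove_background(image, mask):
    """Black out non-skin pixels so later detection stages can skip them."""
    return [
        [px if keep else (0, 0, 0) for px, keep in zip(row, keep_row)]
        for row, keep_row in zip(image, mask)
    ]
```

Because each band is computed without reading any other band's results, the split needs no locking. A detector such as Viola-Jones would then run only on the regions that survive `remove_background`.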

Exploration of Optimization Environment for CUDA-based Cholesky Decomposition (CUDA 기반 숄레스키 분해 성능 최적화 환경 탐색)

  • Junbeom Kang;Myungho Lee;Neungsoo Park
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.15-17 / 2024
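  • Many research fields have recently reduced computation time through parallel processing with the CUDA framework. Cholesky decomposition factors a positive-definite matrix into a lower triangular matrix and its transpose. Because it requires many matrix multiplications, exploiting the architectural characteristics of the GPU can yield substantial acceleration. In this paper, we therefore implemented a parallel Cholesky decomposition program in which two key factors in assigning computation to CUDA cores, the number of blocks and the number of threads per block, can be adjusted. We measured the speedup for three different matrix sizes under various block-count/thread-count configurations. Compared with the slowest configuration in the same group, the optimal configuration for each matrix achieved an additional speedup of about 1.80x for the 1000x1000 matrix and about 2.94x for the 2000x2000 matrix.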
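A right-looking Cholesky factorization, sketched below in Python, shows where the parallelism comes from. After each diagonal step, the column scaling and the trailing-submatrix update touch independent elements. A CUDA kernel would spread those elements across blocks and threads.

`launch_config` is a hypothetical helper that illustrates how a block count follows from a chosen threads-per-block value. Neither function is the authors' program.

```python
import math


def launch_config(n_elements, threads_per_block):
    """Hypothetical helper: smallest block count with blocks*threads >= n_elements."""
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


def cholesky(a):
    """Right-looking Cholesky: returns lower-triangular L with A = L L^T.

    `a` must be symmetric positive definite; it is copied, not modified.
    """
    n = len(a)
    l = [list(row) for row in a]
    for k in range(n):
        if l[k][k] <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[k][k] = math.sqrt(l[k][k])
        # Column scaling: each row i is independent (one thread per row).
        for i in range(k + 1, n):
            l[i][k] /= l[k][k]
        # Trailing update of the lower triangle: each (i, j) is independent.
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                l[i][j] -= l[i][k] * l[j][k]
    # Zero the strict upper triangle, which still holds input values.
    for i in range(n):
        for j in range(i + 1, n):
            l[i][j] = 0.0
    return l
```

For a 1000-row column update with 256 threads per block, `launch_config(1000, 256)` gives 4 blocks. The paper's tuning varies choices like this one to find the fastest configuration per matrix size.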

Redundant Parallel Hopfield Network Configurations: A New Approach to the Two-Dimensional Face Recognitions (병렬 다중 홉 필드 네트워크 구성으로 인한 2-차원적 얼굴인식 기법에 대한 새로운 제안)

  • Kim, Yong Taek;Deo, Kiatama
    • KIPS Transactions on Software and Data Engineering / v.7 no.2 / pp.63-68 / 2018
  • Interest in face recognition has been increasing due to diverse emerging applications. Face recognition from two-dimensional sources can be challenging under varying face orientation and illumination, facial details such as the presence or absence of glasses, and expressions such as smiling or crying. Hopfield networks have been used especially for pattern recall, generalization, familiarity recognition, and error correction. Building on these abilities, this paper experimentally applies a Redundant Parallel Hopfield Network to the face recognition problem. The new design was experimentally tested and shown to be robust in a wide range of practical situations.
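The paper builds on the pattern-recall ability of a classic Hopfield network, illustrated here with a single network:
- Hebbian weights store bipolar (+1/-1) patterns.
- Repeated sign updates pull a corrupted input back toward a stored pattern.

This is a generic Python sketch. The abstract does not describe how the paper arranges several networks redundantly and in parallel, so that arrangement is not modeled here.

```python
def train_hopfield(patterns):
    """Hebbian weights W = sum over patterns of p p^T, with a zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, state, max_iters=20):
    """Synchronous sign updates until the state stops changing.

    A zero local field keeps the neuron's previous value. Synchronous
    updates can cycle, so max_iters bounds the loop.
    """
    s = list(state)
    n = len(s)
    for _ in range(max_iters):
        new = []
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            new.append(1 if h > 0 else -1 if h < 0 else s[i])
        if new == s:
            break
        s = new
    return s
```

With two orthogonal 8-element patterns stored, flipping one bit of the first pattern and calling `recall` restores it in a single update. This is the error-correction behavior the abstract refers to.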

Efficient Implementation of Convolutional Neural Network Using CUDA (CUDA를 이용한 Convolutional Neural Network의 효율적인 구현)

  • Ki, Cheol-Min;Cho, Tai-Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.6 / pp.1143-1148 / 2017
  • Artificial intelligence and deep learning have recently become prominent social topics, and these technologies are being applied in many fields. Among the many AI algorithms, convolutional neural networks (CNNs) are an effective method. A CNN adds convolution layers to a multi-layer neural network. When a CNN is trained on a small amount of data, or when its layer structure is simple, speed is not a major concern. Training takes a long time, however, when the training data are large and the layer structure is complex, and in such cases GPU-based parallel processing is often needed. In this paper, we develop a CNN using CUDA and show that its training is faster and more efficient than training with some other frameworks or programs.
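The convolution layer is where GPU parallelism pays off most directly. Every output value is an independent dot product between the kernel and one input window, so a CUDA implementation can assign one thread per output element.

The Python sketch below shows a single-channel "valid" convolution (no padding, stride 1). As in most CNN frameworks, it is implemented as cross-correlation without flipping the kernel. The function name and list-of-lists layout are illustrative, not the paper's code.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation with no padding and stride 1."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    # Each (y, x) is computed independently: one GPU thread per output element.
    for y in range(oh):
        for x in range(ow):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out
```

A full layer repeats this for every input channel, output channel, and batch element. That multiplies the independent work available to the GPU, which is why large data and deep layer stacks benefit most.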

BCDR algorithm for network estimation based on pseudo-likelihood with parallelization using GPU (유사가능도 기반의 네트워크 추정 모형에 대한 GPU 병렬화 BCDR 알고리즘)

  • Kim, Byungsoo;Yu, Donghyeon
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.381-394 / 2016
  • A graphical model represents conditional dependencies between variables as a graph of nodes and edges. It is widely used in fields including physics, economics, and biology to describe complex associations. Conditional dependencies can be estimated from an inverse covariance matrix, in which zero off-diagonal elements indicate conditional independence of the corresponding variables. CONCORD (convex correlation selection method) estimates an inverse covariance matrix based on pseudo-likelihood and is based on the BCD (block coordinate descent) algorithm. This paper proposes BCDR (block coordinate descent with random permutation), an efficient algorithm for CONCORD that uses graphics processing units and random permutation. Numerical studies on two network structures demonstrate the efficiency of the proposed algorithm for CONCORD in terms of computation time.
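The "random permutation" in BCDR refers to visiting coordinates in a freshly shuffled order on each sweep. The abstract does not give CONCORD's update formulas, so the Python sketch below swaps in a simpler, well-known problem: the lasso. It shows coordinate descent with soft-thresholding and a per-sweep random permutation. It illustrates that loop structure only and is not the BCDR or CONCORD algorithm.

```python
import random


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd_random_perm(x, y, lam, sweeps=100, seed=0, tol=1e-12):
    """Minimize (1/(2n))||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    x is a list of n rows with p features each. The coordinate order is
    reshuffled at the start of every sweep (random permutation).
    """
    n, p = len(x), len(x[0])
    rng = random.Random(seed)
    beta = [0.0] * p
    resid = list(y)  # residual r = y - X beta, with beta starting at zero
    col_sq = [sum(x[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    order = list(range(p))
    for _ in range(sweeps):
        rng.shuffle(order)
        max_change = 0.0
        for j in order:
            if col_sq[j] == 0.0:
                continue
            # rho = x_j^T (r + x_j beta_j) / n, the partial-residual correlation
            rho = sum(x[i][j] * (resid[i] + x[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= x[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

Each coordinate update above is sequential. Its inner sums over the n observations are the kind of reduction a GPU can parallelize, which is one plausible reading of how GPUs help here.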

MPEG-I RVS Software Speed-up for Real-time Application (실시간 렌더링을 위한 MPEG-I RVS 가속화 기법)

  • Ahn, Heejune;Lee, Myeong-jin
    • Journal of Broadcast Engineering / v.25 no.5 / pp.655-664 / 2020
  • Free-viewpoint image synthesis is one of the key technologies in the MPEG-I (Immersive) standard. RVS (Reference View Synthesizer), developed and used by the MPEG-I group, is a DIBR (Depth Information-Based Rendering) program that generates an image at a virtual (intermediate) viewpoint from multiple input viewpoints. RVS uses a mesh-surface method based on computer graphics and outperforms previous pixel-based methods by 2.5 dB or more. Its OpenGL version is about 10 times faster than the non-OpenGL version, but it is still not real-time, running at 0.75 fps with two 2K-resolution input images. In this paper, we analyze the internals of the RVS implementation and restructure it. Three key improvements yield a 34-fold speedup and thus real-time performance (22-26 fps):
    1) reuse of OpenGL buffers and texture objects;
    2) parallelization of file I/O and OpenGL execution;
    3) parallelization of GPU shader programs and buffer transfers.
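Improvement 2), overlapping file I/O with rendering, can be structured as a producer-consumer pipeline: one thread reads the next frame while the main thread processes the current one. The minimal sketch below uses Python's `threading` and `queue` modules. `load_frame` and `render_frame` are hypothetical placeholders for RVS's file reading and OpenGL view synthesis. A bounded queue limits how many frames are read ahead.

```python
import queue
import threading


def run_pipeline(frame_ids, load_frame, render_frame, prefetch=2):
    """Overlap I/O and rendering: a reader thread fills a bounded queue.

    load_frame and render_frame are caller-supplied (hypothetical) callables.
    """
    q = queue.Queue(maxsize=prefetch)

    def reader():
        try:
            for fid in frame_ids:
                q.put(("frame", load_frame(fid)))  # blocks if `prefetch` frames wait
            q.put(("done", None))
        except Exception as exc:  # forward I/O errors to the consumer
            q.put(("error", exc))

    threading.Thread(target=reader, daemon=True).start()
    outputs = []
    while True:
        kind, payload = q.get()
        if kind == "done":
            return outputs
        if kind == "error":
            raise payload
        outputs.append(render_frame(payload))  # runs while the next frame loads
```

When loading and rendering take similar time, this overlap approaches halving the per-frame latency of the sequential loop. In a real OpenGL program, the rendering side must stay on the thread that owns the GL context, which is why the consumer here is the main thread.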

CUDA Implementation for the Four-Russian Algorithm (4-러시안 알고리즘의 CUDA 구현)

  • Kim, Young Ho;Jeong, Ju-Hui;Kang, Dae Woong;Sim, Jeong Seop;Kim, Minho;Park, Soo-jun;Lim, Myungeun;Jung, Ho-Youl
    • Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.261-264 / 2012
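  • For a constant-size alphabet $\Sigma$, the edit distance between two strings $X$ and $Y$ of lengths $m$ and $n$ is defined as the minimum number of edit operations needed to transform $X$ into $Y$. The edit distance can be computed in $O(mn)$ time and space with the well-known dynamic programming approach, and it can also be computed with the Four-Russians algorithm. With a constant block size $t$, the Four-Russians algorithm requires $O((3|\Sigma|)^{2t}t^2)$ time and $O((3|\Sigma|)^{2t}t^2)$ space in the preprocessing stage. It then computes the edit distance using $O(mn/t)$ time and $O(mn)$ space in the computation stage. In this paper, we implement the computation stage of the Four-Russians algorithm using CUDA and experimentally compare CPU-based sequential execution time with GPU-based parallel execution time. Our parallel algorithm computes the edit distance in $O(m+n)$ time using $m/t$ threads. The GPU-based algorithm was about 10 times faster than the CPU-based algorithm for $t=1$ and about 3 times faster for $t=2$.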
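The Four-Russians lookup tables are beyond a short example. The dependency structure a parallel implementation can exploit is easier to see in the plain $O(mn)$ dynamic program evaluated in anti-diagonal (wavefront) order. Every cell on one anti-diagonal depends only on the two previous anti-diagonals, so all cells of a diagonal could be computed by parallel threads. The Python sketch below is a swapped-in illustration of this general wavefront technique, not the paper's block-based algorithm.

```python
def edit_distance_wavefront(x, y):
    """Levenshtein distance, filling the DP table one anti-diagonal at a time.

    D[i][j] = min(D[i-1][j] + 1, D[i][j-1] + 1, D[i-1][j-1] + (x[i-1] != y[j-1])).
    All cells with i + j == d are independent of each other.
    """
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][0] = i  # delete the first i characters of x
    for j in range(n + 1):
        table[0][j] = j  # insert the first j characters of y
    for d in range(2, m + n + 1):
        # Interior cells (i, j) with 1 <= i <= m, 1 <= j <= n and i + j == d;
        # on a GPU, each iteration of this inner loop could be its own thread.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            table[i][j] = min(
                table[i - 1][j] + 1,
                table[i][j - 1] + 1,
                table[i - 1][j - 1] + (x[i - 1] != y[j - 1]),
            )
    return table[m][n]
```

The outer loop runs $m + n - 1$ times, one step per anti-diagonal. This is why wavefront-style schemes finish in $O(m+n)$ parallel steps when enough threads are available.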