• Title/Summary/Keyword: SIMT

Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol;Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.10
    • /
    • pp.935-943
    • /
    • 2017
  • Convolutional neural networks (CNNs), which are used for image classification and speech recognition among neural networks that learn from positive data, have been continuously developed toward higher-performance structures. However, they are difficult to deploy in embedded systems with limited resources. We therefore use pre-trained weights together with GPGPU (General-Purpose computing on Graphics Processing Units), which applies the GPU to general-purpose computation, but limitations remain. Because a CNN performs simple, iterative operations, the computation speed varies greatly depending on how threads are allocated and utilized in a Single Instruction Multiple Thread (SIMT) based GPGPU. When convolution and pooling operations are distributed across threads, some threads are left idle. We increased the operation speed by using these remaining threads for the computation of the following feature maps and kernels.

Smart Information Monitoring Technology (스마트 정보 모니터링 기술)

  • Kang, Man-Mo;Lee, Dong-Hyung;Koo, Ja-Rok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.10 no.6
    • /
    • pp.225-233
    • /
    • 2010
  • Recently, in fields such as the Smart Grid, Smart Home Network, and Ubiquitous Computing, research has continued on Smart Information Monitoring Technology (SIMT), which exchanges, controls, and monitors information that is collected and processed as needed, in real time and in both directions. In this paper, we survey application products and recent trends of SIMT for Energy, U-Farm, Vehicle Information, and Home Network. In particular, we explain Google PowerMeter, which exchanges information in real time with the Smart Meter, a core part of the smart grid; a Real-time Monitoring System (RMS) for U-Farm; and an RMS for vehicle status information. We describe low-cost, low-power ZigBee-based applications of Smart Information Monitoring Technology and related work. Finally, we describe the construction status of the smart grid demonstration site in Jeju.

An Implementation of a Convolutional Accelerator based on a GPGPU for a Deep Learning (Deep Learning을 위한 GPGPU 기반 Convolution 가속기 구현)

  • Jeon, Hee-Kyeong;Lee, Kwang-yeob;Kim, Chi-yong
    • Journal of IKEEE
    • /
    • v.20 no.3
    • /
    • pp.303-306
    • /
    • 2016
  • In this paper, we propose a method to accelerate a convolutional neural network by utilizing a GPGPU. A convolutional neural network is a kind of neural network that learns features of images. It is suitable for image processing tasks that require learning from large amounts of data, such as images. The convolutional layer of a conventional CNN requires a large number of multiplications, which makes real-time operation difficult in embedded environments. In this paper, we reduce the number of multiplications through the Winograd convolution operation and perform parallel processing of the convolution by utilizing an SIMT-based GPGPU. The experiment was conducted using ModelSim and TestDrive, and the experimental results showed that the processing time was improved by about 17% compared to the conventional convolution.
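The multiplication savings that the abstract attributes to Winograd convolution can be illustrated with the smallest standard case, F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. This is a minimal sketch only: the abstract does not say which Winograd tile size the authors used, and the function names here are hypothetical.

```python
def direct_conv_1d(d, g):
    """Direct form: 2 outputs of a 3-tap filter over 4 inputs, 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using 4 data-dependent multiplications.

    Assumption: F(2,3) is chosen for illustration; the paper may use another tile.
    """
    # Filter transform. In a CNN this is computed once per filter and reused.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform followed by the 4 element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]
```

Each output tile depends only on its own four inputs, so on an SIMT device one thread could plausibly be assigned per tile.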

Design of New Spatio-temporal Representation Scheme for Moving Objects in Video (비디오의 움직임 객체를 위한 새로운 시공간 표현 기법의 설계)

  • 심춘보;김남기;장재우
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.04b
    • /
    • pp.110-112
    • /
    • 2000
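  • Unlike images, video data carries motion information (motion trajectories) for objects. This motion information is a very important feature unique to video data and plays an important role in indexing and content-based retrieval of video data. Therefore, for efficient content-based retrieval in video databases, this paper proposes a new spatio-temporal representation scheme for the single motion trajectory, which represents the motion information of one object, and the multiple motion trajectory, which represents the motion information of two objects. In addition, we propose SIST and SIMT, new similarity measure algorithms that measure similarity for user queries on motion information and support ranking and time intervals.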

Implementation of handwritten digit recognition CNN structure using GPGPU and Combined Layer (GPGPU와 Combined Layer를 이용한 필기체 숫자인식 CNN구조 구현)

  • Lee, Sangil;Nam, Kihun;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology
    • /
    • v.3 no.4
    • /
    • pp.165-169
    • /
    • 2017
  • CNN (Convolutional Neural Network) is one of the machine learning algorithms that show superior performance in image recognition and classification. A CNN is simple, but it requires a large amount of computation and takes a lot of time. Consequently, in this paper we parallelized the convolution layer, the pooling layer, and the fully connected layer, which consume much of the processing time in a CNN, using the SIMT (Single Instruction Multiple Thread) structure of a GPGPU (General-Purpose computing on Graphics Processing Units). We also expect to improve performance by reducing the number of memory accesses, using the output of the convolution layer directly in the pooling layer without storing it first. In this paper, we use the MNIST dataset to verify this experiment and confirm that the proposed CNN structure is 12.38% better than the existing structure.

Random Partial Haar Wavelet Transformation for Single Instruction Multiple Threads (단일 명령 다중 스레드 병렬 플랫폼을 위한 무작위 부분적 Haar 웨이블릿 변환)

  • Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.16 no.5
    • /
    • pp.805-813
    • /
    • 2015
  • Many researchers expect that compressive sensing and sparse recovery can overcome the limitations of conventional digital techniques. However, these new approaches require solving ℓ1-norm optimization problems for signal reconstruction. In the signal reconstruction process, the transform computation, which multiplies a random matrix by a vector, consumes considerable computing power. To address this issue, parallel processing is applied to the optimization problems. In particular, because of the huge size of the original signal, it is hard to store the random matrix directly in memory, so a procedural approach to handling the random matrix is needed. This paper presents a new parallel algorithm to calculate the random partial Haar wavelet transform on a Single Instruction Multiple Threads (SIMT) platform.
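One way to read the "procedural approach" mentioned above is to compute each selected Haar coefficient on the fly instead of storing a transform matrix. The sketch below follows that reading: it draws a random subset of rows of the orthonormal Haar matrix from a seed and evaluates each row directly against the signal. The abstract does not describe the paper's actual algorithm, so the row selection scheme, the function names, and the `seed` parameter are all assumptions.

```python
import math
import random


def haar_coefficient(x, row):
    """Inner product of x with one row of the orthonormal Haar matrix.

    The row is never materialized; len(x) must be a power of two.
    """
    n = len(x)
    if row == 0:
        return sum(x) / math.sqrt(n)       # scaling-function row
    j = row.bit_length() - 1               # scale index: row = 2**j + m
    m = row - (1 << j)                     # shift within that scale
    width = n >> j                         # support length of this wavelet
    start = m * width
    half = width // 2
    scale = math.sqrt((1 << j) / n)
    positive = sum(x[start:start + half])
    negative = sum(x[start + half:start + width])
    return scale * (positive - negative)


def random_partial_haar(x, num_rows, seed):
    """Evaluate a seeded random subset of Haar rows (hypothetical scheme)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    rows = random.Random(seed).sample(range(n), num_rows)
    # Each coefficient is independent, so each could map to one SIMT thread.
    return rows, [haar_coefficient(x, r) for r in rows]
```

Because the row subset is regenerated from the seed, only the seed has to be kept between the forward transform and any later step that needs the same rows.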

Change of Compressive Deformation Behaviors of Ti-5Mo-xFe Metastable Beta Alloy According to Fe Contents (Fe 함량에 따른 Ti-5Mo-xFe 준안정 베타 합금의 압축 변형거동 변화)

  • Yong-Jae Lee;Jae Gwan Lee;Dong-Geun Lee
    • Journal of the Korean Society for Heat Treatment
    • /
    • v.36 no.5
    • /
    • pp.303-310
    • /
    • 2023
  • β titanium alloys are widely used in the aerospace industry due to their excellent specific strength and corrosion resistance. In particular, the mechanical properties of metastable β titanium can be controlled efficiently through various deformation mechanisms such as slip, twinning, and SIMT (Stress-Induced Martensite Transformation), making it an ideal material for many industrial applications. In this study, Ti-5Mo-xFe (x=1, 2, 4 wt%) alloys were designed by adding a relatively inexpensive β element to ensure price competitiveness. Microstructural analysis was conducted using OM, SEM, and XRD, while mechanical properties were evaluated through hardness and compression tests to examine the deformation mechanisms as a function of Fe content. SIMT occurred in all three alloys and was influenced by the presence of βm (metastable beta) and by beta stability. As the Fe content decreased, the α'' phase increased due to SIMT occurring within the βm phase, resulting in softening. Conversely, as the Fe content increased, the strength of the alloy increased due to a reduction in α'' formation and the contributions of solid solution strengthening and grain strengthening. Moreover, unlike the other alloys, shear bands were observed only in the fracture of the Ti-5Mo-4Fe alloy, which was attributed to differences in texture and microstructure.

The parallelization of binarization using a GP-GPU

  • Han, Seong Hyeon;Yoo, Suk Won
    • International Journal of Advanced Culture Technology
    • /
    • v.4 no.4
    • /
    • pp.57-63
    • /
    • 2016
  • In this paper, we propose optimized binarization on a GP-GPU. Because binarization is easily parallelized, we propose two ways of performing the binary operations on a GP-GPU. The first method divides the work into data load, subtraction and conversion, and data store. The second method processes these steps collectively. The second method was 2.52 times faster than the first method. After synthesizing the GP-GPU onto an FPGA, binarization on the GP-GPU was compared with binarization on the ODROID XU. Binarization on the GP-GPU was 1.89 times faster than binarization on the ODROID XU.
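The two approaches described above can be contrasted in a short sketch: one version runs load, subtraction, conversion, and store as separate passes, and the other handles each pixel in a single pass. This is only an illustration in sequential Python; the threshold value, the 0/255 output encoding, and the function names are assumptions, and the abstract does not give the authors' kernel code.

```python
THRESHOLD = 128  # hypothetical threshold; the abstract does not state one


def binarize_staged(pixels, threshold=THRESHOLD):
    """Separate passes: load, subtract, convert, store."""
    loaded = list(pixels)                          # pass 1: data load
    diffs = [p - threshold for p in loaded]        # pass 2: subtraction
    bits = [255 if d >= 0 else 0 for d in diffs]   # pass 3: conversion
    return list(bits)                              # pass 4: data store


def binarize_fused(pixels, threshold=THRESHOLD):
    """Single pass: each pixel is loaded, compared, and stored in one step."""
    return [255 if p >= threshold else 0 for p in pixels]


print(binarize_fused([0, 127, 128, 255]))  # prints [0, 0, 255, 255]
```

Both versions treat every pixel independently, which is why the operation maps naturally onto one thread per pixel; the fused version simply avoids the intermediate buffers between passes.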

A Study on Architecture Improving Performance of openCV (openCV 의 성능 향상을 위한 아키텍처 연구)

  • Cho, Yeongpil;Heo, Ingoo;Kim, Yongjoo;Paek, Yunheung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.11a
    • /
    • pp.18-20
    • /
    • 2011
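  • As the application areas of computer vision have recently expanded, the use of openCV, a representative computer vision library, is also increasing. Owing to the nature of computer vision algorithms, openCV contains many parts that must perform massive computation. This paper studies architectures for improving the performance of openCV by reducing this computational burden. Because the massive computation in openCV arises in the inner loops of library functions, this paper analyzes the characteristics of these loops and studies which architectures can accelerate them. In conclusion, when the iterations of a loop are independent, SIMD (Single Instruction Multiple Data) and SIMT (Single Instruction Multiple Thread) are suitable; when the iterations are dependent, a pipeline architecture based on MIMD (Multiple Instruction Multiple Data) is suitable.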

Implementation of high performance parallel LU factorization program for multi-threads on GPGPUs (GPGPU의 멀티 쓰레드를 활용한 고성능 병렬 LU 분해 프로그램의 구현)

  • Shin, Bong-Hi;Kim, Young-Tae
    • Journal of Internet Computing and Services
    • /
    • v.12 no.3
    • /
    • pp.131-137
    • /
    • 2011
  • GPUs were originally designed for graphics processing, and GPGPUs are general-purpose GPUs for numerical computation with high performance and low electric power. In this paper, we implemented a parallel LU factorization program for GPGPUs. In CUDA, the computational environment for Nvidia GPGPUs, domains are divided into blocks, and multiple threads compute each sub-block simultaneously. In an LU factorization program, the computation order must be decided explicitly because of data dependence. To resolve the data dependency, we propose a parallel LU program for GPGPUs and also explain a parallel reduction algorithm for the partial pivoting of LU factorization. Finally, we present a performance analysis to show the efficiency of the parallel LU factorization program based on multiple threads on GPGPUs.
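The parallel reduction for partial pivoting mentioned above can be sketched as a pairwise tree reduction that finds the row with the largest absolute value in a column. The version below simulates the reduction rounds sequentially in Python and plugs the result into a plain in-place LU factorization; it is a sketch under assumptions, not the authors' CUDA program, and the function names are hypothetical.

```python
def argmax_abs_reduction(values):
    """Tree reduction returning the index of the entry with the largest |value|.

    Each while-loop round halves the candidate list; on a GPU every pair
    in a round could be compared by a separate thread.
    """
    if not values:
        raise ValueError("empty column")
    candidates = [(abs(v), i) for i, v in enumerate(values)]
    while len(candidates) > 1:
        survivors = []
        for k in range(0, len(candidates) - 1, 2):
            left, right = candidates[k], candidates[k + 1]
            survivors.append(left if left[0] >= right[0] else right)
        if len(candidates) % 2 == 1:       # odd element advances unchanged
            survivors.append(candidates[-1])
        candidates = survivors
    return candidates[0][1]


def lu_partial_pivot(a):
    """Return (lu, perm): L (unit diagonal) and U packed in one matrix."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):                     # steps are ordered: data dependence
        p = k + argmax_abs_reduction([lu[i][k] for i in range(k, n)])
        if lu[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):          # row updates are independent
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm
```

The outer loop over `k` has to stay sequential, while the row updates inside a step do not depend on each other, which matches the abstract's point that the computation order must be fixed around the data dependence.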