• Title/Summary/Keyword: Thread Pooling

Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol;Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.10 / pp.935-943 / 2017
  • The CNN (convolutional neural network), a neural network that learns from data and is used for image classification and speech recognition, has been continuously developed toward higher-performance structures. It remains difficult to use in embedded systems with limited resources. We therefore use pre-trained weights and GPGPU (General-Purpose Computing on Graphics Processing Units), which applies the GPU to general-purpose computation, but limitations remain. Because a CNN performs simple, iterative operations, its computation speed varies greatly with how threads are allocated and utilized on a Single Instruction Multiple Thread (SIMT) based GPGPU. When convolution and pooling operations are assigned to threads, some threads are left idle. We increased the operation speed by assigning these remaining threads to the subsequent feature-map and kernel computations.

Implementation of handwritten digit recognition CNN structure using GPGPU and Combined Layer (GPGPU와 Combined Layer를 이용한 필기체 숫자인식 CNN구조 구현)

  • Lee, Sangil;Nam, Kihun;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology / v.3 no.4 / pp.165-169 / 2017
  • The CNN (Convolutional Neural Network) is one of the machine learning algorithms that show superior performance in image recognition and classification. A CNN is simple, but it requires a large amount of computation and takes a lot of time. In this paper, we therefore parallelized the convolution layer, pooling layer, and fully connected layer, which consume much of the processing time in a CNN, using the SIMT (Single Instruction Multiple Thread) structure of a GPGPU (General-Purpose computing on Graphics Processing Units). We also expect to improve performance by reducing the number of memory accesses: the output of the convolution layer is used directly in the pooling layer instead of being stored first. We use the MNIST dataset to verify the experiment and confirm that the proposed CNN structure is 12.38% better than the existing structure.
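
The memory-access saving described in this abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' GPGPU implementation: the function names, the 3x3 kernel, and the 2x2 pooling window are assumptions chosen for illustration. The fused version computes each pooled value directly from the convolution results it needs, so the full convolution feature map is never stored.

```python
def conv_at(image, kernel, r, c):
    """Value of a 'valid' convolution (no kernel flip) at output position (r, c)."""
    return sum(
        image[r + i][c + j] * kernel[i][j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )


def conv_then_pool(image, kernel, pool):
    """Baseline: store the whole convolution output, then max-pool it."""
    out_h = len(image) - len(kernel) + 1
    out_w = len(image[0]) - len(kernel[0]) + 1
    feature_map = [
        [conv_at(image, kernel, r, c) for c in range(out_w)] for r in range(out_h)
    ]
    return [
        [
            max(
                feature_map[pr * pool + dr][pc * pool + dc]
                for dr in range(pool)
                for dc in range(pool)
            )
            for pc in range(out_w // pool)
        ]
        for pr in range(out_h // pool)
    ]


def conv_pool_fused(image, kernel, pool):
    """Combined layer: pool convolution results as they are produced,
    without writing the intermediate feature map to memory."""
    out_h = (len(image) - len(kernel) + 1) // pool
    out_w = (len(image[0]) - len(kernel[0]) + 1) // pool
    return [
        [
            max(
                conv_at(image, kernel, pr * pool + dr, pc * pool + dc)
                for dr in range(pool)
                for dc in range(pool)
            )
            for pc in range(out_w)
        ]
        for pr in range(out_h)
    ]


# Hypothetical 6x6 input and an all-ones 3x3 kernel.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
fused = conv_pool_fused(image, kernel, 2)
baseline = conv_then_pool(image, kernel, 2)
```

Both functions return the same pooled map; only the fused one avoids the intermediate buffer, which is the property the abstract attributes to its combined layer.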

Pedestrian Inference Convolution Neural Network Using GP-GPU (GP-GPU를 이용한 보행자 추론 CNN)

  • Jeong, Junmo
    • Journal of IKEEE / v.21 no.3 / pp.244-247 / 2017
  • In this paper, we implemented a convolutional neural network using a GP-GPU. After defining the structure, the CNN performed inference on the 256-thread GP-GPU from our previous study, using the weights obtained from training. Training was performed with an Intel i7-4470 CPU and Matlab, using the Daimler Pedestrian Dataset. The GP-GPU is controlled by the PC over PCIe and runs on an FPGA. We assigned threads according to the depth and size of each layer. For the pooling layer, we used overlapping pooling, which performs additional operations on the horizontal and vertical regions. One inference takes about 12 ms.
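
Overlapping pooling, mentioned in this abstract, means the pooling window is larger than its stride, so neighboring windows share rows and columns. The sketch below is a plain CPU illustration under assumed parameters: the abstract does not give the window size or stride, so the 3x3 window with stride 2 and the function name are hypothetical.

```python
def max_pool2d(feature_map, window, stride):
    """Max pooling over square windows; windows overlap when window > stride."""
    out_h = (len(feature_map) - window) // stride + 1
    out_w = (len(feature_map[0]) - window) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(window)
                for j in range(window)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x5 feature map with increasing values.
feature_map = [[r * 5 + c for c in range(5)] for r in range(5)]
overlapping = max_pool2d(feature_map, window=3, stride=2)      # windows share one row/column
non_overlapping = max_pool2d(feature_map, window=2, stride=2)  # windows are disjoint
```

With the overlapping setting, each output reads the extra row and column it shares with its neighbor, which matches the additional horizontal and vertical work the abstract describes.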

Design and Implementation of File Cloud Server by Using JAVA SDK (Java SDK를 이용한 파일 클라우드 시스템의 설계 및 구현)

  • Lee, Samuel Sangkon
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.2 / pp.86-100 / 2015
  • Cloud computing is a computing term that evolved in the late 2000s, based on utility and consumption of computer resources. Google says that "Cloud computing involves deploying groups of remote servers and software networks that allow different kinds of data sources be uploaded for real time processing to generate computing results without the need to store processed data on the cloud. Cloud computing relies on sharing of resources to achieve coherence and economies of scale, similar to a utility (like the electricity grid) over a network. At the foundation of cloud computing is the broader concept of converged infrastructure and shared services. Cloud computing, or in simpler shorthand just 'the cloud', also focuses on maximizing the effectiveness of the shared resources." A cloud service is a smart, intelligent service for saving private files from any device, anytime, anywhere. Dropbox, OAuth, and PAClous require that accumulated user data be archived with a cloud service. We suggest an implementation technique that uses thread pooling to process many tasks on the cloud server. Thread pooling is an efficient implementation technique for client and service environments. To present the technique, we provide three diagrams from a software engineering perspective.
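
The thread-pooling idea in this abstract can be sketched as follows. The paper's implementation uses the Java SDK and is not reproduced in the abstract; this sketch uses Python's standard `concurrent.futures.ThreadPoolExecutor` for brevity, and the request format, `handle_upload`, and `process_requests` are hypothetical names. A fixed set of worker threads is reused across many client tasks instead of creating one thread per request.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_upload(request):
    """Hypothetical task: pretend to store a file and report its size."""
    name, payload = request
    return name, len(payload)


def process_requests(requests, max_workers=4):
    """Run all requests on a shared pool of worker threads.

    The pool is created once and its threads are reused for every task;
    map() returns results in the same order as the input requests.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_upload, requests))


results = process_requests([("a.txt", b"hello"), ("b.bin", b""), ("c.log", b"12345678")])
```

Capping `max_workers` bounds the server's thread count regardless of how many requests arrive, which is the usual motivation for pooling in a client and service setting.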

Experimental Evaluation and Flexible Performance Improvement of IoT Middleware for Efficient Connectivity (사물간의 효율적인 연결을 위한 사물인터넷 미들웨어 실험 평가 및 성능 향상 방법)

  • Jeon, Soo Bin;Lee, Chung San;Han, Young Tak;Jung, In Bum
    • KIPS Transactions on Computer and Communication Systems / v.6 no.9 / pp.385-396 / 2017
  • Many IoT platforms have been proposed for various IoT devices, from low-end to high-end performance. We previously proposed a new IoT platform called MinT that supports the operation of sensing devices and network communication. In the proposed platform, things can flexibly connect to each other and efficiently share their information. Most IoT platforms, including MinT, support thread pooling to process requests quickly. However, using a thread pool with a fixed thread count can cause network delay and inefficient energy consumption. In this paper, we propose an enhanced method that manages the thread pool efficiently by adjusting the number of threads every cycle to regulate the device's performance. In particular, we aim to improve the performance of the Interaction Thread Pool Group, which is responsible for analyzing, processing, and re-transmitting received packets. The experiment shows that the improved method increases the average throughput by approximately 25% compared to the existing platforms. Finally, using the proposed method, MinT can reduce the transmission delay and energy consumption of devices in the IoT environment.
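
Adjusting the thread count every cycle, as this abstract proposes, can be sketched with a simple sizing rule. The abstract does not describe MinT's actual policy, so everything here is an assumption: the function name, the backlog-based heuristic, the limits of 1 to 8 threads, and the figure of 4 queued packets per thread are all hypothetical.

```python
def next_thread_count(current, queued, min_threads=1, max_threads=8, per_thread=4):
    """Choose the pool size for the next cycle from the backlog seen in this one.

    Assumed heuristic: aim for one thread per `per_thread` queued packets,
    clamp to [min_threads, max_threads], and move at most one thread per
    cycle so the pool size does not oscillate.
    """
    needed = -(-queued // per_thread)  # ceiling division
    target = max(min_threads, min(max_threads, needed))
    if target > current:
        return current + 1
    if target < current:
        return current - 1
    return current


# Simulate a burst of traffic followed by an idle period.
sizes = []
threads = 2
for backlog in [40, 40, 40, 0, 0, 0, 0]:
    threads = next_thread_count(threads, backlog)
    sizes.append(threads)
```

The pool grows while the backlog is high and shrinks again when the queue drains, so idle threads are not kept alive, which is the energy and delay trade-off the abstract targets.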