• Title/Summary/Keyword: Embedded GPU (임베디드 GPU)

Acceleration of ECC Computation for Robust Massive Data Reception under GPU-based Embedded Systems (GPU 기반 임베디드 시스템에서 대용량 데이터의 안정적 수신을 위한 ECC 연산의 가속화)

  • Kwon, Jisu;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.7 / pp.956-962 / 2020
  • As the volume of data handled by embedded systems grows, ECC decoding becomes increasingly important for receiving massive data robustly. In this paper, we propose a method to accelerate the computation of syndrome vectors when ECC decoding is performed with a Hamming code on an embedded system with a built-in GPU. The proposed method expresses the matrix-vector multiplication of the decoding operation in the CSR format, one of the data structures for representing sparse matrices, and executes it in parallel in a CUDA kernel on the GPU. We evaluated the proposed method on a target embedded board with a GPU, and the results show that GPU-accelerated ECC decoding takes less execution time than a CPU-only implementation.
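
As a rough illustration of the syndrome computation described in the abstract above, the following Python sketch stores a Hamming(7,4) parity-check matrix in CSR form and computes the syndrome as a sparse matrix-vector product over GF(2). It is not the authors' CUDA implementation: the matrix `H`, the (7,4) code size, and the helper names `dense_to_csr`, `syndrome_csr`, and `error_position` are all assumptions chosen for illustration. The outer loop over rows is the part a GPU kernel could parallelize, for example with one thread per row.

```python
# Illustrative sketch only (assumed names and code size, not from the paper):
# Hamming(7,4) syndrome via a CSR sparse matrix-vector product over GF(2).

def dense_to_csr(matrix):
    """Convert a dense 0/1 matrix to CSR index arrays (nonzero values are all 1)."""
    row_ptr, col_idx = [0], []
    for row in matrix:
        col_idx.extend(j for j, v in enumerate(row) if v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def syndrome_csr(row_ptr, col_idx, received):
    """Compute s = H * r (mod 2). Rows are independent of each other,
    so each iteration of the outer loop could run in its own GPU thread."""
    syndrome = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc ^= received[col_idx[k]]  # addition in GF(2) is XOR
        syndrome.append(acc)
    return syndrome


def error_position(syndrome):
    """Interpret the syndrome as a 1-based bit position (0 means no error).
    Valid only for the column ordering of H chosen below."""
    return sum(bit << i for i, bit in enumerate(syndrome))


# Assumed parity-check matrix: column j (1-based) is the binary form of j,
# with the least significant bit in the first row.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
row_ptr, col_idx = dense_to_csr(H)

codeword = [1, 1, 1, 0, 0, 0, 0]   # satisfies H * c = 0 (mod 2)
received = codeword.copy()
received[4] ^= 1                   # flip the bit at 1-based position 5

s = syndrome_csr(row_ptr, col_idx, received)
print(s, error_position(s))        # prints [1, 0, 1] 5
```

For a parity-check matrix this small, the CSR layout gains little. It pays off for longer codes, where most entries of the matrix are zero and only the nonzero column indices need to be stored and visited.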

Performance Enhancement and Evaluation of a Deep Learning Framework on Embedded Systems using Unified Memory (통합메모리를 이용한 임베디드 환경에서의 딥러닝 프레임워크 성능 개선과 평가)

  • Lee, Minhak;Kang, Woochul
    • KIISE Transactions on Computing Practices / v.23 no.7 / pp.417-423 / 2017
  • Recently, many embedded devices with the computing capability required for deep learning have become available, and many new applications using these devices are emerging. However, these embedded devices have an architecture different from that of PCs and high-performance servers. In this paper, we propose a method that improves the performance of a deep-learning framework by taking into account the architecture of an embedded device that shares memory between the CPU and the GPU. The proposed method is implemented in Caffe, an open-source deep-learning framework, and is evaluated on an NVIDIA Jetson TK1 embedded device. In the experiments, we investigate the image recognition performance of several state-of-the-art deep-learning networks, including AlexNet, VGGNet, and GoogLeNet. Our results show that the proposed method can achieve a significant performance gain. For instance, with AlexNet, we reduced image recognition latency by about 33% and energy consumption by about 50%.

Performance Enhancement and Evaluation of AES Cryptography using OpenCL on Embedded GPGPU (OpenCL을 이용한 임베디드 GPGPU환경에서의 AES 암호화 성능 개선과 평가)

  • Lee, Minhak;Kang, Woochul
    • KIISE Transactions on Computing Practices / v.22 no.7 / pp.303-309 / 2016
  • Recently, an increasing number of embedded processors, such as ARM Mali, have begun to support GPGPU programming frameworks such as OpenCL. As a result, GPGPU technologies that have been used in PC and server environments are beginning to be applied to embedded systems. However, many embedded systems have different architectural characteristics compared to traditional PCs, and low power consumption and real-time performance are also important performance metrics in these systems. In this paper, we implement a parallel AES cryptographic algorithm for a modern embedded GPU using OpenCL, a standard parallel computing framework, and compare its performance against various baselines. Experimental results show that, for 1000 KB of input data, the parallel GPU AES implementation can reduce the response time to about 1/150 and the energy consumption to approximately 1/290 of those of an OpenMP implementation. Furthermore, an additional 100% performance improvement of the parallel AES algorithm was achieved by exploiting the characteristics of embedded GPUs, for example by eliminating data copies between GPU and host memory. Our results also demonstrate that larger input data sizes yield greater performance improvements.

Parallel Processing Method on CPU for Image Processing on Mobile Heterogeneous Computing System (모바일 이기종 컴퓨팅 시스템에서 영상처리 고속화를 위한 CPU측 병렬처리 방법)

  • Beak, Aram;Choi, Haechul
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2015.07a / pp.181-182 / 2015
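  • As the penetration and performance of mobile devices have grown rapidly, video consumption on mobile devices has also increased greatly. However, mobile platforms, which are designed to reduce power and space, have performance limitations compared with desktop platforms. Consequently, much acceleration research is under way to overcome these limitations by using embedded GPUs, which employ a SIMD architecture, for large-volume video processing. Image processing that operates on stored data must use the CPU together with the GPU, and heterogeneous computing systems in mobile environments have structural drawbacks such as low transfer rates between processors, the resulting latency, and the mandatory use of data formats supported by the mobile operating system. In this paper, to accelerate image processing with an embedded GPU, we use parallel processing on the embedded CPU side to overcome the drawbacks described above, and experimental results show that using the embedded CPU in a mobile heterogeneous computing architecture increases overall computational efficiency.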

Implementation of Integrated CPU-GPU for Efficient Uniform Memory Access Method and Verification System (CPU-GPU간 긴밀성을 위한 효율적인 공유메모리 접근 방법과 검증 시스템 구현)

  • Park, Hyun-moon;Kwon, Jinsan;Hwang, Tae-ho;Kim, Dong-Sun
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.2 / pp.57-65 / 2016
  • In this paper, we propose a system for efficient use of shared memory between the CPU and the GPU. The system, called Fusion Architecture, ensures consistency of the shared memory and minimizes the cache misses that frequently occur on systems based on Heterogeneous System Architecture or Unified Virtual Memory. It also maximizes performance for memory-intensive jobs through efficient allocation of GPU cores. To compare architectures across various scenarios, we introduce the Fusion Architecture Analyzer, which compares OpenMP, OpenCL, CUDA, and the proposed architecture in terms of memory overhead and processing time. The results show that the proposed Fusion Architecture runs benchmarks 55% faster and reduces memory overhead by 220% on average.

Deep Learning-Based Real-Time Pedestrian Detection on Embedded GPUs (임베디드 GPU에서의 딥러닝 기반 실시간 보행자 탐지 기법)

  • Vien, An Gia;Lee, Chul
    • Journal of Broadcast Engineering / v.24 no.2 / pp.357-360 / 2019
  • We propose an efficient single convolutional neural network (CNN) for pedestrian detection on embedded GPUs. We first determine the optimal number of convolutional layers and the hyper-parameters for a lightweight CNN. Then, we employ a multi-scale approach to make the network robust to the sizes of pedestrians in images. Experimental results demonstrate that the proposed algorithm is capable of real-time operation while providing higher detection performance than conventional algorithms.

A Study on the Performance of Stereo Matching Algorithms in NVIDIA Jetson TX2 (NVIDIA Jetson TX2에서 스테레오 매칭 알고리즘들에 대한 성능에 관한 연구)

  • Lee, Gyu-Cheol;Yoo, Jisang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.06a / pp.164-165 / 2018
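  • The Jetson TX2, released by NVIDIA in March 2017, is a high-performance embedded board equipped with a GPU. Through GPU-based parallel processing, it can run computationally intensive algorithms on an embedded system. Stereo matching obtains depth information using a stereo camera, and the obtained depth information can serve as metadata for various applications. However, because the algorithms are very computationally demanding, they have typically run only on GPU-equipped desktops. In this paper, we run previously developed stereo matching algorithms on the Jetson TX2 embedded board and, through performance analysis, examine whether they can operate in real time.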

Embedded GPU based Fast Image Processing for Mobile Device (임베디드 GPU 기반 영상처리 고속화 방법)

  • Lee, Kang-Woon;Beak, A-Ram;Cho, Haechul
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2014.11a / pp.39-40 / 2014
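  • As camera-equipped mobile devices have become commonplace, diverse applications using image processing are spreading in mobile environments. Because images involve a relatively large amount of data compared with other kinds of information, performing image processing in a mobile environment may be subject to physical constraints such as processing speed, power, and heat. To overcome these problems, this paper presents a method for accelerating image processing on mobile devices using the embedded GPU (Graphics Processing Unit), which acts as a coprocessor. In the experiments, we compare the performance of commonly used image processing algorithms on the CPU (Central Processing Unit) and the GPU, thereby verifying the effectiveness of the acceleration method and analyzing its characteristics.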

Efficient Implementation of Candidate Region Extractor for Pedestrian Detection System with Stereo Camera based on GP-GPU (스테레오 영상 보행자 인식 시스템의 후보 영역 검출을 위한 GP-GPU 기반의 효율적 구현)

  • Jeong, Geun-Yong;Jeong, Jun-Hee;Lee, Hee-Chul;Jeon, Gwang-Gil;Cho, Joong-Hwee
    • IEMEK Journal of Embedded Systems and Applications / v.8 no.2 / pp.121-128 / 2013
  • There have been various research efforts on pedestrian recognition in embedded imaging systems. However, many suffer from heavy computational complexity. The SVM classification method has been widely used for pedestrian recognition, and reducing the number of candidate regions is crucial for a low-complexity scheme. In this paper, we propose a real-time HOG-based pedestrian detection system on a GPU, in which images are captured by a pair of cameras. To speed up the detection of people on the road, the proposed method reduces the number of detection windows with disparity-search and near-search algorithms and uses the GPU with the NVIDIA CUDA framework. This method achieves speedups of 20% or more compared to recent GPU implementations. The effectiveness of our algorithm is demonstrated in terms of processing time and detection performance.

Multiview Stereo Matching on Mobile Devices Using Parallel Processing on Embedded GPU (임베디드 GPU에서의 병렬처리를 이용한 모바일 기기에서의 다중뷰 스테레오 정합)

  • Jeon, Yun Bae;Park, In Kyu
    • Journal of Broadcast Engineering / v.24 no.6 / pp.1064-1071 / 2019
  • Multiview stereo matching is used to reconstruct a 3D shape from a set of 2D images. Conventional multiview stereo algorithms have been implemented on high-performance hardware because of their high complexity, with a large number of calculations in each step. However, as the performance of mobile graphics processors has recently increased rapidly, complex computer vision algorithms can now be implemented on mobile devices such as smartphones and embedded boards. In this paper, we parallelize a multiview stereo algorithm using OpenCL on a mobile GPU and present various optimization techniques for embedded hardware with limited resources.