Title/Summary/Keyword: embedded GPU

OpenGL ES Compiler Implementation for Embedded Graphic Processor (임베디드 그래픽 프로세서를 위한 OpenGL ES 컴파일러 개발)

  • Im, Soo-Jun; Song, Jun-Sup; Shin, Dong-Kun
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.167-169 / 2012
  • As graphics processing demands on portable devices grow, the need for low-power, low-cost graphics processors has emerged. Accordingly, the Khronos Group released OpenGL ES 2.0, a graphics API standard for mobile devices. This paper develops and optimizes a shader compiler for a graphics processor designed around OpenGL ES 2.0. The compiler correctly compiled and ran shader programs written in OpenGL ESSL, and by applying optimization techniques suited to the target GPU it reduced shader program size by up to about 10% and improved performance by about 10-15%.
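
The abstract does not name the specific optimizations applied, but a classic shader-compiler transformation for such targets is fusing a multiply followed by a dependent add into a single MAD instruction, which shrinks code and cuts cycles much as described. Below is a minimal, hypothetical sketch of such a peephole pass over a toy three-address IR; the instruction names and IR format are illustrative, not taken from the paper.

```python
# Toy peephole pass fusing MUL followed by a dependent ADD into one MAD.
# Instructions are (op, dest, src, ...) tuples; the IR and opcode names are
# hypothetical -- the paper does not detail its optimization passes.

def fuse_mul_add(instrs):
    out, i = [], 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (nxt is not None and cur[0] == "MUL" and nxt[0] == "ADD"
                and nxt[2:].count(cur[1]) == 1):
            # A real pass would also verify the temporary cur[1] is dead
            # after this use before deleting the MUL.
            other = [s for s in nxt[2:] if s != cur[1]][0]
            out.append(("MAD", nxt[1], cur[2], cur[3], other))
            i += 2                      # two instructions become one
        else:
            out.append(cur)
            i += 1
    return out

# t0 = a*b; r0 = t0 + c   ==>   r0 = a*b + c
prog = [("MUL", "t0", "a", "b"), ("ADD", "r0", "t0", "c")]
print(fuse_mul_add(prog))               # [('MAD', 'r0', 'a', 'b', 'c')]
```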

EVOLUTIONARY MODELS OF ROTATING DENSE STELLAR SYSTEMS WITH EMBEDDED BLACK HOLES

  • FIESTAS, JOSE A.
    • Publications of The Korean Astronomical Society / v.30 no.2 / pp.345-347 / 2015
  • We present evolutionary models of rotating self-gravitating systems (e.g. globular clusters, galaxy cores). These models are characterized by an initial axisymmetry due to rotation. Central black hole seeds are included in our models, and black hole growth through the consumption of stellar matter is simulated until the central potential dominates the kinematics of the core. Our goal is to study the long-term (Gyr) evolution of relaxed dense stellar systems that deviate from spherical symmetry, and their final morphology and kinematics. For this purpose we developed a 2D Fokker-Planck analytical code and confirmed its results with detailed N-body simulations, using a high-performance code developed for GPU machines. We conclude that initial rotation significantly modifies the shape and lifetime of these systems and cannot be neglected in studying the evolution of globular clusters, or of the galaxy itself. Our models also constrain the final masses of the intermediate-mass black holes expected to be present in globular clusters.

Object Detection of Infrared Thermal Image Based on Single Shot Multibox Detector Model for Embedded System (임베디드 시스템용 Single Shot Multibox Detector Model 기반 적외선 열화상 영상의 객체검출)

  • NA, Woong Hwan; Kim, Eung Tae
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2019.06a / pp.9-12 / 2019
  • Research on video analysis using ordinary visible-light cameras has been active for many years and has recently evolved into intelligent video analysis based on deep learning, creating strong synergy effects as the defense-site security, CCTV, face recognition, machine vision, automotive, and drone industries have grown. However, under conditions such as dark nights, fog, bad weather, and smoke, the accuracy of camera-based video analysis degrades and errors occur; moreover, deep learning generally requires a high-end GPU, so additional hardware is needed. This study reconfigures an SSD (Single Shot MultiBox Detector) with the lightweight MobileNet network for object detection in thermal infrared images, so that it can run even on low-spec embedded systems such as mobile devices. Simulation results confirmed that the proposed model reduces both object-detection and training time on infrared thermal camera images.
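
MobileNet's light weight comes mainly from replacing standard convolutions with depthwise-separable ones. A small back-of-the-envelope sketch of the multiplication savings, with an illustrative layer shape not taken from the paper:

```python
# Multiplication counts for one conv layer: standard vs. MobileNet-style
# depthwise-separable. The layer shape is illustrative, not from the paper.
H, W = 56, 56                # output feature-map size
Cin, Cout, K = 64, 128, 3    # input/output channels, kernel size

standard  = H * W * Cin * Cout * K * K   # full 3x3 convolution
depthwise = H * W * Cin * K * K          # one 3x3 filter per input channel
pointwise = H * W * Cin * Cout           # 1x1 channel-mixing convolution
separable = depthwise + pointwise

print(f"standard : {standard:,}")
print(f"separable: {separable:,}  ({separable / standard:.1%} of standard)")
# Cost ratio is roughly 1/K^2 + 1/Cout -> about 8.4x fewer multiplications here.
```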

Real-time face detection in embedded system (실시간 얼굴 검출을 위한 임베디드 시스템에서의 구현방법)

  • Yoo, Hye-Bin; Park, Sung-Hyun; Jeong, Hye-Won; Park, Myung-Suk; Kim, Sang-Hoon
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.1066-1069 / 2020
  • This paper describes a method for remotely monitoring detection results from a robot equipped with an embedded GPU board. Rather than reducing the computational load of the deep learning model, performance was improved by using libraries provided by Nvidia, and to minimize the robot's battery consumption, communication takes place only when a detection occurs instead of streaming video in real time, yielding a longer operating time.
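
The battery-saving design described, transmitting only when a detection occurs rather than streaming video, can be sketched as an event-driven loop. All names and addresses below are hypothetical placeholders; the paper's actual detector uses Nvidia's libraries on the embedded GPU board.

```python
import socket
import time

# Event-driven reporting: the robot contacts the remote station only when a
# face is detected, instead of streaming video. The address, message format,
# and detect_face() are hypothetical placeholders.

REMOTE = ("192.0.2.10", 5000)            # example monitoring-station address

def detect_face(frame):
    """Stand-in for the GPU-accelerated detector on the embedded board."""
    return None                          # replace with the real detector

def run(camera, period=0.1):
    while True:
        frame = camera.read()
        box = detect_face(frame)
        if box is not None:              # transmit only on a detection event
            with socket.create_connection(REMOTE, timeout=5) as s:
                s.sendall(f"DETECTED {box}".encode())
        time.sleep(period)               # idling between frames saves battery
```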

Embedded artificial intelligence system development for action estimation on construction site (사용자 행동예측을 위한 임베디드 인공지능 엔진 및 시스템 기술 개발)

  • Song, Hyok; Choi, Inkyu; Ko, Minsoo; Yoo, Jisang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2021.06a / pp.226-227 / 2021
  • Video analysis based on deep learning has advanced rapidly on the software side thanks to progress in GPU hardware, and it shows higher accuracy than conventional pattern analysis. However, software-based video analysis that runs on PCs or specific hardware has limited fields of application. The development of the NPU (Network Processing Unit), which implements neural networks in hardware, has made deep learning feasible on embedded platforms rather than only on expensive ones. On the other hand, because the networks usable on such hardware are restricted, there are limits on the size and memory footprint of deployable deep learning models, and it is difficult to run the latest or highest-performing models from a rapidly changing field. To address this, this study developed an embedded system that applies a distillation technique, implemented deep learning models on it, and built a system that can switch among deep learning models depending on the situation.
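
The abstract names distillation without detailing the formulation; the standard approach (Hinton et al.) trains the small on-device student against the temperature-softened outputs of a large teacher. A minimal NumPy sketch, assuming that formulation:

```python
import numpy as np

# Standard knowledge-distillation loss: the small on-NPU student matches the
# temperature-softened outputs of a large teacher. The abstract names
# distillation but not this exact formulation.

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: cross-entropy against the softened teacher outputs.
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    soft = -(p_t * log_p_s).sum(axis=-1).mean() * T * T  # T^2 rescales gradients
    # Hard-target term: ordinary cross-entropy against the true labels.
    log_p = np.log(softmax(student_logits) + 1e-12)
    hard = -log_p[np.arange(len(labels)), labels].mean()
    return alpha * soft + (1 - alpha) * hard

logits_s = np.array([[1.0, 2.0, 0.5]])   # student outputs (illustrative)
logits_t = np.array([[1.2, 2.5, 0.3]])   # teacher outputs (illustrative)
print(distillation_loss(logits_s, logits_t, labels=np.array([1])))
```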

Development of Street Crossing Assistive Embedded System for the Visually-Impaired Using Machine Learning Algorithm (머신러닝을 이용한 시각장애인 도로 횡단 보조 임베디드 시스템 개발)

  • Oh, SeonTaek; Jeong, Kidong; Kim, Homin; Kim, Young-Keun
    • Journal of the HCI Society of Korea / v.14 no.2 / pp.41-47 / 2019
  • In this study, a smart assistive device is designed to recognize pedestrian signals and provide audio instructions so that visually impaired people can cross streets safely. Walking alone is one of the biggest challenges for the visually impaired and degrades their quality of life. The proposed device has a camera attached to a pair of glasses; it detects traffic lights, recognizes pedestrian signals in real time using a machine learning algorithm on a GPU board, and provides audio instructions to the user. For portability, the device is designed to be compact and light, yet with sufficient battery life. The embedded processor is wired to the small camera attached to the glasses. On the inner side of one temple, a bone-conduction speaker is installed, which gives audio instructions without blocking external sounds, for safety. The performance of the proposed device was validated experimentally: it showed 87.0% recall and 100% precision for detecting the pedestrian green light, and 94.4% recall and 97.1% precision for detecting the pedestrian red light.
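
For reference, precision is TP/(TP+FP) and recall is TP/(TP+FN). The counts in the sketch below are hypothetical, chosen only to reproduce the percentages reported above:

```python
# Precision = TP / (TP + FP), recall = TP / (TP + FN). The counts below are
# hypothetical, chosen only to reproduce the reported percentages.

def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

green = precision_recall(tp=87, fp=0, fn=13)  # precision 100.0%, recall 87.0%
red   = precision_recall(tp=34, fp=1, fn=2)   # precision 97.1%,  recall 94.4%

print(f"green light: precision {green[0]:.1%}, recall {green[1]:.1%}")
print(f"red light:   precision {red[0]:.1%}, recall {red[1]:.1%}")
```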

Parallel Implementations of Digital Focus Indices Based on Minimax Search Using Multi-Core Processors

  • Kim, HyungTae; Lee, Duk-Yeon; Choi, Dongwoon; Kang, Jaehyeon; Lee, Dong-Wook
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.2 / pp.542-558 / 2023
  • A digital focus index (DFI) is a value used to determine image focus in scientific apparatus and smart devices. Automatic focus (AF) is an iterative and time-consuming procedure; however, its processing time can be reduced using a graphics processing unit (GPU) and a multi-core processor (MCP). In this study, parallel architectures of a minimax search algorithm (MSA) are applied to two DFIs: the range algorithm (RA) and image contrast (CT). The DFIs are based on a histogram; however, parallel computation of a histogram is conventionally inefficient because of bank conflicts in shared memory. The parallel architectures of RA and CT are instead constructed using parallel reduction for the MSA, which rates image pixel pairs in parallel and halves the array in every step. The array size is thus eventually reduced to one, and the minimax is determined at the final reduction. Kernels for the architectures are built with open-source software to keep them relatively platform-independent. The kernels were tested on a hexa-core PC and an embedded device using Lenna images of various sizes matching the resolutions of industrial cameras. The performance of the kernels was investigated in terms of processing speed and computational acceleration; the maximum acceleration was 32.6× in the best case, with the MCP exhibiting the higher performance.
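
The reduction structure described, pairwise comparisons that halve the array each step until a single (min, max) pair remains, can be illustrated in NumPy. The paper's version runs as OpenCL kernels on a GPU and MCP; this loop only shows the log2(n)-step pattern:

```python
import numpy as np

# Minimax via parallel reduction: compare element pairs and halve the array
# each step. Each np.minimum/np.maximum call stands in for one parallel step
# of the paper's kernels.

def minimax_reduce(pixels):
    lo = np.asarray(pixels, dtype=np.float64)
    hi = lo.copy()
    while lo.size > 1:
        if lo.size % 2:                        # pad odd lengths (neutral element)
            lo = np.append(lo, lo[-1])
            hi = np.append(hi, hi[-1])
        half = lo.size // 2
        lo = np.minimum(lo[:half], lo[half:])  # all pairs rated "in parallel"
        hi = np.maximum(hi[:half], hi[half:])
    return lo[0], hi[0]

img = np.random.randint(0, 256, size=(480, 640)).ravel()
mn, mx = minimax_reduce(img)
assert (mn, mx) == (img.min(), img.max())
print(mn, mx)        # the range-algorithm DFI then uses the spread mx - mn
```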

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun; Park, Joo-Yul; Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2648-2668 / 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, the Open Computing Language (OpenCL) has been proposed to take full advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low-Density Parity-Check (LDPC) codes on an embedded heterogeneous platform using the OpenCL framework. LDPC codes are among the most popular and strongest error-correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelism are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelism are processed by the graphics processing unit (GPU). To improve the performance of the OpenCL kernels for the decoding operations, explicit thread scheduling, vectorization, and efficient data-transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors under a unified computing framework.
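
The data-parallel core that such decoders typically offload to the GPU is the check-node update. The sketch below uses the common min-sum approximation; the paper does not state which update rule its kernels implement:

```python
import numpy as np

# Min-sum check-node update: for each edge, the outgoing message is the
# product of the *other* incoming signs times the min of the *other*
# incoming magnitudes. Every row updates independently -- the data-level
# parallelism a GPU kernel exploits.

def check_node_update(L):
    """L: (num_checks, degree) incoming LLRs, one row per check node."""
    sign = np.sign(L)
    total_sign = np.prod(sign, axis=1, keepdims=True)
    mag = np.abs(L)
    idx = np.argsort(mag, axis=1)
    min1 = np.take_along_axis(mag, idx[:, :1], axis=1)   # smallest magnitude
    min2 = np.take_along_axis(mag, idx[:, 1:2], axis=1)  # second smallest
    out_mag = np.where(mag == min1, min2, min1)          # exclude own edge
    return total_sign * sign * out_mag                   # extrinsic messages

llrs = np.array([[ 1.5, -0.4,  2.0],
                 [-0.9,  0.3, -1.1]])
print(check_node_update(llrs))
```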

Real-time Ray-tracing Chip Architecture

  • Yoon, Hyung-Min; Lee, Byoung-Ok; Cheong, Cheol-Ho; Hur, Jin-Suk; Kim, Sang-Gon; Chung, Woo-Nam; Lee, Yong-Ho; Park, Woo-Chan
    • IEIE Transactions on Smart Processing and Computing / v.4 no.2 / pp.65-70 / 2015
  • In this paper, we describe the world's first real-time ray-tracing chip architecture. Ray tracing generates higher-quality 3D graphics than current rasterization technology by providing four essential light effects: shadow, reflection, refraction, and transmission. The real-time ray-tracing chip, named RayChip, includes a real-time ray-tracing graphics processing unit and an accelerating tree-building unit. An ARM central processing unit (CPU) and other peripherals are also included to support all stages of 3D graphics applications. By using the accelerating tree-building unit, named RayTree, to minimize the CPU load, the chip can use a low-end CPU, decreasing both silicon area and power consumption. Evaluation results show that RayChip delivers the performance needed for real-time ray tracing at high-definition (HD) resolution, with the rendered images scaled to full HD. The chip also runs the Linux operating system and exposes the familiar OpenGL for Embedded Systems (OpenGL ES) application programming interface for easy application development.
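
The light effects listed come from tracing secondary rays from each primary hit. Below is a minimal ray-sphere intersection with a shadow-ray test, the per-ray arithmetic a unit like RayChip performs in hardware; the scene values are illustrative only:

```python
import numpy as np

# Minimal ray-sphere intersection plus a shadow ray: the core per-ray work of
# any ray tracer. Scene values are illustrative, not from the paper.

def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive hit distance t, or None."""
    oc = origin - center
    b = 2.0 * np.dot(oc, direction)          # direction assumed normalized
    c = np.dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 1e-6 else None           # epsilon avoids self-intersection

eye   = np.array([0.0, 0.0, 0.0])
ray   = np.array([0.0, 0.0, 1.0])
light = np.array([0.0, 5.0, 5.0])
ball  = (np.array([0.0, 0.0, 5.0]), 1.0)     # (center, radius)

t = hit_sphere(eye, ray, *ball)
if t is not None:
    p = eye + t * ray                        # primary hit point
    to_light = light - p
    shadow_dir = to_light / np.linalg.norm(to_light)
    in_shadow = hit_sphere(p, shadow_dir, *ball) is not None
    print(f"hit at t={t:.2f}, in shadow: {in_shadow}")
```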

An Implementation of a Convolutional Accelerator based on a GPGPU for a Deep Learning (Deep Learning을 위한 GPGPU 기반 Convolution 가속기 구현)

  • Jeon, Hee-Kyeong; Lee, Kwang-yeob; Kim, Chi-yong
    • Journal of IKEEE / v.20 no.3 / pp.303-306 / 2016
  • In this paper, we propose a method to accelerate convolutional neural networks by utilizing a GPGPU. A convolutional neural network (CNN) is a type of neural network that learns features from images, and it is well suited to image processing tasks that must learn from large amounts of data. The convolutional layers of a conventional CNN require a large number of multiplications, making real-time operation difficult in embedded environments. In this paper, we reduce the number of multiplications through the Winograd convolution algorithm and parallelize the convolution on a SIMT-based GPGPU. The experiments were conducted using ModelSim and TestDrive, and the results showed that the processing time improved by about 17% compared with conventional convolution.
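
Winograd's F(2,3) algorithm computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 a direct convolution needs (the 2D F(2×2, 3×3) variant similarly needs 16 instead of 36). A NumPy sketch, verified against direct convolution:

```python
import numpy as np

# Winograd F(2,3): two outputs of a 3-tap filter from 4 multiplications
# instead of 6 -- the multiplication saving the paper exploits.

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 convolution outputs."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0])
direct = np.array([d[0:3] @ g, d[1:4] @ g])   # plain sliding-window conv
assert np.allclose(winograd_f23(d, g), direct)
print(winograd_f23(d, g))                     # [4.5 6.0]
```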