• Title/Summary/Keyword: GPU model

Artificial Intelligence for the Fourth Industrial Revolution

  • Jeong, Young-Sik;Park, Jong Hyuk
    • Journal of Information Processing Systems, v.14 no.6, pp.1301-1306, 2018
  • Artificial intelligence is one of the key technologies of the Fourth Industrial Revolution. This paper introduces a range of approaches across diverse research fields, including a model-based MS approach, a deep neural network model, an image edge detection approach, a cross-layer optimization model, an LSSVM approach, a screen design approach, and a CPU-GPU hybrid approach. It also describes research on superintelligence and superconnection for IoT and big data, covering 'superintelligence-based systems and infrastructures', 'superconnection-based IoT and big data systems', 'analysis of IoT-based data and big data', 'infrastructure design for IoT and big data', 'artificial intelligence applications', and 'superconnection-based IoT devices'.

BCDR algorithm for network estimation based on pseudo-likelihood with parallelization using GPU (유사가능도 기반의 네트워크 추정 모형에 대한 GPU 병렬화 BCDR 알고리즘)

  • Kim, Byungsoo;Yu, Donghyeon
    • Journal of the Korean Data and Information Science Society, v.27 no.2, pp.381-394, 2016
  • A graphical model represents conditional dependencies between variables as a graph with nodes and edges. It is widely used in fields including physics, economics, and biology to describe complex associations. Conditional dependencies can be estimated from an inverse covariance matrix, where zero off-diagonal elements denote conditional independence of the corresponding variables. This paper proposes an efficient BCDR (block coordinate descent with random permutation) algorithm for CONCORD (convex correlation selection method), which estimates an inverse covariance matrix based on pseudo-likelihood. BCDR builds on the BCD (block coordinate descent) algorithm and adds random permutation and parallelization on graphics processing units. We conduct numerical studies on two network structures to demonstrate the efficiency of the proposed algorithm in terms of computation time.
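
The link between an inverse covariance matrix and the graph can be illustrated with a small sketch. This is not the BCDR or CONCORD algorithm from the paper; it only shows how edges and partial correlations are read off a precision matrix that is assumed to be already known. The function names and the 3x3 example matrix are hypothetical.

```python
import math

def partial_correlations(precision):
    """Partial correlations implied by a precision (inverse covariance) matrix.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for i != j.
    """
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho

def edges(precision, tol=1e-12):
    """Pairs (i, j), i < j, whose off-diagonal precision entry is nonzero."""
    p = len(precision)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i][j]) > tol]

# Hypothetical precision matrix for a chain graph 0 - 1 - 2.
# The zero at (0, 2) means variables 0 and 2 are conditionally
# independent given variable 1, so the graph has no edge between them.
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

graph_edges = edges(omega)
rho = partial_correlations(omega)
```

Here `graph_edges` contains only the pairs with a nonzero off-diagonal entry, which is the sense in which a sparse inverse covariance estimate defines a network.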

SimTBS: Simulator For GPGPU Thread Block Scheduling (SimTBS: GPGPU 스레드블록 스케줄링 시뮬레이터)

  • Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.20 no.4, pp.87-92, 2020
  • Although a GPGPU (General-Purpose GPU) can maximize performance by parallelizing a task across tens of thousands of threads, those threads are internally grouped into thread blocks, the base unit for processing and resource allocation. A thread block scheduler is a specialized hardware component that allocates thread blocks to GPGPU processing hardware in a round-robin manner. However, round-robin is a sequential allocation policy and is not optimized for GPGPU resource utilization. In this paper, we propose a thread block scheduler model that can analyze and quantify the performance of various thread block scheduling policies. Experimental results from a simulator implementing our model show that legacy hardware thread block scheduling does not behave well when the workload becomes heavy.
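
Why a fixed round-robin order can leave processing hardware unevenly loaded can be seen in a toy comparison. This sketch is not SimTBS and does not model real GPU hardware; it assumes each thread block has a known duration, that each processing unit runs its blocks one after another, and it uses a simple least-loaded policy as a hypothetical alternative. All names and numbers are illustrative.

```python
def round_robin(durations, num_units):
    """Assign block k to unit k mod num_units; return total load per unit."""
    loads = [0] * num_units
    for k, duration in enumerate(durations):
        loads[k % num_units] += duration
    return loads

def least_loaded(durations, num_units):
    """Assign each block to the unit with the smallest load so far."""
    loads = [0] * num_units
    for duration in durations:
        target = loads.index(min(loads))  # first unit with minimal load
        loads[target] += duration
    return loads

# Hypothetical workload: long and short thread blocks alternate.
durations = [8, 1, 8, 1, 8, 1]

rr_loads = round_robin(durations, num_units=2)
ll_loads = least_loaded(durations, num_units=2)

# The busiest unit determines when the whole workload finishes.
rr_finish = max(rr_loads)
ll_finish = max(ll_loads)
```

With this workload, round-robin sends every long block to the same unit, so its finish time is longer than under the load-aware policy. A simulator that compares scheduling policies quantifies this kind of difference across many workloads.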

A study on the standardization strategy for building of learning data set for machine learning applications (기계학습 활용을 위한 학습 데이터세트 구축 표준화 방안에 관한 연구)

  • Choi, JungYul
    • Journal of Digital Convergence, v.16 no.10, pp.205-212, 2018
  • With the development of high-performance CPUs and GPUs, artificial intelligence algorithms such as deep neural networks, and large amounts of data, machine learning has been extended to various applications. In particular, the large amount of data collected from the Internet of Things, social network services, web pages, and public data is accelerating the use of machine learning. Learning data sets for machine learning exist in various formats depending on application field and data type, which makes it difficult to process the data effectively and apply it to machine learning. This paper therefore studies a method for building learning data sets for machine learning according to standardized procedures. It first analyzes the requirements for learning data sets by problem type and data type. Based on that analysis, it presents a reference model for building learning data sets for machine learning applications. Finally, it identifies the target standardization organization and proposes a standard development strategy for building learning data sets.

Realistic and Fast Depth-of-Field Rendering in Direct Volume Rendering (직접 볼륨 렌더링에서 사실적인 고속 피사계 심도 렌더링)

  • Kang, Jiseon;Lee, Jeongjin;Shin, Yeong-Gil;Kim, Bohyoung
    • The Journal of Korean Institute of Next Generation Computing, v.15 no.5, pp.75-83, 2019
  • Direct volume rendering is a widely used method for visualizing three-dimensional volume data such as medical images. This paper proposes a method for applying depth-of-field effects to volume ray-casting to enable more realistic depth-of-field rendering in direct volume rendering. The proposed method exploits a camera model based on the human perceptual model and can obtain realistic images with a limited number of rays using jittered lens sampling. It also enables interactive exploration of volume data by calculating depth of field on the fly in the GPU pipeline, without preprocessing. In experiments with various data including medical images, we demonstrated that depth-of-field images with better depth perception were generated 2.6 to 4 times faster than with the conventional method.

Design and Utilization of Connected Data Architecture-based AI Service of Mass Distributed Abyss Storage (대용량 분산 Abyss 스토리지의 CDA (Connected Data Architecture) 기반 AI 서비스의 설계 및 활용)

  • Cha, ByungRae;Park, Sun;Seo, JaeHyun;Kim, JongWon;Shin, Byeong-Chun
    • Smart Media Journal, v.10 no.1, pp.99-107, 2021
  • Alongside the 4th Industrial Revolution and Industry 4.0, the recent megatrends in the ICT field are big data, IoT, cloud computing, and artificial intelligence. Rapid digital transformation through the convergence of various industrial areas and ICT fields is therefore an ongoing trend, driven by the development of AI service technologies suited to the era of the 4th Industrial Revolution and of subdivided technologies such as BI (Business Intelligence), IA (Intelligent Analytics, BI + AI), AIoT (Artificial Intelligence of Things), AIOPS (Artificial Intelligence for IT Operations), and RPA 2.0 (Robotic Process Automation + AI). In line with these technical conditions, this study aims to integrate and advance various machine learning services comprising infrastructure-side GPU, the CDA (Connected Data Architecture) framework, and AI, based on mass distributed Abyss storage. We also aim to apply an AI business revenue model in various industries.

Implementation of FPGA-based Accelerator for GRU Inference with Structured Compression (구조적 압축을 통한 FPGA 기반 GRU 추론 가속기 설계)

  • Chae, Byeong-Cheol
    • Journal of the Korea Institute of Information and Communication Engineering, v.26 no.6, pp.850-858, 2022
  • To deploy Gated Recurrent Units (GRUs) on resource-constrained embedded devices, this paper presents a reconfigurable FPGA-based GRU accelerator that enables structured compression. First, a dense GRU model is significantly reduced in size by hybrid quantization and structured top-k pruning. Second, the energy consumed by external memory access is greatly reduced by the proposed reuse computing pattern. Finally, the accelerator can handle a structured sparse model that benefits from the algorithm-hardware co-design workflow. Moreover, inference tasks can be flexibly performed using all functional dimensions, sequence length, and number of layers. Implemented on the Intel DE1-SoC FPGA, the proposed accelerator achieves 45.01 GOPS on a structured sparse GRU network without batching. Compared with CPU and GPU implementations, the low-cost FPGA accelerator achieves 57x and 30x improvements in latency and 300x and 23.44x improvements in energy efficiency, respectively. The proposed accelerator thus serves as an early study for real-time embedded applications, demonstrating the potential for further development.

3D Tile Application Method for Improvement of Performance of V-world 3D Map Service (브이월드 3D 지도 서비스 성능 향상을 위한 3D 타일 적용 방안 연구)

  • Kim, Tae Hoon;Jang, Han Sol;Yoo, Sung Hwan;Go, Jun Hee
    • Journal of Korean Society for Geospatial Information Science, v.25 no.1, pp.55-61, 2017
  • V-world, a Korean spatial information open platform, provides various services that make it easy to use 2D and 3D maps and national administrative information. Among them, the V-world 3D map service, which is modeled in units of individual buildings, requires a request for each building model file and a draw call to draw each requested model on the screen. This produces a large number of model requests and draw calls, which increases the latency of transmission and conversion between the central processing unit (CPU) and the graphics processing unit (GPU) and degrades the performance of the 3D map service. In this paper, we propose a plan to reduce this performance degradation. We reduce the number of model file requests and draw calls by applying a 3D tile model that combines multiple building models into a single tile. In addition, we apply the quadtree algorithm to reduce the time required to load a model file by shortening the model's retrieval time. This is expected to contribute to improving the performance of the V-world 3D map service.
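
How a quadtree can group nearby buildings into shared tiles may be sketched as follows. This is an illustrative assumption, not V-world's implementation: coordinates are normalized to a unit square, the quadrant numbering is arbitrary, and the building names and positions are hypothetical.

```python
from collections import defaultdict

def quadtree_path(x, y, depth, bounds=(0.0, 0.0, 1.0, 1.0)):
    """Return the quadrant index chosen at each level for the point (x, y).

    Quadrants are numbered 0..3: add 1 for the upper half in x and
    2 for the upper half in y (an arbitrary convention for this sketch).
    """
    min_x, min_y, max_x, max_y = bounds
    path = []
    for _ in range(depth):
        mid_x = (min_x + max_x) / 2
        mid_y = (min_y + max_y) / 2
        quadrant = 0
        if x >= mid_x:
            quadrant += 1
            min_x = mid_x
        else:
            max_x = mid_x
        if y >= mid_y:
            quadrant += 2
            min_y = mid_y
        else:
            max_y = mid_y
        path.append(quadrant)
    return path

def group_by_tile(buildings, depth):
    """Map each tile (its quadtree path) to the buildings that fall inside it."""
    tiles = defaultdict(list)
    for name, (x, y) in buildings.items():
        tiles[tuple(quadtree_path(x, y, depth))].append(name)
    return dict(tiles)

# Hypothetical building positions in normalized map coordinates.
buildings = {"a": (0.80, 0.10), "b": (0.85, 0.20), "c": (0.10, 0.90)}
tiles = group_by_tile(buildings, depth=2)
```

Buildings that land in the same tile can be fetched with one request instead of one request per building, and locating the tile for a point takes a number of steps equal to the tree depth rather than a scan over all buildings.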

Improving the Rendering Speed of 3D Model Animation on Smart Phones

  • Ng, Cong Jie;Hwang, Gi-Hyun;Kang, Dae-Ki
    • Journal of information and communication convergence engineering, v.9 no.3, pp.266-270, 2011
  • Advances in technology enable smart phones and other handheld devices to render complex 3D graphics. However, the processing power and memory of smart phones remain too limited to render high-polygon, highly detailed 3D models, especially in games that require animation, physics engines, or augmented reality. This paper introduces several techniques to speed up computation and reduce the number of vertices in 3D meshes without losing much detail.

Object Detection of Infrared Thermal Image Based on Single Shot Multibox Detector Model for Embedded System (임베디드 시스템용 Single Shot Multibox Detector Model 기반 적외선 열화상 영상의 객체검출)

  • NA, Woong Hwan;Kim, Eung Tae
    • Proceedings of the Korean Society of Broadcast Engineers Conference, 2019.06a, pp.9-12, 2019
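  • Over the past several years, research on image analysis technology using conventional real-image cameras has been actively conducted. Recently, it has evolved into intelligent image analysis technology that applies deep learning, producing considerable synergy as industries such as military base protection, CCTV, user face recognition, machine vision, automobiles, and drones have grown. However, under conditions such as dark nights, fog, weather, and smoke, camera-based image analysis can suffer from reduced accuracy and errors, and because deep learning generally requires a high-performance GPU, an additional system is needed. This study therefore proposes a method for object detection in thermal infrared images that restructures an SSD (Single Shot MultiBox Detector)-based model with a lightweight MobileNet network, so that it can be used even on low-specification embedded systems such as mobile devices. Simulation results confirmed that the proposed model reduced object detection time and training time on infrared thermal camera images.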
