Search | Korea Science

PartitionTuner: An operator scheduler for deep-learning compilers supporting multiple heterogeneous processing units

Misun Yu;Yongin Kwon;Jemin Lee;Jeman Park;Junmo Park;Taeho Kim
- ETRI Journal
- /
- v.45 no.2
- /
- pp.318-328
- /
- 2023
Recently, embedded systems, such as mobile platforms, have multiple processing units that can operate in parallel, such as centralized processing units (CPUs) and neural processing units (NPUs). We can use deep-learning compilers to generate machine code optimized for these embedded systems from a deep neural network (DNN). However, the deep-learning compilers proposed so far generate codes that sequentially execute DNN operators on a single processing unit or parallel codes for graphic processing units (GPUs). In this study, we propose PartitionTuner, an operator scheduler for deep-learning compilers that supports multiple heterogeneous PUs including CPUs and NPUs. PartitionTuner can generate an operator-scheduling plan that uses all available PUs simultaneously to minimize overall DNN inference time. Operator scheduling is based on the analysis of DNN architecture and the performance profiles of individual and group operators measured on heterogeneous processing units. By the experiments for seven DNNs, PartitionTuner generates scheduling plans that perform 5.03% better than a static type-based operator-scheduling technique for SqueezeNet. In addition, PartitionTuner outperforms recent profiling-based operator-scheduling techniques for ResNet50, ResNet18, and SqueezeNet by 7.18%, 5.36%, and 2.73%, respectively.
https://doi.org/10.4218/etrij.2021-0446 인용 PDF

Correlation Propagation Neural Networks for processing On-line Interpolation of Multi-dimention Information (임의의 다차원 정보의 온라인 전송을 위한 상관기법전파신경망)

Kim, Jong-Man;Kim, Won-Sop
- Proceedings of the KIEE Conference
- /
- 2007.11c
- /
- pp.83-87
- /
- 2007
Correlation Propagation Neural Networks is proposed for On-line interpolation. The proposed neural network technique is the real time computation method through the inter-node diffusion. In the network, a node corresponds to a state in the quantized input space. Each node is composed of a processing unit and fixed weights from its neighbor nodes as well as its input terminal. Information propagates among neighbor nodes laterally and inter-node interpolation is achieved. Through several simulation experiments, real time reconstruction of the nonlinear image information is processed. 1-D CPNN hardware has been implemented with general purpose analog ICs to test the interpolation capability of the proposed neural networks. Experiments with static and dynamic signals have been done upon the CPNN hardware.
PDF

Development of Information Propagation Neural Networks processing On-line Interpolation (실시간 보간 가능을 갖는 정보전파신경망의 개발)

Kim, Jong-Man;Sin, Dong-Yong;Kim, Hyong-Suk;Kim, Sung-Joong
- Proceedings of the KIEE Conference
- /
- 1998.07b
- /
- pp.461-464
- /
- 1998
Lateral Information Propagation Neural Networks (LIPN) is proposed for on-line interpolation. The proposed neural network technique is the real time computation method through the inter-node diffusion. In the network, a node corresponds to a state in the quantized input space. Each node is composed of a processing unit and fixed weights from its neighbor nodes as well as its input terminal. Information propagates among neighbor nodes laterally and inter-node interpolation is achieved. Through several simulation experiments, real time reconstruction of the nonlinear image information is processed. 1-D LIPN hardware has been implemented with general purpose analog ICs to test the interpolation capability of the proposed neural networks. Experiments with static and dynamic signals have been done upon the LIPN hardware.
PDF

Implementation of Neural Networks using GPU (GPU를 이용한 신경망 구현)

Oh Kyoung-su;Jung Keechul
- The KIPS Transactions:PartB
- /
- v.11B no.6
- /
- pp.735-742
- /
- 2004
We present a new use of common graphics hardware to perform a faster artificial neural network. And we examine the use of GPU enhances the time performance of the image processing system using neural network, In the case of parallel computation of multiple input sets, the vector-matrix products become matrix-matrix multiplications. As a result, we can fully utilize the parallelism of GPU. Sigmoid operation and bias term addition are also implemented using pixel shader on GPU. Our preliminary result shows a performance enhancement of about thirty times faster using ATI RADEON 9800 XT board.
https://doi.org/10.3745/KIPSTB.2004.11B.6.735 인용 PDF KSCI

Analysis of Tensor Processing Unit and Simulation Using Python (텐서 처리부의 분석 및 파이썬을 이용한 모의실행)

Lee, Jongbok
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.19 no.3
- /
- pp.165-171
- /
- 2019
The study of the computer architecture has shown that major improvements in price-to-energy performance stems from domain-specific hardware development. This paper analyzes the tensor processing unit (TPU) ASIC which can accelerate the reasoning of the artificial neural network (NN). The core device of the TPU is a MAC matrix multiplier capable of high-speed operation and software-managed on-chip memory. The execution model of the TPU can meet the reaction time requirements of the artificial neural network better than the existing CPU and the GPU execution models, with the small area and the low power consumption even though it has many MAC and large memory. Utilizing the TPU for the tensor flow benchmark framework, it can achieve higher performance and better power efficiency than the CPU or CPU. In this paper, we analyze TPU, simulate the Python modeled OpenTPU, and synthesize the matrix multiplication unit, which is the key hardware.
https://doi.org/10.7236/JIIBC.2019.19.3.165 인용 PDF KSCI HTML

A MNN(Modular Neural Network) for Robot Endeffector Recognition (로봇 Endeffector 인식을 위한 모듈라 신경회로망)

김영부;박동선
- Proceedings of the IEEK Conference
- /
- 1999.06a
- /
- pp.496-499
- /
- 1999
This paper describes a medular neural network(MNN) for a vision system which tracks a given object using a sequence of images from a camera unit. The MNN is used to precisely recognize the given robot endeffector and to minize the processing time. Since the robot endeffector can be viewed in many different shapes in 3-D space, a MNN structure, which contains a set of feedforwared neural networks, co be more attractive in recognizing the given object. Each single neural network learns the endeffector with a cluster of training patterns. The training patterns for a neural network share the similar charateristics so that they can be easily trained. The trained MNN is less sensitive to noise and it shows the better performance in recognizing the endeffector. The recognition rate of MNN is enhanced by 14% over the single neural network. A vision system with the MNN can precisely recognize the endeffector and place it at the center of a display for a remote operator.
PDF

A Study on the Diphone Recognition of Korean Connected Words and Eojeol Reconstruction (한국어 연결단어의 이음소 인식과 어절 형성에 관한 연구)

;Jeong, Hong
- The Journal of the Acoustical Society of Korea
- /
- v.14 no.4
- /
- pp.46-63
- /
- 1995
This thesis described an unlimited vocabulary connected speech recognition system using Time Delay Neural Network(TDNN). The recognition unit is the diphone unit which includes the transition section of two phonemes, and the number of diphone unit is 329. The recognition processing of korean connected speech is composed by three part; the feature extraction section of the input speech signal, the diphone recognition processing and post-processing. In the feature extraction section, the extraction of diphone interval in input speech signal is carried and then the feature vectors of 16th filter-bank coefficients are calculated for each frame in the diphone interval. The diphone recognition processing is comprised by the three stage hierachical structure and is carried using 30 Time Delay Neural Networks. particularly, the structure of TDNN is changed so as to increase the recognition rate. The post-processing section, mis-recognized diphone strings are corrected using the probability of phoneme transition and the probability o phoneme confusion and then the eojeols (Korean word or phrase) are formed by combining the recognized diphones.
PDF

A Design of Reconfigurable Neural Network Processor (재구성 가능한 신경망 프로세서의 설계)

장영진;이현수
- Proceedings of the IEEK Conference
- /
- 1999.11a
- /
- pp.368-371
- /
- 1999
In this paper, we propose a neural network processor architecture with on-chip learning and with reconfigurability according to the data dependencies of the algorithm applied. For the neural network model applied, the proposed architecture can be configured into either SIMD or SRA(Systolic Ring Array) without my changing of on-chip configuration so as to obtain a high throughput. However, changing of system configuration can be controlled by user program. To process activation function, which needs amount of cycles to get its value, we design it by using PWL(Piece-Wise Linear) function approximation method. This unit has only single latency and the processing ability of non-linear function such as sigmoid gaussian function etc. And we verified the processing mechanism with EBP(Error Back-Propagation) model.
PDF

Fake News Detection Using Deep Learning

Lee, Dong-Ho;Kim, Yu-Ri;Kim, Hyeong-Jun;Park, Seung-Myun;Yang, Yu-Jun
- Journal of Information Processing Systems
- /
- v.15 no.5
- /
- pp.1119-1130
- /
- 2019
With the wide spread of Social Network Services (SNS), fake news-which is a way of disguising false information as legitimate media-has become a big social issue. This paper proposes a deep learning architecture for detecting fake news that is written in Korean. Previous works proposed appropriate fake news detection models for English, but Korean has two issues that cannot apply existing models: Korean can be expressed in shorter sentences than English even with the same meaning; therefore, it is difficult to operate a deep neural network because of the feature scarcity for deep learning. Difficulty in semantic analysis due to morpheme ambiguity. We worked to resolve these issues by implementing a system using various convolutional neural network-based deep learning architectures and "Fasttext" which is a word-embedding model learned by syllable unit. After training and testing its implementation, we could achieve meaningful accuracy for classification of the body and context discrepancies, but the accuracy was low for classification of the headline and body discrepancies.
https://doi.org/10.3745/JIPS.04.0142 인용 PDF KSCI

An Efficient FPGA Based TDC Accelerator for Deconvolutional Neural Networks (효율적인 DCNN 연산을 위한 FPGA 기반 TDC 가속기)

Jang, Hyerim;Moon, Byungin
- Proceedings of the Korea Information Processing Society Conference
- /
- 2021.05a
- /
- pp.457-458
- /
- 2021
딥러닝 알고리즘 중 DCNN(DeConvolutional Neural Network)은 이미지 업스케일링과 생성·복원 등 다양한 분야에서 뛰어난 성능을 보여주고 있다. DCNN은 많은 양의 데이터를 병렬로 처리할 수 있기 때문에 하드웨어로 설계하는 것이 유용하다. 최근 DCNN의 하드웨어 구조 연구에서는 overlapping sum 문제를 해결하기 위해 deconvolution 필터를 convolution 필터로 변환하는 TDC(Transforming the Deconvolutional layer into the Convolutional layer) 알고리즘이 제안되었다. 하지만 TDC를 CPU(Central Processing Unit)로 수행하기 때문에 연산의 최적화가 어려우며, 외부 메모리를 사용하기에 추가적인 전력이 소모된다. 이에 본 논문에서는 저전력으로 구동할 수 있는 FPGA 기반 TDC 하드웨어 구조를 제안한다. 제안하는 하드웨어 구조는 자원 사용량이 적어 저전력으로 구동 가능할 뿐만 아니라, 병렬 처리 구조로 설계되어 빠른 연산 처리 속도를 보인다.
https://doi.org/10.3745/PKIPS.y2021m05a.457 인용 PDF

Search Result 100, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)