• Title/Summary/Keyword: Partition Processing

PartitionTuner: An operator scheduler for deep-learning compilers supporting multiple heterogeneous processing units

  • Misun Yu;Yongin Kwon;Jemin Lee;Jeman Park;Junmo Park;Taeho Kim
    • ETRI Journal
    • /
    • v.45 no.2
    • /
    • pp.318-328
    • /
    • 2023
  • Recently, embedded systems such as mobile platforms have multiple processing units (PUs) that can operate in parallel, such as central processing units (CPUs) and neural processing units (NPUs). We can use deep-learning compilers to generate machine code optimized for these embedded systems from a deep neural network (DNN). However, the deep-learning compilers proposed so far generate code that sequentially executes DNN operators on a single processing unit, or parallel code for graphics processing units (GPUs). In this study, we propose PartitionTuner, an operator scheduler for deep-learning compilers that supports multiple heterogeneous PUs, including CPUs and NPUs. PartitionTuner can generate an operator-scheduling plan that uses all available PUs simultaneously to minimize overall DNN inference time. Operator scheduling is based on an analysis of the DNN architecture and on performance profiles of individual and grouped operators measured on the heterogeneous processing units. In experiments with seven DNNs, PartitionTuner generates scheduling plans that perform 5.03% better than a static type-based operator-scheduling technique for SqueezeNet. In addition, PartitionTuner outperforms recent profiling-based operator-scheduling techniques for ResNet50, ResNet18, and SqueezeNet by 7.18%, 5.36%, and 2.73%, respectively.
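The core idea of profile-based operator scheduling across heterogeneous PUs can be sketched with a simple earliest-finish-time heuristic. This is an illustrative assumption, not PartitionTuner's actual algorithm; all names (`schedule`, `profile`, the PU labels) are hypothetical.

```python
def schedule(ops, deps, profile, pus):
    """Assign each DNN operator to the PU that minimizes its finish time.

    ops     -- operators in topological order
    deps    -- dict: op -> list of predecessor ops
    profile -- dict: (op, pu) -> measured latency in ms
    pus     -- list of PU names, e.g. ["cpu", "npu"]
    """
    pu_free = {pu: 0.0 for pu in pus}    # time at which each PU becomes idle
    finish = {}                          # op -> finish time
    plan = {}                            # op -> assigned PU
    for op in ops:
        # An operator is ready once all of its predecessors have finished.
        ready = max((finish[d] for d in deps.get(op, [])), default=0.0)
        # Pick the PU that gives the earliest finish time for this operator.
        best_pu = min(pus, key=lambda pu: max(ready, pu_free[pu]) + profile[(op, pu)])
        start = max(ready, pu_free[best_pu])
        finish[op] = start + profile[(op, best_pu)]
        pu_free[best_pu] = finish[op]
        plan[op] = best_pu
    return plan, max(finish.values())
```

With profiled latencies for two independent operators and one join operator, the heuristic naturally spreads work across the CPU and NPU so both run in parallel.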

A Study on the Construction of Stable Clustering by Minimizing the Order Bias (순서 바이어스 최소화에 의한 안정적 클러스터링 구축에 관한 연구)

  • Lee, Gye-Seong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.6
    • /
    • pp.1571-1580
    • /
    • 1999
  • When a hierarchical structure is derived from a data set for data mining and machine learning using a conceptual clustering algorithm, one of the unsupervised learning paradigms, it is not unusual to obtain different outcomes depending on the order in which data objects are processed. To overcome this problem, a first classification pass is performed to construct an initial partition. This partition is expected to imply the possible range of the number of final classes. We apply center sorting to the data objects in the classes of the partition to obtain a new data ordering, and build a new partition using the ITERATE clustering procedure. We developed an algorithm, REIT, that leads to a final partition with a stable and best partition score. A number of experiments were performed to show that the algorithm minimizes order-bias effects.
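The center-sorting step can be illustrated as follows: objects are reordered by their distance to the centroid of the class they fell into in the initial partition, yielding a data ordering that is less sensitive to the original input order. This is a minimal 1-D sketch under assumed numeric objects; the ITERATE procedure itself is not reproduced here.

```python
def center_sort(classes):
    """Reorder data objects class by class, nearest-to-centroid first.

    classes -- list of classes from the initial partition, each a list of
               1-D numeric data objects (an illustrative simplification).
    """
    order = []
    for objs in classes:
        center = sum(objs) / len(objs)  # class centroid
        # Objects closest to the centroid come first in the new ordering.
        order.extend(sorted(objs, key=lambda o: abs(o - center)))
    return order
```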

File Deduplication using Logical Partition of Storage System (저장 시스템의 논리 파티션을 이용한 파일 중복 제거)

  • Kong, Jin-San;Yoo, Chuck;Ko, Young-Woong
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.7 no.6
    • /
    • pp.345-351
    • /
    • 2012
  • In a traditional target-based data deduplication system, all files must be chunked and compared to reduce duplicated data blocks. A critical problem with this approach arises as the number of files increases: the system suffers from computational delay in calculating hash values and processing metadata for each file. To overcome this problem, in this paper we propose a novel data deduplication system that uses logical partitions of the storage system. The system applies the deduplication scheme to each logical partition rather than to each file. Experimental results show that when a logical partition is 50% full of files, the proposed system outperforms the traditional deduplication scheme in both deduplication capacity and processing time.
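The basic mechanism of block-level deduplication over a partition image can be sketched as below: the partition is cut into fixed-size blocks, each block is indexed by its hash, and only the first copy of each unique block is stored. Function names, the block size, and the choice of SHA-1 are illustrative assumptions, not details from the paper.

```python
import hashlib

def dedup_partition(data, block_size=4096):
    """Split a logical partition image into fixed-size blocks and keep one
    copy of each unique block, indexed by its SHA-1 digest."""
    store = {}      # digest -> block bytes (unique blocks only)
    layout = []     # ordered digests needed to reconstruct the partition
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        h = hashlib.sha1(block).hexdigest()
        store.setdefault(h, block)   # store only the first occurrence
        layout.append(h)
    return store, layout

def restore_partition(store, layout):
    """Rebuild the original partition image from the block store."""
    return b"".join(store[h] for h in layout)
```

Because hashing is done per partition block rather than per file, no per-file metadata has to be maintained, which is the delay the paper aims to avoid.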

Performance Improvement of SAR Autofocus Based on Partition Processing (분할처리 기반 SAR 자동초점 기법의 성능 개선)

  • Shin, Hee-Sub;Ok, Jae-Woo;Kim, Jin-Woo;Lee, Jae-Min
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.28 no.7
    • /
    • pp.580-583
    • /
    • 2017
  • To compensate for SAR image degradation caused by the residual errors and spatially variant errors that remain after motion compensation in airborne SAR, we introduce an autofocus method based on partition processing. After performing spatial partitioning for spotlight SAR data and time partitioning for stripmap SAR data, we reconstruct subpatch images from the partitioned data. We then perform local autofocus with a suitability-analysis process for the phase errors estimated by the autofocus. Moreover, if the estimated phase errors do not properly compensate the subpatch images, we apply a phase-compensation method that weights the estimated phase error closest to the degraded subpatch image, increasing SAR image quality.

Mining Quantitative Association Rules using Commercial Data Mining Tools (상용 데이타 마이닝 도구를 사용한 정량적 연관규칙 마이닝)

  • Kang, Gong-Mi;Moon, Yang-Sae;Choi, Hun-Young;Kim, Jin-Ho
    • Journal of KIISE:Databases
    • /
    • v.35 no.2
    • /
    • pp.97-111
    • /
    • 2008
  • Commercial data mining tools basically support only binary attributes in mining association rules; that is, they can mine binary association rules only. In general, however, transaction databases contain not only binary attributes but also quantitative attributes. Thus, in this paper we propose a systematic approach to mining quantitative association rules---association rules that contain quantitative attributes---using commercial mining tools. To achieve this goal, we first propose an overall working framework that mines quantitative association rules on top of commercial mining tools. The proposed framework consists of two steps: 1) a pre-processing step, which converts quantitative attributes into binary attributes, and 2) a post-processing step, which reconverts binary association rules into quantitative association rules. For the pre-processing step, we present the concept of domain partition and, based on it, formally redefine the previous bipartition and multi-partition techniques: mean-based and median-based techniques for bipartition, and equi-width and equi-depth techniques for multi-partition. These earlier partition techniques, however, do not consider the distribution characteristics of attribute values. To solve this problem, we propose an intuitive partition technique named standard deviation minimization: adjacent attribute values are included in the same partition if the change in their standard deviation is small, but are divided into different partitions if the change is large. We also propose a post-processing step that integrates binary association rules and reconverts them into the corresponding quantitative rules. Through extensive experiments, we show that our framework works correctly and that standard deviation minimization is superior to the other partition techniques. Based on these results, we believe that our framework is practically applicable for naive users to mine quantitative association rules using commercial data mining tools.
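The standard-deviation-minimization idea can be sketched as a greedy scan over sorted attribute values: the current partition grows while adding the next value barely changes its standard deviation, and a large change starts a new partition. The threshold and the greedy formulation are assumptions for illustration, not the paper's exact procedure.

```python
import statistics

def stddev_min_partition(values, threshold=1.0):
    """Greedily group sorted attribute values into partitions, starting a
    new partition whenever appending the next value would change the
    current partition's standard deviation by at least `threshold`."""
    vals = sorted(values)
    partitions = []
    current = [vals[0]]
    for v in vals[1:]:
        old_sd = statistics.pstdev(current)
        new_sd = statistics.pstdev(current + [v])
        if abs(new_sd - old_sd) < threshold:
            current.append(v)       # small change: same partition
        else:
            partitions.append(current)
            current = [v]           # large change: new partition
    partitions.append(current)
    return partitions
```

Unlike equi-width or equi-depth partitioning, the cut points here follow the data's distribution, so a gap between clusters of values produces a partition boundary.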

Expanding Rule Using Recursive Partition Averaging (RPA 기법을 이용한 규칙의 확장)

  • Han Jin-Chul;Kim Sang-ki;Yoon Chung-Hwa
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2004.11a
    • /
    • pp.489-492
    • /
    • 2004
  • Memory-based learning methods used to classify unknown patterns show satisfactory classification performance. However, because they classify a pattern simply by its distance to the examples stored in memory, they cannot explain the process by which a pattern is classified. In this paper, we propose a rule-extraction algorithm that can explain the classification process using the RPA (Recursive Partition Averaging) technique, as well as an algorithm that expands rule conditions to improve generalization performance.

Infrared Image Enhancement Using A Histogram Partition Stretching and Shrinking Method (히스토그램 분할 펼침과 축소 방법을 이용한 적외선 영상 개선)

  • Jung, Min Chul
    • Journal of the Semiconductor & Display Technology
    • /
    • v.14 no.4
    • /
    • pp.50-55
    • /
    • 2015
  • This paper proposes a new histogram partition stretching and shrinking method for infrared image enhancement. The proposed method divides the histogram of an input image into three partitions according to its mean value and standard deviation. The method stretches both the dark partition and the bright partition of the histogram, while it shrinks the medium partition. As a result, both the dark part and the bright part of the image gain more brightness levels. The proposed method is implemented in C on an embedded Linux system for high-speed real-time image processing. Experiments were conducted using various infrared images, and the results show that the proposed algorithm successfully enhances infrared images.
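A piecewise-linear remapping conveys the stretching/shrinking idea: the intensity range is split at mean ± standard deviation, the dark and bright partitions are mapped to wider output ranges, and the middle partition to a narrower one. The 40/20/40 output split and all names are assumptions for illustration; the paper's exact mapping may differ.

```python
import statistics

def stretch_shrink(pixels, out_max=255):
    """Remap pixel intensities so the dark and bright histogram partitions
    are stretched and the medium partition is shrunk."""
    m = statistics.mean(pixels)
    s = statistics.pstdev(pixels)
    lo, hi = max(0.0, m - s), min(float(out_max), m + s)
    # Output breakpoints: dark -> [0, b1], medium -> [b1, b2], bright -> [b2, out_max].
    b1, b2 = int(out_max * 0.4), int(out_max * 0.6)

    def remap(p):
        if p < lo:                                   # dark partition: stretch
            return p / lo * b1 if lo > 0 else 0.0
        if p <= hi:                                  # medium partition: shrink
            return b1 + (p - lo) / (hi - lo) * (b2 - b1)
        return b2 + (p - hi) / (out_max - hi) * (out_max - b2)  # bright: stretch

    return [round(remap(p)) for p in pixels]
```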

An Improved Memory Based Reasoning using the Fixed Partition Averaging Algorithm (고정 분할 평균 알고리즘을 사용하는 향상된 메모리 기반 추론)

  • Jeong, Tae-Seon;Lee, Hyeong-Il;Yun, Chung-Hwa
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.6
    • /
    • pp.1563-1570
    • /
    • 1999
  • In this paper, we propose the FPA (Fixed Partition Averaging) algorithm to reduce the storage requirements and classification time of the Memory Based Reasoning method. The proposed method uses storage more efficiently by extracting representatives from the training patterns. After partitioning the pattern space into a fixed number of equally sized hyperrectangles, it averages the patterns in each hyperrectangle to extract a representative. We also use the mutual information between features and classes as feature weights to improve classification performance.
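The partition-and-average step can be sketched as below: each feature axis is cut into a fixed number of equal intervals, patterns are bucketed into the resulting hyperrectangles, and each bucket is replaced by its mean. The class labels and mutual-information weighting are omitted for brevity; names and signatures are illustrative.

```python
from collections import defaultdict

def fpa(patterns, bins, bounds):
    """Replace the patterns in each hyperrectangle with their mean.

    patterns -- iterable of equal-length numeric tuples
    bins     -- number of equal intervals per feature axis
    bounds   -- per-feature (lo, hi) ranges of the pattern space
    """
    cells = defaultdict(list)
    for x in patterns:
        # Map each pattern to the index tuple of its hyperrectangle.
        key = tuple(min(int((xi - lo) / (hi - lo) * bins), bins - 1)
                    for xi, (lo, hi) in zip(x, bounds))
        cells[key].append(x)
    # One representative per non-empty cell: the feature-wise mean.
    return [tuple(sum(col) / len(pts) for col in zip(*pts))
            for pts in cells.values()]
```

Classification would then measure distances to these few representatives instead of to every training pattern, which is the storage and speed gain the abstract describes.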

Design Technique and Application for Distributed Recovery Block Using the Partitioning Operating System Based on Multi-Core System (멀티코어 기반 파티셔닝 운영체제를 이용한 분산 복구 블록 설계 기법 및 응용)

  • Park, Hansol
    • Journal of IKEEE
    • /
    • v.19 no.3
    • /
    • pp.357-365
    • /
    • 2015
  • Recently, embedded systems such as aircraft and automobiles have been developed with a modular architecture instead of a federated architecture because of SWaP (Size, Weight, and Power) issues. In addition, partitioning operating systems that support multiple logical nodes based on the partition concept have recently appeared. The distributed recovery block is a fault-tolerance design scheme applicable to mission-critical real-time systems; it supports real-time takeover via real-time synchronization between the participating nodes. Because of this real-time synchronization, a single-core computer is not suitable for a partition-based distributed recovery block design. A multi-core, AMP (Asymmetric Multi-Processing) based partition architecture is required to apply the distributed recovery block design scheme. In this paper, we propose a design scheme for the distributed recovery block on a partitioning operating system with a multi-core, supervised-AMP architecture. We implement a flight-control simulator for avionics to check the feasibility of our design scheme.
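The recovery-block control flow underlying the scheme can be sketched as below: a primary variant runs first and its result is accepted only if it passes an acceptance test; otherwise the alternate variant takes over. This single-process sketch illustrates only the takeover logic, not the inter-node real-time synchronization the paper addresses; all names are illustrative.

```python
def distributed_recovery_block(primary, alternate, acceptance_test, state):
    """Run the primary variant; on failure of the acceptance test (or an
    exception), fall back to the alternate variant."""
    try:
        result = primary(state)
        if acceptance_test(result):
            return result, "primary"   # primary result accepted
    except Exception:
        pass                           # primary crashed: fall through
    result = alternate(state)          # alternate takes over
    if not acceptance_test(result):
        raise RuntimeError("both variants failed the acceptance test")
    return result, "alternate"
```

In the distributed form, the alternate runs concurrently on another logical node so takeover can meet a real-time deadline rather than restarting from scratch.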

A Fast CU Size Decision Optimal Algorithm Based on Neighborhood Prediction for HEVC

  • Wang, Jianhua;Wang, Haozhan;Xu, Fujian;Liu, Jun;Cheng, Lianglun
    • Journal of Information Processing Systems
    • /
    • v.16 no.4
    • /
    • pp.959-974
    • /
    • 2020
  • High efficiency video coding (HEVC) employs a quadtree coding tree unit (CTU) structure to improve its coding efficiency, but at the same time it requires very high computational complexity because of its exhaustive search for an optimal coding unit (CU) partition. To solve this problem, a fast CU size decision algorithm based on neighborhood prediction is presented for HEVC in this paper. The contribution of this paper lies in using the partition information of neighboring CUs at different depths to quickly determine the optimal partition mode for the current CU, which saves much computational complexity with negligible rate-distortion (RD) performance loss. Specifically, our scheme uses the partition information of the left, up, and left-up CUs to predict the optimal partition mode for the current CU; as a result, the proposed algorithm avoids many unnecessary prediction and partition operations. Simulation results show that the proposed fast CU size decision algorithm reduces coding time by about 19.0% while increasing the BD-rate (Bjontegaard delta rate) by only 0.102% compared with the HM16.1 reference software, thus improving the coding performance of HEVC.
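The neighborhood-prediction idea can be sketched as restricting the quadtree depth search for the current CU to the range spanned by its neighbors' depths. The widen-by-one margin and the function name are assumptions for illustration; the paper's decision rules are more elaborate.

```python
def predict_depth_range(left, up, left_up):
    """Predict the CU quadtree depth search range from neighboring CUs.

    left, up, left_up -- depths (0..3) of the left, up, and left-up CUs,
                         or None at frame borders where no neighbor exists.
    Returns (lo, hi): the depth interval to search for the current CU.
    """
    depths = [d for d in (left, up, left_up) if d is not None]
    if not depths:
        return 0, 3  # no neighbors: fall back to the full search range
    # Search only around the neighbors' depths, widened by one level.
    lo = max(0, min(depths) - 1)
    hi = min(3, max(depths) + 1)
    return lo, hi
```

Skipping RD-cost evaluation outside this range is where the reported ~19% encoding-time saving comes from: when all neighbors agree on one depth, only three of the four possible depths need to be tried.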