Search | Korea Science

Tile Partitioning-based HEVC Parallel Decoding Optimization for Asymmetric Multicore Processor (비대칭 멀티코어 시스템 상의 HEVC 병렬 디코딩 최적화를 위한 타일 분할 기법)

Ryu, Yeongil;Roh, Hyun-Joon;Ryu, Eun-Seok
- Journal of KIISE / v.43 no.9 / pp.1060-1065 / 2016
Recently, the need for parallel UHD video processing has grown, and computing systems with asymmetric processors such as ARM big.LITTLE are increasingly used. A parallel UHD video processing method optimized for asymmetric multicore systems is therefore needed. This paper proposes a novel HEVC tile partitioning method for parallel processing based on an analysis of the computational power of asymmetric multicores. The proposed method analyzes (1) the computing power of the asymmetric multicores and (2) a regression model of computational complexity per video resolution. It then (3) determines the optimal HEVC tile resolution for each core and partitions and allocates the tiles to suitable cores. The proposed method minimizes the gap in decoding time between the fastest and the slowest CPU core. Experimental results with the official 4K UHD test sequences show an average 20% improvement in decoding speedup on an ARM asymmetric multicore system.
https://doi.org/10.5626/JOK.2016.43.9.1060 KSCI
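
The idea of sizing tiles to match core speed can be illustrated with a minimal sketch. It is not the authors' method: it assumes decoding cost grows linearly with tile width (the paper uses a regression model instead), splits the picture into vertical tiles only, and uses hypothetical relative core speeds. The names `partition_columns` and `decode_time_gap` are invented for this illustration.

```python
def partition_columns(total_columns, core_speeds):
    """Split CTU columns into one vertical tile per core, proportional to speed.

    Assumption: decode time of a tile is (tile width) / (core speed).
    """
    total_speed = sum(core_speeds)
    # Ideal fractional share of columns for each core.
    ideal = [total_columns * s / total_speed for s in core_speeds]
    widths = [int(x) for x in ideal]  # round down first
    # Give the leftover columns to the cores with the largest remainders.
    leftover = total_columns - sum(widths)
    order = sorted(range(len(core_speeds)),
                   key=lambda i: ideal[i] - widths[i], reverse=True)
    for i in order[:leftover]:
        widths[i] += 1
    return widths


def decode_time_gap(widths, core_speeds):
    """Gap between the slowest and fastest core under the linear cost assumption."""
    times = [w / s for w, s in zip(widths, core_speeds)]
    return max(times) - min(times)


# Hypothetical system: two big cores twice as fast as two little cores.
speeds = [2.0, 2.0, 1.0, 1.0]
# A 3840-pixel-wide picture with 64x64 CTUs has 60 CTU columns.
proportional = partition_columns(60, speeds)
uniform = [15, 15, 15, 15]
print(proportional, decode_time_gap(proportional, speeds))
print(uniform, decode_time_gap(uniform, speeds))
```

Under these assumptions the proportional split leaves a smaller decode-time gap than the uniform split. The sketch does not guard against zero-width tiles, which a real partitioner would have to exclude.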

Parallel Processing of k-Means Clustering Algorithm for Unsupervised Classification of Large Satellite Images: A Hybrid Method Using Multicores and a PC-Cluster (대용량 위성영상의 무감독 분류를 위한 k-Means Clustering 알고리즘의 병렬처리: 다중코어와 PC-Cluster를 이용한 Hybrid 방식)

Han, Soohee;Song, Jeong Heon
- Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.6 / pp.445-452 / 2019
In this study, parallel processing codes for the k-means clustering algorithm were developed and implemented on a PC-cluster for unsupervised classification of large satellite images. We implemented intra-node code using the multiple cores of a CPU (Central Processing Unit) based on OpenMP (Open Multi-Processing), inter-node code using a PC-cluster based on the message passing interface, and hybrid code using both. The PC-cluster consists of one master node and eight slave nodes, and each node is equipped with eight cores. Two operating systems, Microsoft Windows and Canonical Ubuntu, were installed on the PC-cluster in turn and tested to compare parallel processing performance. Two multispectral satellite images were tested: a medium-capacity LANDSAT 8 OLI (Operational Land Imager) image and a high-capacity Sentinel 2A image. To evaluate the performance of parallel processing, speedup and efficiency were measured. Overall, the speedup was over N/2 and the efficiency was over 0.5. In the comparison of the two operating systems, the Ubuntu system was two to three times faster. To confirm that the results of sequential and parallel processing coincide with each other, the center value of each band and the number of classified pixels were compared, and the result images were examined by pixel-to-pixel comparison. It was found that care should be taken to avoid false sharing in the OpenMP intra-node implementation. To process large satellite images on a PC-cluster, code and hardware should be designed to reduce the performance degradation caused by file I/O. It was also found that performance can differ depending on the operating system installed on the PC-cluster.
https://doi.org/10.7848/ksgpc.2019.37.6.445 KSCI
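
The speedup and efficiency metrics mentioned in the abstract can be written out using their standard definitions. This is a minimal sketch; the abstract does not define N, so the code assumes it is the number of workers, and the timings are hypothetical values chosen for illustration, not measurements from the paper.

```python
def speedup(t_sequential, t_parallel):
    """Standard definition: sequential time divided by parallel time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, n_workers):
    """Speedup normalized by the number of workers N (1.0 is ideal)."""
    return speedup(t_sequential, t_parallel) / n_workers


# Hypothetical timings in seconds, with 64 workers (8 nodes x 8 cores).
t_seq, t_par, n = 640.0, 16.0, 64
s = speedup(t_seq, t_par)
e = efficiency(t_seq, t_par, n)
print(s, e, s > n / 2, e > 0.5)
```

With these definitions, a speedup above N/2 and an efficiency above 0.5 are the same condition.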

A PARALLEL IMPLEMENTATION OF A RELAXED HSS PRECONDITIONER FOR SADDLE POINT PROBLEMS FROM THE NAVIER-STOKES EQUATIONS

JANG, HO-JONG;YOUN, KIHANG
- Journal of the Korean Society for Industrial and Applied Mathematics / v.22 no.3 / pp.155-162 / 2018
We describe a parallel implementation of a relaxed Hermitian and skew-Hermitian splitting preconditioner for the numerical solution of saddle point problems arising from the steady incompressible Navier-Stokes equations. The equations are linearized by the Picard iteration and discretized with the finite element and finite difference schemes on two-dimensional and three-dimensional domains. We report strong scalability results for up to 32 cores.
https://doi.org/10.12941/jksiam.2018.22.155 KSCI

Static Timing Analysis of Shared Caches for Multicore Processors

Zhang, Wei;Yan, Jun
- Journal of Computing Science and Engineering / v.6 no.4 / pp.267-278 / 2012
State-of-the-art techniques in multicore timing analysis are limited to analyzing multicores with shared instruction caches only. This paper proposes a uniform framework to analyze the worst-case performance of both shared instruction caches and shared data caches in a multicore platform. Our approach is based on a new concept called the address flow graph, which can be used to model both instruction and data accesses for timing analysis. Our experiments, as a proof-of-concept study, indicate that the proposed approach can accurately compute the worst-case performance of real-time threads running on a dual-core processor with a shared L2 cache (storing either instructions or data).
https://doi.org/10.5626/JCSE.2012.6.4.267 KSCI KPUBS

Virtual Machine Scheduling for Multicores Considering Effects of Shared On-chip Last Level Cache Interference (공유 말단 캐시에서의 간섭의 영향을 고려한 멀티코어 프로세서를 위한 가상 머신 스케줄링)

Kim, Shin-gyu;Choi, Chanho;Eom, Hyeonsang;Yeom, Heon Y.
- Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.134-136 / 2012
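As the cloud computing service market grows, service providers face a number of problems, such as reducing power consumption and guaranteeing service levels. One cause of these problems is the virtual-machine-based server consolidation policy adopted to improve resource efficiency. Because current virtual machine technologies do not yet provide perfect isolation, virtual machines placed on the same node share resources and interfere with one another. This study proposes a method to maximize performance by minimizing interference in the processor's last-level cache (LLC), one of the resources shared among virtual machines.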
https://doi.org/10.3745/PKIPS.y2012m04a.134

An Efficient Load Balancing Technique in a Multicore Mobile System (멀티코어 모바일 시스템에서 효과적인 부하 균등화 기법)

Cho, Jungseok;Cho, Doosan
- KIPS Transactions on Computer and Communication Systems / v.4 no.5 / pp.153-160 / 2015
The effectiveness of multicores depends on how well a scheduler assigns tasks to the cores. In a heterogeneous multicore platform, the execution time of an application depends on which core it executes on. In other words, the quality of task assignment is an important factor in a multicore system's performance. This work proposes a load scheduling technique that analyzes the execution time of each task by profiling. The profiling result provides basic information for predicting which task-to-core mapping is likely to give the best performance. Using this information, the proposed technique achieves a performance gain of about 26%.
https://doi.org/10.3745/KTCCS.2015.4.5.153 KSCI
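
As an illustration of how profiled execution times could drive a task-to-core mapping, here is a minimal sketch using a generic greedy heuristic. The abstract does not describe the authors' algorithm, so this is a stand-in: the function `assign_tasks`, the task names, and the profile values are all hypothetical.

```python
def assign_tasks(profile, num_cores):
    """Greedy mapping: place each task on the core where it would finish earliest.

    profile[task][core] is the profiled execution time of `task` on `core`.
    Tasks are visited from the longest to the shortest best-case time.
    """
    core_load = [0.0] * num_cores
    mapping = {}
    for task in sorted(profile, key=lambda t: min(profile[t]), reverse=True):
        best = min(range(num_cores),
                   key=lambda c: core_load[c] + profile[task][c])
        mapping[task] = best
        core_load[best] += profile[task][best]
    return mapping, max(core_load)


# Hypothetical profile: core 0 is a fast core, core 1 a slow core.
profile = {
    "decode": [4.0, 9.0],
    "net": [2.0, 3.0],
    "ui": [1.0, 2.5],
}
mapping, makespan = assign_tasks(profile, num_cores=2)
print(mapping, makespan)
```

The heuristic is not optimal in general; it only shows that per-core profiling data is enough to choose between mappings.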

Performance Evaluation and Optimization of Journaling File Systems with Multicores and High-Performance Flash SSDs (멀티코어 및 고성능 플래시 SSD 환경에서 저널링 파일 시스템의 성능 평가 및 최적화)

Han, Hyuck
- The Journal of the Korea Contents Association / v.18 no.4 / pp.178-185 / 2018
Recently, demand for computer systems with multicore CPUs and high-performance flash-based storage devices (i.e., flash SSDs) has grown rapidly in cloud computing, supercomputing, and enterprise storage/database systems. Journaling file systems running on such high-performance systems do not exploit the full I/O bandwidth of high-performance SSDs. In this article, we evaluate and analyze the performance of the Linux EXT4 file system with high-performance SSDs and multicore CPUs. The system used in this study has 72 cores and an Intel NVMe SSD, and the flash SSD delivers up to 2800/1900 MB/s for sequential read/write operations. Our experimental results show that checkpointing is a major overhead in the EXT4 file system. Furthermore, we optimize the checkpointing procedure, and our optimized EXT4 file system shows up to 92% better performance than the original EXT4 file system.
https://doi.org/10.5392/JKCA.2018.18.04.178 KSCI

Improving Multi-DNN Computational Performance of Embedded Multicore Processors through a Global Queue (글로벌 큐를 통한 임베디드 멀티코어 프로세서의 멀티 DNN 연산 성능 향상)

Cho, Ho-jin;Kim, Myung-sun
- Journal of the Korea Institute of Information and Communication Engineering / v.24 no.6 / pp.714-721 / 2020
DNNs are increasingly used in embedded systems such as robots and autonomous vehicles. High recognition accuracy greatly increases computational complexity, and multiple DNNs run aperiodically. The ability to process multiple DNNs in embedded environments is therefore a crucial issue, and multicore-based platforms are being released in response. However, most DNN models are operated as a batch process, and when multiple DNNs run together on a multicore, the deviation in execution time between DNNs may be large and the end-to-end execution time of all DNNs may be long, depending on how they are allocated to the cores. In this paper, we address these problems with a framework that decomposes each DNN into individual layers and distributes them to the cores through a global queue. In our experiments, the total DNN execution time was reduced by 31%, and when operating multiple identical DNNs, the deviation in execution time was reduced by up to 95.1%.
https://doi.org/10.6109/jkiice.2020.24.6.714 KSCI
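
The layer-level global queue described in the abstract can be sketched roughly as follows. This is not the authors' framework: each layer is modeled as a plain Python callable, worker threads stand in for cores, and the function `run_dnns` is hypothetical. Python threads do not run CPU-bound code in parallel, so the sketch shows only the scheduling structure.

```python
import queue
import threading


def run_dnns(dnns, inputs, num_workers):
    """Run several DNNs layer by layer through one shared (global) queue.

    dnns[d] is a list of layer callables; inputs[d] is the input of DNN d.
    A queue item (d, k) means "layer k of DNN d is ready to run".
    """
    work = queue.Queue()
    activations = list(inputs)
    for d, layers in enumerate(dnns):
        if layers:  # skip empty networks
            work.put((d, 0))

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work
                work.task_done()
                return
            d, k = item
            activations[d] = dnns[d][k](activations[d])
            # Enqueue the next layer before marking this one done, so that
            # work.join() cannot return while a DNN is still unfinished.
            if k + 1 < len(dnns[d]):
                work.put((d, k + 1))
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    work.join()  # all layers of all DNNs have been processed
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return activations


# Two toy "DNNs" whose layers are simple arithmetic functions.
toy_dnns = [[lambda x: x + 1, lambda x: x * 2], [lambda x: x - 3]]
print(run_dnns(toy_dnns, [1, 10], num_workers=2))
```

Because only one layer of a given DNN is in the queue at a time, layer order within each DNN is preserved while any idle worker can pick up the next ready layer of any DNN.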

