• Title/Summary/Keyword: Multi-core scheduling

A Parallelization Technique with Integrated Multi-Threading for Video Decoding on Multi-core Systems

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.10 / pp.2479-2496 / 2013
  • Increasing demand for Full High-Definition (FHD) and Ultra High-Definition (UHD) video services has led to active research on high-speed video processing. Widespread deployment of multi-core systems has accelerated studies on high-resolution video processing based on parallelization of multimedia software. Although parallelizing a specific decoding step may partially improve decoding performance, such partial parallelization may not yield a sufficient overall improvement. In particular, entropy decoding has often been considered separately from the other decoding steps because it cannot be parallelized easily. In this paper, we propose a parallelization technique called Integrated Multi-Threaded Parallelization (IMTP), which considers parallelization of the entropy decoding step together with the other decoding steps in an integrated fashion. We used the Simultaneous Multi-Threading (SMT) technique with appropriate thread scheduling techniques to achieve the best performance for the entire decoding process. The proposed IMTP method achieves a speedup of up to 3.35 times in total decoding time over a conventional decoding technique for H.264/AVC videos.

A Test Wrapper Design to Reduce Test Time for Multi-Core SoC (멀티코어 SoC의 테스트 시간 감축을 위한 테스트 Wrapper 설계)

  • Kang, Woo-Jin;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.39B no.1 / pp.1-7 / 2014
  • This paper proposes an efficient test wrapper design that reduces overall test time in multi-core SoCs. After the initial local wrapper solution sets for all cores are determined using the well-known Combine algorithm, the proposed algorithm selects the dominant core, i.e., the core that consumes the longest test time in the multi-core SoC. The wrapper characteristics of the other cores, namely the number of TAM wires and the test time, are then adjusted based on the test time of the dominant core. For some cores, the number of TAM wires can be reduced by increasing their test time for design space exploration purposes. These modified wrapper characteristics are added to the previous wrapper solution set. By expanding the previous local wrapper solution set into a global wrapper solution set, an efficient test scheduler can reduce the overall test time for the multi-core SoC. The effectiveness of the proposed wrapper is verified on the ITC'02 benchmark circuits using a $B^*$-tree based test scheduler. Experimental results show that the test time is reduced by an average of 4.7% compared with previous wrappers.

Peak Power Control for Improvement of Stability in Multi-core System (멀티코어 시스템의 안정성 향상을 위한 피크파워 제어 알고리즘)

  • Park, Sung-Hwan;Kim, Jae-Hwan;Ahn, Byung-Gyu;Jung, Il-Jong;Lee, Seok-Hee;Chong, Jong-Wha
    • Proceedings of the IEEK Conference / 2008.06a / pp.747-748 / 2008
  • In this paper, we propose a new task scheduling algorithm, consisting of a subtask partitioning step and a subtask priority scheduling step, that keeps the peak power under the system specification. The subtask partitioning step minimizes the idle time of the processors by dividing a task into multiple subtasks using the least squares method, fitted to the power consumption pattern of the tasks. In the subtask priority scheduling step, a priority is assigned to each subtask based on its power requirement and power variation, so that peak power violations are minimized and the task completes within its execution time deadline.

Job-aware Network Scheduling for Hadoop Cluster

  • Liu, Wen;Wang, Zhigang;Shen, Yanming
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.1 / pp.237-252 / 2017
  • In recent years, data centers have become the core infrastructure for big data processing. For big data applications, network transmission has become one of the most important factors affecting performance. To improve network utilization and reduce job completion time, we propose job-aware priority scheduling based on real-time monitoring at the application layer. Our approach takes the correlations among flows of the same job into account: flows in the same job are assigned the same priority. We therefore expect flows in the same job to finish their transmissions at about the same time, avoiding lagging flows. To achieve load balancing, two approaches (Flow-based and Spray) using ECMP (Equal-Cost Multi-Path routing) are presented. We implemented our scheme in the NS-2 simulator. In our evaluations, we emulate a real network environment by setting background traffic, scheduling delay, and link failures. The experimental results show that our approach can enhance Hadoop job execution efficiency in the shuffle stage and significantly reduce the network transmission time of the highest-priority job.
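
The rule stated in this abstract, that all flows of one job share a single priority, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tuple layout, the function name `assign_job_priorities`, and the ranking criterion (jobs with fewer total remaining bytes get higher priority) are assumptions made for illustration, since the abstract does not say how jobs are ranked.

```python
from collections import defaultdict


def assign_job_priorities(flows):
    """Map each flow to the priority of the job it belongs to.

    flows: list of (flow_id, job_id, bytes_remaining) tuples
           (hypothetical layout, not taken from the paper).
    Returns {flow_id: priority}, where 0 is the highest priority.
    """
    # Aggregate remaining bytes per job so that flows are ranked as a group.
    job_bytes = defaultdict(int)
    for _, job_id, remaining in flows:
        job_bytes[job_id] += remaining

    # Assumption: smaller jobs first; ties are broken by job id.
    ranked = sorted(job_bytes, key=lambda job: (job_bytes[job], job))
    job_priority = {job: rank for rank, job in enumerate(ranked)}

    # Every flow inherits its job's priority, so flows of one job stay together.
    return {flow_id: job_priority[job_id] for flow_id, job_id, _ in flows}


example = [("f1", "A", 100), ("f2", "A", 300), ("f3", "B", 50)]
priorities = assign_job_priorities(example)
```

In this example both flows of job A receive the same priority value, while the single flow of the smaller job B is ranked ahead of them.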

Multiple Signature Comparison of LogTM-SE for Fast Conflict Detection (다중 시그니처 비교를 통한 트랜잭셔널 메모리의 충돌해소 정책의 성능향상)

  • Kim, Deok-Ho;Oh, Doo-Hwan;Ro, Won-W.
    • The KIPS Transactions:PartA / v.18A no.1 / pp.19-24 / 2011
  • As the era of multi-core processors has arrived, transactional memory has been considered an effective way to achieve easy and fast multi-threaded programming. Various hardware transactional memory systems, such as UTM, VTM, FastTM, LogTM, and LogTM-SE, have been introduced in order to implement high-performance multi-core processors. In particular, LogTM-SE has provided sturdy performance with an efficient memory management policy and a practical thread scheduling method through signature-based conflict detection. However, increasing the number of cores on a processor increases the hardware complexity of signature processing. This degrades overall performance because of the heavy workload of signature comparison. In this paper, we propose a new multiple signature comparison architecture to improve conflict detection in signature-based transactional memory systems.

A Study on Multi-agent based Task Assignment Systems for Virtual Enterprise (가상기업을 위한 멀티에이전트 기반 태스크할당시스템에 관한 연구)

  • 허준규;최경현;이석희
    • Transactions of the Korean Society of Machine Tool Engineers / v.12 no.3 / pp.31-37 / 2003
  • With the paradigm shifting from the principle of manufacturing efficiency to business globalism and rapid adaptation to the environment, more and more enterprises are being virtually organized as manufacturing networks of different units on the web. The formation of such enterprises, called Virtual Enterprises (VE), is a growing trend as enterprises concentrate on core competence and economic benefit. This paper proposes a multi-agent based task assignment system for VE, which attempts to address the selection of individually managed partners and the assignment of tasks to them. A case example is presented to illustrate how the proposed system can assign tasks to partners.

A Beacon Scheduling for Mesh Topology in Wireless Sensor Networks (무선 센서 네트워크에서 메쉬 토폴로지를 위한 비컨 스케줄링)

  • Kim, Min-Jeong;Shim, Jun-Ho
    • The Journal of Society for e-Business Studies / v.15 no.4 / pp.49-58 / 2010
  • Wireless sensor network technology has become one of the core technologies that make it possible to implement various e-business applications. Energy efficiency is an important issue in wireless sensor networks. IEEE 802.15.4, a representative international standard for wireless sensor networks, provides a beacon-enabled mode for energy-efficient communication. However, beacons may conflict with each other when a beacon-enabled network has a multi-hop topology such as a mesh or cluster-tree topology. A beacon conflict causes synchronization between sensor nodes to fail and affects other nodes in the network, because unsynchronized nodes cannot participate in communication. In this paper, we suggest an energy-efficient beacon scheduling scheme for wireless sensor networks. Using beacon scheduling, nodes can save energy during the inactive period and prevent beacon conflicts. We implement the scheduling using QualNet and evaluate its performance on mesh topology networks. The results indicate that the proposed scheduling may improve energy efficiency in such networks.

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2648-2668 / 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.

Integrated Parallelization of Video Decoding on Multi-core Systems (멀티코어 시스템에서의 통합된 비디오 디코딩 병렬화)

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.49 no.7 / pp.39-49 / 2012
  • Demand for high-resolution video services has led to active studies on high-speed video processing. In particular, widespread deployment of multi-core systems has accelerated research on high-resolution video processing based on parallelization of multimedia software. Previously proposed parallelization approaches could improve decoding performance. However, some of these methods did not consider entropy decoding, and others considered only partial parallelization of decoding. We therefore consider parallel entropy decoding integrated with the other parallel video decoding processes on a multi-core system. We propose a novel parallel decoding method called Integrated Parallelization (IP), and we describe how to optimize the parallelization of video decoding on a multi-core system with many cores. We parallelized the KTA 2.7 decoder with the proposed technique on an Intel i7 quad-core platform with Intel Hyper-Threading technology and multi-thread scheduling. We achieved up to 70% performance improvement using the IP method.

A Study on Machine Learning Compiler and Modulo Scheduler (머신러닝 컴파일러와 모듈로 스케쥴러에 관한 연구)

  • Doosan Cho
    • Journal of the Korean Society of Industry Convergence / v.27 no.1 / pp.87-95 / 2024
  • This study examines modulo scheduling algorithms for multicore processors in machine learning applications. Machine learning algorithms perform a large number of vector and matrix operations in order to process large data streams quickly. To support this volume of computation, processor architectures for applications such as artificial intelligence, neural networks, and machine learning are designed for parallel processing, for example as multicore processors. Various compiler techniques are being used and studied to utilize these multicore hardware resources effectively. Among these compiler techniques, this study analyzes the modulo scheduler, which is especially important for the computation pipeline of a single core. The paper compares the iterative modulo scheduler and the swing modulo scheduler, which are the most widely used and studied. Both schedulers provided similar performance, and when register pressure was measured as an indicator, the swing modulo scheduler performed slightly better. The study also proposes a technique that divides recurrence edges to improve the minimum initiation interval of modulo schedulers.
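
For readers unfamiliar with the minimum initiation interval (MII) mentioned in this abstract, the following Python sketch computes the textbook lower bound MII = max(ResMII, RecMII). It illustrates the standard definition only, not the recurrence-edge division technique proposed in the paper. The function names, the resource names, and the sample numbers are hypothetical.

```python
import math


def res_mii(op_counts, unit_counts):
    """Resource-constrained bound: for each resource type,
    ceil(operations per iteration / available functional units)."""
    return max(math.ceil(op_counts[r] / unit_counts[r]) for r in op_counts)


def rec_mii(cycles):
    """Recurrence-constrained bound: for each dependence cycle,
    ceil(total latency / total iteration distance).
    Returns 0 when the loop has no recurrence cycles."""
    return max((math.ceil(lat / dist) for lat, dist in cycles), default=0)


def min_initiation_interval(op_counts, unit_counts, cycles):
    """MII is the larger of the resource and recurrence bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))


# Hypothetical loop body: 5 ALU ops on 2 ALUs, 3 memory ops on 1 memory port,
# and two recurrence cycles given as (total latency, total distance).
ops = {"alu": 5, "mem": 3}
units = {"alu": 2, "mem": 1}
recurrences = [(4, 1), (6, 2)]
mii = min_initiation_interval(ops, units, recurrences)
```

Here the resource bound is 3 and the recurrence bound is 4, so the loop cannot start a new iteration more often than every 4 cycles. Because the recurrence bound dominates in this example, shortening the critical recurrence is what would lower the MII.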