• Title/Summary/Keyword: multicore processor

Search Result 63, Processing Time 0.022 seconds

Efficient Message Scattering and Gathering Based on Processing Node Status (프로세서 노드 상황을 고려하는 효율적인 메시지 스캐터 및 개더 알고리즘)

  • Park, Jongsu
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.4
    • /
    • pp.637-640
    • /
    • 2022
  • To maximize performance in a high-performance multicore processor system. it is essential to enable effective data communication between processing cores. Data communication between processor nodes can be broadly classified into collective and point-to-point communications. Collective communication comprises scattering and gathering. This paper presents a efficient message scattering and gathering based on processing node status. In the proposed algorithms, the transmission order is changed according to the data size of the pre-existing communication, to reduce the waiting time required until the collective communications begin. From the simulation, the performances of the proposed message scattering and gathering algorithms were improved by approximately 71.41% and 69.84%.

The Design and Simulation of Out-of-Order Execution Processor using Tomasulo Algorithm (토마술로 알고리즘을 이용하는 비순차실행 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.4
    • /
    • pp.135-141
    • /
    • 2020
  • Today, CPUs in general-purpose computers such as servers, desktops and laptops, as well as home appliances and embedded systems, consist mostly of multicore processors. In order to improve performance, it is required to use an out-of-order execution processor by Tomasulo algorithm as each core processor. An out-of-order execution processor with Tomasulo algorithm can execute the available instructions in any order and perform speculation in order to reduce control dependencies. Therefore, the performance of an out-of-order execution processor can be significantly improved compared to an in-order execution processor. In this paper, an out-of-order execution processor using Tomasulo algorithm and ARM instruction set is designed using VHDL record data types and simulated by GHDL. As a result, it is possible to successfully perform operations on programs written in ARM instructions.

Hardware-Software Cosynthesis of Multitask Multicore SoC with Real-Time Constraints (실시간 제약조건을 갖는 다중태스크 다중코어 SoC의 하드웨어-소프트웨어 통합합성)

  • Lee Choon-Seung;Ha Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.592-607
    • /
    • 2006
  • This paper proposes a technique to select processors and hardware IPs and to map the tasks into the selected processing elements, aming to achieve high performance with minimal system cost when multitask applications with real-time constraints are run on a multicore SoC. Such technique is called to 'Hardware-Software Cosynthesis Technique'. A cosynthesis technique was already presented in our early work [1] where we divide the complex cosynthesis problem into three subproblems and conquer each subproblem separately: selection of appropriate processing components, mapping and scheduling of function blocks to the selected processing component, and schedulability analysis. Despite good features, our previous technique has a serious limitation that a task monopolizes the entire system resource to get the minimum schedule length. But in general we may obtain higher performance in multitask multicore system if independent multiple tasks are running concurrently on different processor cores. In this paper, we present two mapping techniques, task mapping avoidance technique(TMA) and task mapping pinning technique(TMP), which are applicable for general cases with diverse operating policies in a multicore environment. We could obtain significant performance improvement for a multimedia real-time application, multi-channel Digital Video Recorder system and for randomly generated multitask graphs obtained from the related works.

Power-efficient Scheduling of Periodic Real-time Tasks on Lightly Loaded Multicore Processors (저부하 멀티코어 프로세서에서 주기적 실시간 작업들의 저전력 스케쥴링)

  • Lee, Wan-Yeon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.8
    • /
    • pp.11-19
    • /
    • 2012
  • In this paper, we propose a power-efficient scheduling scheme for lightly loaded multicore processors which contain more processing cores than running tasks. The proposed scheme activates a portion of available cores and inactivates the other unused cores in order to save power consumption. The tasks are assigned to the activated cores based on a heuristic mechanism for fast task assignment. Each activated core executes its assigned tasks with the optimal clock frequency which minimizes the power consumption of the tasks while meeting their deadlines. Evaluation shows that the proposed scheme saves up to 78% power consumption of the previous method which activates as many processing cores as possible for the execution of the given tasks.

Probabilistic Power-saving Scheduling of a Real-time Parallel Task on Discrete DVFS-enabled Multi-core Processors (이산적 DVFS 멀티코어 프로세서 상에서 실시간 병렬 작업을 위한 확률적 저전력 스케쥴링)

  • Lee, Wan Yeon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.2
    • /
    • pp.31-39
    • /
    • 2013
  • In this paper, we propose a power-efficient scheduling scheme that stochastically minimizes the power consumption of a real-time parallel task while meeting the deadline on multicore processors. The proposed scheme applies the parallel processing that executes a task on multiple cores concurrently, and activates a part of all available cores with unused cores powered off, in order to save power consumption. It is proved that the proposed scheme minimizes the mean power consumption of a real-time parallel task with probabilistic computation amount on DVFS-enabled multicore processors with a finite set of discrete clock frequencies. Evaluation shows that the proposed scheme saves up to 81% power consumption of the previous method.

An Efficient Cache Coherence Protocol for Multi-Core Processors with Ring Interconnects (링 연결구조 기반의 멀티코어 프로세서를 위한 캐시 일관성 유지 기법)

  • Park, Jin-Young;Choi, Lynn
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.8
    • /
    • pp.768-772
    • /
    • 2008
  • Today's microprocessor normally includes several processing cores to reduce the energy consumption without losing performance. In this paper, data transfer ordering mechanism can be efficiently used for cache coherence solution in unidirectional ring interconnect. RING-DATA ORDER combines the simplicity of GREEDY-ORDER and the performance of RING-ORDER. RING-DATA ORDER can be easily applicable to multicore processor with unidirectional ring interconnect.

Overhead Analysis of XtratuM for Space in SMP Envrionment (SMP 환경에서의 위성용 XtratuM 오버헤드 분석)

  • Kim, Sun-Wook;Yoo, Bum-Soo;Jeong, Jae-Yeop;Choi, Jong-Wook
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.15 no.4
    • /
    • pp.177-187
    • /
    • 2020
  • Virtualization with hypervisors is one of emerging topics in multicore processors for space. Hypervisors are software layers to make several independent virtualized environments on one processor. Since all hardware resources are virtualized and distributed only by hypervisors, overall performance of processors can be improved by fully utilizing the resources. However at the same time, there are overheads for virtualizing and distributing hardware resources. Satellites are one of hard real time systems, and performance degradation with overheads should be analyzed thoroughly. Previous research on the overheads focused on single core systems. Even the overheads were analyzed in multicore systems, SMP environment was not fully included. This paper builds SMP environment with XtratuM, one of hypervisors for space missions, and analyzes performance degradation with overheads. Two boards of GR712RC with 2 LEON3FT CPUs and GR740 with 4 LEON4 CPUs are used in experiments. On each board, SMP benchmark functions are executed on SMP environment with XtratuM and on that without XtratuM respectively. Results are analyzed to find timing characteristics including overheads. Finally, applicability of the XtratuM to flight software in SMP is also reviewed.

Parallelization of a Purely Functional Bisimulation Algorithm

  • Ahn, Ki Yung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.1
    • /
    • pp.11-17
    • /
    • 2021
  • In this paper, we demonstrate a performance boost by parallelizing a purely functional bisimulation algorithm on a multicore processor machine. The key idea of this parallelization is exploiting the referential transparency of purely functional programs to minimize refactoring of the original implementation without any parallel constructs. Both original and parallel implementations are written in Haskell, a purely functional programming language. The change from the original program to the parallel program is minuscule, maintaining almost original structure of the program. Through benchmark, we show that the proposed parallelization doubles the performance of the bisimulation test compared to the original non-parallel implementation. We also shaw that similar performance boost is also possible for a memoized version of the bisimulation implementation.

Multicore Processor based Parallel SVM for Video Surveillance System (비디오 감시 시스템을 위한 멀티코어 프로세서 기반의 병렬 SVM)

  • Kim, Hee-Gon;Lee, Sung-Ju;Chung, Yong-Wha;Park, Dai-Hee;Lee, Han-Sung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.21 no.6
    • /
    • pp.161-169
    • /
    • 2011
  • Recent intelligent video surveillance system asks for development of more advanced technology for analysis and recognition of video data. Especially, machine learning algorithm such as Support Vector Machine (SVM) is used in order to recognize objects in video. Because SVM training demands massive amount of computation, parallel processing technique is necessary to reduce the execution time effectively. In this paper, we propose a parallel processing method of SVM training with a multi-core processor. The results of parallel SVM on a 4-core processor show that our proposed method can reduce the execution time of the sequential training by a factor of 2.5.

An Interference Matrix Based Approach to Bounding Worst-Case Inter-Thread Cache Interferences and WCET for Multi-Core Processors

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.2
    • /
    • pp.131-140
    • /
    • 2011
  • Different cores typically share the last-level cache in a multi-core processor. Threads running on different cores may interfere with each other. Therefore, the multi-core worst-case execution time (WCET) analyzer must be able to safely and accurately estimate the worst-case inter-thread cache interference. This is not supported by current WCET analysis techniques that manly focus on single thread analysis. This paper presents a novel approach to analyze the worst-case cache interference and bounding the WCET for threads running on multi-core processors with shared L2 instruction caches. We propose to use an interference matrix to model inter-thread interference, on which basis we can calculate the worst-case inter-thread cache interference. Our experiments indicate that the proposed approach can give a worst-case bound less than 1%, as in benchmark fib-call, and an average 16.4% overestimate for threads running on a dual-core processor with shared-L2 cache. Our approach dramatically improves the accuracy of WCET overestimatation by on average 20.0% compared to work.