• Title/Summary/Keyword: Multicore processors

Search Result 49, Processing Time 0.024 seconds

High Performance Message Scattering Algorithm in Multicore Processor (멀티코어 프로세서에서의 효율적인 메시지 스캐터링 지원 기법)

  • Park, Jongsu
    • Journal of Platform Technology
    • /
    • v.10 no.2
    • /
    • pp.3-9
    • /
    • 2022
  • In this paper, to maximize the performance of the scatter communication in multi-core and many-core processors, a technique that considers the communication situation of the processing node is applied to a multi-core processor composed of 32 processing nodes. Since the existing scatter algorithm cannot recognize the communication conditions of the processing nodes, communication is generally performed according to an initially set transmission order. In this case, scatter communication starts only after the communication currently being performed by all processing nodes inside the processor is finished. The scatter communication performance was improved by this technique, and it was confirmed that there was a performance improvement of up to 78.93% compared to the existing algorithm through BFM simulation.

Hardware-Software Cosynthesis of Multitask Multicore SoC with Real-Time Constraints (실시간 제약조건을 갖는 다중태스크 다중코어 SoC의 하드웨어-소프트웨어 통합합성)

  • Lee Choon-Seung;Ha Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.592-607
    • /
    • 2006
  • This paper proposes a technique to select processors and hardware IPs and to map the tasks into the selected processing elements, aming to achieve high performance with minimal system cost when multitask applications with real-time constraints are run on a multicore SoC. Such technique is called to 'Hardware-Software Cosynthesis Technique'. A cosynthesis technique was already presented in our early work [1] where we divide the complex cosynthesis problem into three subproblems and conquer each subproblem separately: selection of appropriate processing components, mapping and scheduling of function blocks to the selected processing component, and schedulability analysis. Despite good features, our previous technique has a serious limitation that a task monopolizes the entire system resource to get the minimum schedule length. But in general we may obtain higher performance in multitask multicore system if independent multiple tasks are running concurrently on different processor cores. In this paper, we present two mapping techniques, task mapping avoidance technique(TMA) and task mapping pinning technique(TMP), which are applicable for general cases with diverse operating policies in a multicore environment. We could obtain significant performance improvement for a multimedia real-time application, multi-channel Digital Video Recorder system and for randomly generated multitask graphs obtained from the related works.

Analysis on the Cooling Efficiency of High-Performance Multicore Processors according to Cooling Methods (기계식 쿨링 기법에 따른 고성능 멀티코어 프로세서의 냉각 효율성 분석)

  • Kang, Seung-Gu;Choi, Hong-Jun;Ahn, Jin-Woo;Park, Jae-Hyung;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.7
    • /
    • pp.1-11
    • /
    • 2011
  • Many researchers have studied on the methods to improve the processor performance. However, high integrated semiconductor technology for improving the processor performance causes many problems such as battery life, high power density, hotspot, etc. Especially, as hotspot has critical impact on the reliability of chip, thermal problems should be considered together with performance and power consumption when designing high-performance processors. To alleviate the thermal problems of processors, there have been various researches. In the past, mechanical cooling methods have been used to control the temperature of processors. However, up-to-date microprocessors causes severe thermal problems, resulting in increased cooling cost. Therefore, recent studies have focused on architecture-level thermal-aware design techniques than mechanical cooling methods. Even though architecture-level thermal-aware design techniques are efficient for reducing the temperature of processors, they cause performance degradation inevitably. Therefore, if the mechanical cooling methods can manage the thermal problems of processors efficiently, the performance can be improved by reducing the performance degradation due to architecture-level thermal-aware design techniques such as dynamic thermal management. In this paper, we analyze the cooling efficiency of high-performance multicore processors according to mechanical cooling methods. According to our experiments using air cooler and liquid cooler, the liquid cooler consumes more power than the air cooler whereas it reduces the temperature more efficiently. Especially, the cost for reducing $1^{\circ}C$ is varied by the environments. Therefore, if the mechanical cooling methods can be used appropriately, the temperature of high-performance processors can be managed more efficiently.

An Interference Matrix Based Approach to Bounding Worst-Case Inter-Thread Cache Interferences and WCET for Multi-Core Processors

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.2
    • /
    • pp.131-140
    • /
    • 2011
  • Different cores typically share the last-level cache in a multi-core processor. Threads running on different cores may interfere with each other. Therefore, the multi-core worst-case execution time (WCET) analyzer must be able to safely and accurately estimate the worst-case inter-thread cache interference. This is not supported by current WCET analysis techniques that manly focus on single thread analysis. This paper presents a novel approach to analyze the worst-case cache interference and bounding the WCET for threads running on multi-core processors with shared L2 instruction caches. We propose to use an interference matrix to model inter-thread interference, on which basis we can calculate the worst-case inter-thread cache interference. Our experiments indicate that the proposed approach can give a worst-case bound less than 1%, as in benchmark fib-call, and an average 16.4% overestimate for threads running on a dual-core processor with shared-L2 cache. Our approach dramatically improves the accuracy of WCET overestimatation by on average 20.0% compared to work.

Comparing Energy Efficiency of MPI and MapReduce on ARM based Cluster (ARM 클러스터에서 에너지 효율 향상을 위한 MPI와 MapReduce 모델 비교)

  • Maqbool, Jahanzeb;Rizki, Permata Nur;Oh, Sangyoon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2014.01a
    • /
    • pp.9-13
    • /
    • 2014
  • The performance of large scale software applications has been automatically increasing for last few decades under the influence of Moore's law - the number of transistors on a microprocessor roughly doubled every eighteen months. However, on-chip transistors limitations and heating issues led to the emergence of multicore processors. The energy efficient ARM based System-on-Chip (SoC) processors are being considered for future high performance computing systems. In this paper, we present a case study of two widely used parallel programming models i.e. MPI and MapReduce on distributed memory cluster of ARM SoC development boards. The case study application, Black-Scholes option pricing equation, was parallelized and evaluated in terms of power consumption and throughput. The results show that the Hadoop implementation has low instantaneous power consumption that of MPI, but MPI outperforms Hadoop implementation by a factor of 1.46 in terms of total power consumption to execution time ratio.

  • PDF

Application Characteristic-based Divided Scheduling for Multicore Systems

  • Park, Jung Kyu;Kim, Jaeho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.6
    • /
    • pp.9-16
    • /
    • 2017
  • In this paper, we proposed a novel user-level scheduling scheme that monitors applications characteristics on-line using PMU and allocates applications into cpu cores. We utilize PMU (Performance Monitoring Unit) to analyze which shared resource has the strongest relation with the influence. Using the proposed scheduling method, it is possible to reduce the contention of shared resources. The key idea of this scheme is separating high-influential applications into different processors. The evaluation results have shown that the proposed scheduling scheme can enhance the performance up to 12% for a 8 core system and up to 25% for a 28 core system, respectively.

A Study in the Effects of DRAM on The Microprocessor Performance (마이크로프로세서의 성능에 끼치는 DRAM의 영향에 관한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.1
    • /
    • pp.219-224
    • /
    • 2017
  • Recently, the importance of DRAM is very significant not only in embedded systems and mobile devices but also in high-end modern microprocessors and multicore processors. To keep up with this, both industry and academia have actively studied various types of future DRAMs. Therefore, accurate DRAM model is requisite when evaluating the microprocessor performance. In this paper, a microprocessor trace-driven simulator which can couple with the cycle-accurate DRAM simulator has been developed. Using SPEC 2000 benchmarks as input, the effect of cycle-accurate DDR3 model on the microprocessor performance has been evaluated.

Computational Analytics of Client Awareness for Mobile Application Offloading with Cloud Migration

  • Nandhini, Uma;TamilSelvan, Latha
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.11
    • /
    • pp.3916-3936
    • /
    • 2014
  • Smartphone applications like games, image processing, e-commerce and social networking are gaining exponential growth, with the ubiquity of cellular services. This demands increased computational power and storage from mobile devices with a sufficiently high bandwidth for mobile internet service. But mobile nodes are highly constrained in the processing and storage, along with the battery power, which further restrains their dependability. Adopting the unlimited storage and computing power offered by cloud servers, it is possible to overcome and turn these issues into a favorable opportunity for the growth of mobile cloud computing. As the mobile internet data traffic is predicted to grow at the rate of around 65 percent yearly, even advanced services like 3G and 4G for mobile communication will fail to accommodate such exponential growth of data. On the other hand, developers extend popular applications with high end graphics leading to smart phones, manufactured with multicore processors and graphics processing units making them unaffordable. Therefore, to address the need of resource constrained mobile nodes and bandwidth constrained cellular networks, the computations can be migrated to resourceful servers connected to cloud. The server now acts as a bridge that should enable the participating mobile nodes to offload their computations through Wi-Fi directly to the virtualized server. Our proposed model enables an on-demand service offloading with a decision support system that identifies the capabilities of the client's hardware and software resources in judging the requirements for offloading. Further, the node's location, context and security capabilities are estimated to facilitate adaptive migration.

The Design and Simulation of Out-of-Order Execution Processor using Tomasulo Algorithm (토마술로 알고리즘을 이용하는 비순차실행 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.4
    • /
    • pp.135-141
    • /
    • 2020
  • Today, CPUs in general-purpose computers such as servers, desktops and laptops, as well as home appliances and embedded systems, consist mostly of multicore processors. In order to improve performance, it is required to use an out-of-order execution processor by Tomasulo algorithm as each core processor. An out-of-order execution processor with Tomasulo algorithm can execute the available instructions in any order and perform speculation in order to reduce control dependencies. Therefore, the performance of an out-of-order execution processor can be significantly improved compared to an in-order execution processor. In this paper, an out-of-order execution processor using Tomasulo algorithm and ARM instruction set is designed using VHDL record data types and simulated by GHDL. As a result, it is possible to successfully perform operations on programs written in ARM instructions.