• Title/Summary/Keyword: Multicore processor

Search Result 63, Processing Time 0.026 seconds

Low-power Filter Cache Design Technique for Multicore Processors (멀티 코어 프로세서를 위한 저전력 필터 캐쉬 설계 기법)

  • Park, Young-Jin;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.12
    • /
    • pp.9-16
    • /
    • 2009
  • Energy consumption as well as performance should be considered when designing up-to-date multicore processors. In this paper, we propose new design technique to reduce the energy consumption in the instruction cache for multicore processors by using modified filter cache. The filter cache has been recognized as one of the most energy-efficient design techniques for singlecore processors. The energy consumed in the instruction cache accounts for a significant portion of total processor energy consumption. Therefore, energy-aware instruction cache design techniques are essential to reduce the energy consumption in a multicore processor. The proposed technique reduces the energy consumption in the instruction cache for multicore processors by reducing the number of accesses to the level-1 instruction cache. We evaluate the proposed design using a simulation infrastructure based on SimpleScalar and CACTI. Simulation results show that the proposed architecture reduces the energy consumption in the instruction cache for multicore processors by up to 3.4% compared to the conventional filter cache architecture. Moreover, the proposed architecture shows better performance over the conventional filter cache architecture.

Thermal Pattern Comparison between 2D Multicore Processors and 3D Multicore Processors (2차원 구조와 3차원 구조에 따른 멀티코어 프로세서의 온도 분석)

  • Choi, Hong-Jun;Ahn, Jin-Woo;Jang, Hyung-Beom;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.9
    • /
    • pp.1-10
    • /
    • 2011
  • Unfortunately, in current microprocessors, increasing the frequency causes increased power consumption and reduced reliability whereas it improves the performance. To overcome the power and thermal problems in the singlecore processors, multicore processors has been widely used. For 2D multicore processors, interconnection is regarded as one of the major constraints in performance and power efficiency. To reduce the performance degradation and the power consumption in 2D multicore processors, 3D integrated design technique has been studied by many researchers. Compared to 2D multicore processors, 3D multicore processors get the benefits of performance improvement and reduced power consumption by reducing the wire length significantly. However, 3D multicore processors have serious thermal problems due to high power density, resulting in reliability degradation. Detailed thermal analysis for multicore processors can be useful in designing thermal-aware processors. In this paper, we analyze the impact of workload distribution, distance to the heat sink, and number of stacked dies on the processor temperature. We also analyze the effects of the temperature on overall system performance. Especially, this paper presents the guideline for thermal-aware multicore processor design by analyzing the thermal problems in 2D multicore processors and 3D multicore processors.

A Performance Study of Embedded Multicore Processor Architectures (임베디드 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.1
    • /
    • pp.163-169
    • /
    • 2013
  • Recently, the importance of embedded system is growing rapidly. In-order to satisfy the real-time constraints of the system, high performance embedded processor is required. Therefore, as in general purpose computer systems, embedded processor should be designed as multicore architecture as well. Using MiBench benchmarks as input, the trace-driven simulation has been performed and analyzed for the 2-core to 16-core embedded processor architectures with different types of cores from simple RISC to in-order and out-of-order superscalar processors, extensively. As a result, the achievable performance is as high as 23 times over the single core embedded RISC processor.

Performance Analysis of Multicore Processor Architectures Based On Cache Size Effects (캐쉬 용량 효과에 대한 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.12 no.6
    • /
    • pp.175-180
    • /
    • 2012
  • In order to overcome the complexity and performance limit problems of superscalar processors, the multicore architecture has been prevalent recently. The configuration and the size of instruction and data caches greatly gives effect on the performance of multicore processors. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the 2-core to 16-core architectures with different sizes of caches extensively. As a result, the 2-way set associative instruction and data cache with the size of 64KB brought the best cost-effective performance.

A Study on Statistical Simulation of Multicore Processor Architectures (멀티코어 프로세서의 통계적 모의실험에 관한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.14 no.6
    • /
    • pp.259-265
    • /
    • 2014
  • When the trace-driven simulation is used for the performance analysis of widely used multicore processors in the initial design stage, much time and disk space is necessary. In this paper, statistical simulations are performed for a high performance multicore processor with various hardware configurations. For the experiment, SPEC2000 benchmarks programs are used for profiling and synthesizing new instruction traces. As a result, the performance obtained by our statistical simulation is comparable to that of the trace-driven simulation with the benefit of tremendous reduction in the simulation time.

Multicore Real-Time Scheduling to Reduce Inter-Thread Cache Interferences

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.1
    • /
    • pp.67-80
    • /
    • 2013
  • The worst-case execution time (WCET) of each real-time task in multicore processors with shared caches can be significantly affected by inter-thread cache interferences. The worst-case inter-thread cache interferences are dependent on how tasks are scheduled to run on different cores. Therefore, there is a circular dependence between real-time task scheduling, the worst-case inter-thread cache interferences, and WCET in multicore processors, which is not the case for single-core processors. To address this challenging problem, we present an offline real-time scheduling approach for multicore processors by considering the worst-case inter-thread interferences on shared L2 caches. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and WCET. The experimental results demonstrate that the proposed approach can reduce the utilization of the resulting schedule by about 12% on average compared to the cyclic multicore scheduling approaches in our theoretical model. Our evaluation indicates that the enhanced scheduling approach is more likely to generate feasible and safe schedules with stricter timing constraints in multicore real-time systems.

Implementation of LTE Transport Channel on Multicore DSP Software Defined Radio Platform (멀티코어 DSP 기반 소프트웨어 정의 라디오 플랫폼을 활용한 LTE 전송 채널의 구현)

  • Lee, Jin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.4
    • /
    • pp.508-514
    • /
    • 2020
  • To implement the continuously evolving mobile communication standards such as Long Term Evolution (LTE) and 5G, the Software Defined Radio (SDR) concept provides great flexibility and efficiency. For many years, a high-end Digital Signal Processor (DSP) System on Chip (SoC) has been developed to support multicore and various hardware coprocessors. This paper introduces the implementation of the SDR platform hardware using TI's TCI663x chip. Using the platform, LTE transport channel is implemented by interworking multicore DSP with Bit rate Coprocessor (BCP) and Turbo Decoder Coprocessor (TCP) and the performance is evaluated according to various implementation options. In order to evaluate the performance of the implemented LTE transport channel, LTE base station system was constructed by combining FPGA main board for physical channels, SDR platform board, and RF & Antenna board.

Multicore Flow Processor with Wire-Speed Flow Admission Control

  • Doo, Kyeong-Hwan;Yoon, Bin-Yeong;Lee, Bhum-Cheol;Lee, Soon-Seok;Han, Man Soo;Kim, Whan-Woo
    • ETRI Journal
    • /
    • v.34 no.6
    • /
    • pp.827-837
    • /
    • 2012
  • We propose a flow admission control (FAC) for setting up a wire-speed connection for new flows based on their negotiated bandwidth. It also terminates a flow that does not have a packet transmitted within a certain period determined by the users. The FAC can be used to provide a reliable transmission of user datagram and transmission control protocol applications. If the period of flows can be set to a short time period, we can monitor active flows that carry a packet over networks during the flow period. Such powerful flow management can also be applied to security systems to detect a denial-of-service attack. We implement a network processor called a flow management network processor (FMNP), which is the second generation of the device that supports FAC. It has forty reduced instruction set computer core processors optimized for packet processing. It is fabricated in 65-nm CMOS technology and has a 40-Gbps process performance. We prove that a flow router equipped with an FMNP is better than legacy systems in terms of throughput and packet loss.

Exploiting Static Non-Uniform Cache Architectures for Hard Real-Time Computing

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.9 no.4
    • /
    • pp.177-189
    • /
    • 2015
  • High-performance processors using Non-Uniform Cache Architecture (NUCA) are increasingly used to deal with the growing wire delays in multicore/manycore processors. Due to the convergence of high-performance computing with embedded computing, NUCA caches are expected to benefit high-end embedded systems as well. However, for real-time systems that use multicore processors with NUCA caches, it is crucial to bound worst-case execution time (WCET) accurately and safely. In this paper, we developed a WCET analysis approach by considering the effect of static NUCA caches on WCET. We compared the WCET in real-time applications with different topologies of static NUCA caches. Our experimental results demonstrated that the static NUCA cache could improve the worst-case performance of realtime applications using multicore processor compared to the cache with uniform access time.

Static Timing Analysis of Shared Caches for Multicore Processors

  • Zhang, Wei;Yan, Jun
    • Journal of Computing Science and Engineering
    • /
    • v.6 no.4
    • /
    • pp.267-278
    • /
    • 2012
  • The state-of-the-art techniques in multicore timing analysis are limited to analyze multicores with shared instruction caches only. This paper proposes a uniform framework to analyze the worst-case performance for both shared instruction caches and data caches in a multicore platform. Our approach is based on a new concept called address flow graph, which can be used to model both instruction and data accesses for timing analysis. Our experiments, as a proof-of-concept study, indicate that the proposed approach can accurately compute the worst-case performance for real-time threads running on a dual-core processor with a shared L2 cache (either to store instructions or data).