• Title/Summary/Keyword: Caches

Search Result 135, Processing Time 0.04 seconds

Static Timing Analysis of Shared Caches for Multicore Processors

  • Zhang, Wei;Yan, Jun
    • Journal of Computing Science and Engineering
    • /
    • v.6 no.4
    • /
    • pp.267-278
    • /
    • 2012
  • The state-of-the-art techniques in multicore timing analysis are limited to analyze multicores with shared instruction caches only. This paper proposes a uniform framework to analyze the worst-case performance for both shared instruction caches and data caches in a multicore platform. Our approach is based on a new concept called address flow graph, which can be used to model both instruction and data accesses for timing analysis. Our experiments, as a proof-of-concept study, indicate that the proposed approach can accurately compute the worst-case performance for real-time threads running on a dual-core processor with a shared L2 cache (either to store instructions or data).

Bounding Worst-Case Data Cache Performance by Using Stack Distance

  • Liu, Yu;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.3 no.4
    • /
    • pp.195-215
    • /
    • 2009
  • Worst-case execution time (WCET) analysis is critical for hard real-time systems to ensure that different tasks can meet their respective deadlines. While significant progress has been made for WCET analysis of instruction caches, the data cache timing analysis, especially for set-associative data caches, is rather limited. This paper proposes an approach to safely and tightly bounding data cache performance by computing the worst-case stack distance of data cache accesses. Our approach can not only be applied to direct-mapped caches, but also be used for set-associative or even fully-associative caches without increasing the complexity of analysis. Moreover, the proposed approach can statically categorize worst-case data cache misses into cold, conflict, and capacity misses, which can provide useful insights for designers to enhance the worst-case data cache performance. Our evaluation shows that the proposed data cache timing analysis technique can safely and accurately estimate the worst-case data cache performance, and the overestimation as compared to the observed worst-case data cache misses is within 1% on average.

Comparing Separate and Statically-Partitioned Caches for Time-Predictable Multicore Processors

  • Wu, Lan;Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.8 no.1
    • /
    • pp.25-33
    • /
    • 2014
  • In this paper, we quantitatively compare two different time-predictable multicore cache architectures, separate and statically-partitioned caches, through extensive simulation. Current research trends primarily focus on partitioned-cache architectures in order to achieve time predictability for hard real-time multicore based systems, and our experiments reveal that separate caches actually lead to much better performance and energy efficiency when compared to statically-partitioned caches, and both of them are adequate for timing analysis for real-time multicore applications.

Exploiting Static Non-Uniform Cache Architectures for Hard Real-Time Computing

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.9 no.4
    • /
    • pp.177-189
    • /
    • 2015
  • High-performance processors using Non-Uniform Cache Architecture (NUCA) are increasingly used to deal with the growing wire delays in multicore/manycore processors. Due to the convergence of high-performance computing with embedded computing, NUCA caches are expected to benefit high-end embedded systems as well. However, for real-time systems that use multicore processors with NUCA caches, it is crucial to bound worst-case execution time (WCET) accurately and safely. In this paper, we developed a WCET analysis approach by considering the effect of static NUCA caches on WCET. We compared the WCET in real-time applications with different topologies of static NUCA caches. Our experimental results demonstrated that the static NUCA cache could improve the worst-case performance of realtime applications using multicore processor compared to the cache with uniform access time.

Bounding Worst-Case Performance for Multi-Core Processors with Shared L2 Instruction Caches

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.1
    • /
    • pp.1-18
    • /
    • 2011
  • As the first step toward real-time multi-core computing, this paper presents a novel approach to bounding the worst-case performance for threads running on multi-core processors with shared L2 instruction caches. The idea of our approach is to compute the worst-case instruction access interferences between different threads based on the program control flow information of each thread, which can be statically analyzed. Our experiments indicate that the proposed approach can reasonably estimate the worst-case shared L2 instruction cache misses by considering the inter-thread instruction conflicts. Also, the worst-case execution time (WCET) of applications running on multi-core processors estimated by our approach is much better than the estimation by simply assuming all L2 instruction accesses are misses.

Power Performance of Instruction Pre-Fetch Unit (명령어 선 인출기의 전력 성능)

  • 송영규;오형철
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.365-368
    • /
    • 1999
  • In this paper, we investigate the effect of adopting branch-penalty compensation schemes on the power performance of TLBs(Translation Look-aside Buffers) and instruction caches. We found that the double-buffer branch-penalty compensation scheme can reduce the power consumption of the TLBs and the instruction caches considered by up to 14-21.3%. The power consumption is estimated through simulation at the architectural level, using the Kamble/Ghose method

  • PDF

Reducing Power Consumption of Data Caches for Embedded Processors (임베디드 프로세서를 위한 선인출 데이터캐시의 저전력화 방안)

  • Moon, Hyun-Ju;Jee, Sung-Hyun
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.44 no.1
    • /
    • pp.1-9
    • /
    • 2007
  • Since data caches used in modern embedded processors consume significant fraction of total processor power up to 40%, embedded processors need power-efficient high performance data caches. This paper proposes a prefetching data cache structure which pursuing low power consumption. We added tag history table on existing data cache structure which includes hardware unit for data prefetching so that reduce the number of parallel lookup on tag memory. This strategic cache structure remarkably reduces power consumption for parallel tag lookup. Experimental results show that the proposed cache architecture induce low power consumption while maintain the same cache performance.

Dead Block-Aware Adaptive Write Scheme for MLC STT-MRAM Caches

  • Hong, Seokin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.3
    • /
    • pp.1-9
    • /
    • 2020
  • In this paper, we propose an efficient adaptive write scheme that improves the performance of write operation in MLC STT-MRAM caches. The key idea of the proposed scheme is to perform the write operation fast if the target MLC STT-MRAM cells contain a dead block. Even if the fast write operation on the MLC STT-MRAM evicts a cache block from the MLC STT-MRAM cells, its performance impact is low if the evicted block is a dead block which is not used in the future. Through experimental evaluation with a memory simulator, we show that the proposed adaptive write scheme improves the performance of the MLC STT-MRAM caches by 17% on average.

An Efficient Replication Scheme in Unstructured Peer-to-Peer Networks (비구조적인 피어-투-피어 네트워크상에서 효율적인 복제기법)

  • Choi Wu-Rak;Han Sae-Young;Park Sung-Yong
    • The KIPS Transactions:PartA
    • /
    • v.13A no.1 s.98
    • /
    • pp.1-10
    • /
    • 2006
  • For efficient searching in unstructured peer-to-peer systems, random walk was proposed and several replication methods have been studied to compensate for the random walk's low query success rate. This paper proposes an efficient replication scheme that improves the accuracy and speed of queries and reduces the cost by minimizing the number of replicas and by utilizing caches. In this scheme, hub nodes store only content's caches, and one of their neighbors stores the replica. By determining hubs with only limited and local information, we can adaptively generate caches and replicas in dynamic peer-to-peer networks.

Low Power Trace Cache for Embedded Processor

  • Moon Je-Gil;Jeong Ha-Young;Lee Yong-Surk
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.204-208
    • /
    • 2004
  • Embedded business will be expanded market more and more since customers seek more wearable and ubiquitous systems. Cellular telephones, PDAs, notebooks and portable multimedia devices could bring higher microprocessor revenues and more rewarding improvements in performance and functions. Increasing battery capacity is still creeping along the roadmap. Until a small practical fuel cell becomes available, microprocessor developers must come up with power-reduction methods. According to MPR 2003, the instruction and data caches of ARM920T processor consume $44\%$ of total processor power. The rest of it is split into the power consumptions of the integer core, memory management units, bus interface unit and other essential CPU circuitry. And the relationships among CPU, peripherals and caches may change in the future. The processor working on higher operating frequency will exact larger cache RAM and consume more energy. In this paper, we propose advanced low power trace cache which caches traces of the dynamic instruction stream, and reduces cache access times. And we evaluate the performance of the trace cache and estimate the power of the trace cache, which is compared with conventional cache.

  • PDF