• Title/Summary/Keyword: Data cache

Variable latency L1 data cache architecture design in multi-core processor under process variation

  • Kong, Joonho
    • Journal of the Korea Society of Computer and Information / v.20 no.9 / pp.1-10 / 2015
  • In this paper, we propose a new variable-latency L1 data cache architecture for multi-core processors. Our architecture extends the traditional variable-latency cache toward multi-core processors. We add a specialized data structure that records the latency of the L1 data cache; the value stored in this structure is determined by the latency added to the L1 data cache. The structure also tracks the remaining L1 data cache access cycles and notifies the reservation station in the core when data arrives. As in the variable-latency cache of single-core architectures, our architecture flexibly extends the cache access cycles to accommodate process variation. The proposed cache architecture reduces yield losses incurred by L1 cache access-time failures to nearly 0%. Moreover, we quantitatively evaluate performance, power, energy consumption, power-delay product, and energy-delay product as the number of cache access cycles increases.
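
As a concrete illustration, here is a minimal sketch (in Python rather than hardware, with invented names and cycle counts) of the recording structure described above: it stores the total latency assigned to each in-flight L1 D-cache access, counts down the remaining cycles, and notifies the reservation station when the data arrives.

```python
# Illustrative model of the latency-recording structure; BASE_LATENCY and
# EXTRA_LATENCY are assumed values, not figures from the paper.

BASE_LATENCY = 2    # nominal L1 D-cache access cycles (assumed)
EXTRA_LATENCY = 1   # cycles added to tolerate process variation (assumed)

class ReservationStation:
    def wakeup(self, access_id):
        print(f"data for access {access_id} arrived")

class LatencyTable:
    def __init__(self):
        self.pending = {}   # access id -> remaining cycles

    def issue(self, access_id):
        # Record the total latency of this access when it is issued.
        self.pending[access_id] = BASE_LATENCY + EXTRA_LATENCY

    def tick(self, reservation_station):
        # Advance one cycle; notify the reservation station on arrival.
        arrived = [a for a, c in self.pending.items() if c == 1]
        for a in arrived:
            del self.pending[a]
            reservation_station.wakeup(a)
        for a in self.pending:
            self.pending[a] -= 1

lt, rs = LatencyTable(), ReservationStation()
lt.issue("ld1")
for _ in range(BASE_LATENCY + EXTRA_LATENCY):
    lt.tick(rs)     # "ld1" wakes up on the final cycle
```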

A Locality-Aware Write Filter Cache for Energy Reduction of STTRAM-Based L1 Data Cache

  • Kong, Joonho
    • JSTS: Journal of Semiconductor Technology and Science / v.16 no.1 / pp.80-90 / 2016
  • Thanks to their superior leakage energy efficiency compared to SRAM cells, STTRAM cells are considered a promising alternative memory element for on-chip caches. However, the main disadvantage of STTRAM cells is their high write energy and latency. In this paper, we propose a low-cost write filter (WF) cache that resides between the load/store queue and an STTRAM-based L1 data cache. To maximize the efficiency of the WF cache, its line allocation and access policies are optimized for reducing the energy consumption of the STTRAM-based L1 data cache. By efficiently filtering write operations to the STTRAM-based L1 data cache, our proposed WF cache reduces its energy consumption by up to 43.0% compared to the case without the WF cache. In addition, thanks to the WF cache's fast hit latency, it slightly improves performance, by 0.2%.
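
The filtering idea can be sketched as follows. The line allocation and access policies below (LRU allocation on store, write-back to STTRAM only on eviction) are simplified stand-ins for the paper's optimized policies, and all sizes are illustrative.

```python
# Toy write filter (WF) cache between the LSQ and an STTRAM L1 D-cache:
# stores are absorbed by a tiny structure, and only evicted lines pay the
# expensive STTRAM write.

from collections import OrderedDict

class STTRAML1:
    def __init__(self): self.mem = {}
    def write(self, addr, data): self.mem[addr] = data   # costly STTRAM write
    def read(self, addr): return self.mem.get(addr, 0)

class WriteFilterCache:
    def __init__(self, num_lines, sttram_l1):
        self.lines = OrderedDict()      # addr -> data, kept in LRU order
        self.num_lines = num_lines
        self.l1 = sttram_l1

    def store(self, addr, data):
        if addr in self.lines:
            self.lines.move_to_end(addr)
        elif len(self.lines) == self.num_lines:
            victim, vdata = self.lines.popitem(last=False)  # evict LRU line
            self.l1.write(victim, vdata)    # only evictions touch STTRAM
        self.lines[addr] = data

    def load(self, addr):
        if addr in self.lines:              # fast WF hit, no STTRAM read
            self.lines.move_to_end(addr)
            return self.lines[addr]
        return self.l1.read(addr)

wf = WriteFilterCache(num_lines=4, sttram_l1=STTRAML1())
for i in range(8):
    wf.store(i % 2, i)      # repeated writes to hot lines stay in the WF
print(wf.load(0), wf.load(1))
```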

A cache placement algorithm based on comprehensive utility in big data multi-access edge computing

  • Liu, Yanpei; Huang, Wei; Han, Li; Wang, Liping
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.11 / pp.3892-3912 / 2021
  • The recent rapid growth of mobile network traffic places multi-access edge computing in an important position to reduce network load and improve network capacity and service quality. In contrast to traditional mobile cloud computing, multi-access edge computing includes a base-station cooperative cache layer and a user cooperative cache layer. Selecting the most appropriate content to cache according to actual needs, and determining the most appropriate location for it, have emerged as pressing problems in multi-access edge computing. For this reason, a cache placement algorithm based on comprehensive utility in big data multi-access edge computing (CPBCU) is proposed in this work. Firstly, the cache value generated by a placement is calculated from the cache capacity, data popularity, and node replacement rate. Secondly, the cache placement problem is modeled in terms of the cache value and the data object acquisition and replacement costs. The model is then transformed into a combinatorial optimization problem, and the cache objects are placed on appropriate data nodes using a tabu search algorithm. Finally, to verify the feasibility and effectiveness of the algorithm, a multi-access edge computing experimental environment is built. Experimental results show that CPBCU significantly improves cache service rate, data response time, and replacement count compared with other cache placement algorithms.
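
A toy version of the placement step might look like the following: a cache value is computed per object from popularity, size, and an assumed replacement-rate penalty, and a small tabu search selects which objects a single edge node caches. The value formula, parameters, and single-node scope are all simplifications of the paper's comprehensive-utility model.

```python
objects = [  # (name, size, popularity)
    ("a", 4, 0.50), ("b", 3, 0.30), ("c", 2, 0.15), ("d", 2, 0.05),
]
CAPACITY = 6             # node cache capacity (illustrative units)
REPLACEMENT_RATE = 0.1   # assumed per-unit replacement-cost weight

def value(selection):
    size = sum(s for (_, s, _) in selection)
    if size > CAPACITY:
        return float("-inf")                  # infeasible placement
    pop = sum(p for (_, _, p) in selection)
    return pop - REPLACEMENT_RATE * size      # utility minus replacement cost

def tabu_search(iters=50, tenure=3):
    current = set()
    best, best_val = current, value(current)
    tabu = {}                                 # object name -> tabu expiry
    for it in range(iters):
        candidates = []
        for obj in objects:
            nxt = current ^ {obj}             # flip one object in or out
            v = value(nxt)
            if v == float("-inf"):
                continue                      # skip over-capacity moves
            if tabu.get(obj[0], -1) >= it and v <= best_val:
                continue                      # tabu move without aspiration
            candidates.append((v, obj[0], nxt))
        if not candidates:
            continue
        v, name, current = max(candidates)
        tabu[name] = it + tenure
        if v > best_val:
            best, best_val = current, v
    return best, best_val

placement, utility = tabu_search()
print(sorted(name for (name, _, _) in placement), round(utility, 3))
```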

Preventing Fast Wear-out of Flash Cache with An Admission Control Policy

  • Lee, Eunji; Bahn, Hyokyung
    • JSTS: Journal of Semiconductor Technology and Science / v.15 no.5 / pp.546-553 / 2015
  • Recently, flash cache has been widely adopted as a performance accelerator for legacy storage systems. Unlike other cache media, flash cache must be managed carefully, as it has peculiar characteristics such as long write latency and limited P/E cycles. In particular, we make two prominent observations that can be exploited in managing flash cache. First, a serious wear-out problem arises when a system's working set exceeds the capacity of the flash cache, due to excessively frequent cache replacement. Second, more than 50% of data sees no hit in the flash cache, as it is a second-level cache. Based on these observations, we propose a cache admission control policy that does not cache data on first access, and inserts it into the cache only if a second access occurs within a certain time window. This filters out data that is disruptive to the flash cache in terms of endurance and performance. With this policy, we prolong the lifetime of the flash cache 2.3 times without any performance degradation.
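
The admission policy itself is simple enough to state in a few lines. The sketch below uses an assumed request-count window; the paper's window definition and bookkeeping may differ.

```python
WINDOW = 1000   # admission window, in requests (illustrative value)

class FlashCacheAdmission:
    def __init__(self):
        self.first_seen = {}    # block -> time of first (uncached) access
        self.cache = set()      # blocks admitted to the flash cache
        self.clock = 0

    def access(self, block):
        self.clock += 1
        if block in self.cache:
            return "hit"
        first = self.first_seen.get(block)
        if first is not None and self.clock - first <= WINDOW:
            self.cache.add(block)            # second access in window: admit
            del self.first_seen[block]
            return "miss (admitted)"
        self.first_seen[block] = self.clock  # first access: bypass flash
        return "miss (bypassed)"

fc = FlashCacheAdmission()
print(fc.access("A"))   # miss (bypassed)  -- one-touch data stays out
print(fc.access("A"))   # miss (admitted)  -- reuse detected, now cached
print(fc.access("A"))   # hit
print(fc.access("B"))   # miss (bypassed)
```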

Design and analytical evaluation of a fuzzy proxy caching for wireless internet

  • Bae, Ihn-Han
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1177-1190 / 2009
  • In this paper, we propose a fuzzy proxy cache scheme for caching web documents in mobile base stations. The scheme uses a mobile cache model to facilitate data caching and data replication. Under it, each proxy in a base station makes cache decisions based solely on its local knowledge of the global cache state, so the entire wireless proxy cache system can be managed effectively without centralized control. To improve proxy caching performance, the scheme predicts the direction of movement of mobile hosts and applies different caching methods to neighboring proxy servers according to fuzzy-logic control rules driven by the mobile host's membership degree. The performance of our cache scheme is evaluated analytically in terms of average response delay and average energy cost, and compared with that of other mobile cache schemes.
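
The flavor of the fuzzy control rules can be sketched as follows, with entirely made-up membership functions, thresholds, and actions: the proxy estimates a mobile host's membership degree for each neighboring cell from its predicted movement direction and chooses a caching action for that neighbor accordingly.

```python
# Made-up membership function and rule thresholds, for illustration only.

def membership(host_heading_deg, neighbor_bearing_deg):
    # Triangular membership: 1.0 when the host heads straight at the
    # neighbor, falling to 0.0 at 90 degrees off-axis (assumed shape).
    diff = abs((host_heading_deg - neighbor_bearing_deg + 180) % 360 - 180)
    return max(0.0, 1.0 - diff / 90.0)

def cache_action(mu):
    if mu >= 0.7:
        return "replicate documents to this neighbor proxy"
    if mu >= 0.3:
        return "send cache hints to this neighbor proxy"
    return "do nothing for this neighbor"

heading = 20.0                      # host's predicted movement direction
for bearing in (0.0, 60.0, 180.0):  # bearings of three neighboring cells
    mu = membership(heading, bearing)
    print(f"neighbor at {bearing:5.1f} deg: mu={mu:.2f} -> {cache_action(mu)}")
```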

Energy-Performance Efficient 2-Level Data Cache Architecture for Embedded System (내장형 시스템을 위한 에너지-성능 측면에서 효율적인 2-레벨 데이터 캐쉬 구조의 설계)

  • Lee, Jong-Min; Kim, Soon-Tae
    • Journal of KIISE: Computer Systems and Theory / v.37 no.5 / pp.292-303 / 2010
  • On-chip cache memories play an important role in both the performance and the energy consumption of resource-constrained embedded systems by filtering out many off-chip memory accesses. We propose a 2-level data cache architecture with a low energy-delay product tailored to embedded systems. The L1 data cache is small and direct-mapped and employs a write-through policy; in contrast, the L2 data cache is set-associative and adopts a write-back policy. Consequently, the L1 data cache is accessed in one cycle and provides high cache bandwidth, while the L2 data cache is effective in reducing the global miss rate. To reduce the miss penalty caused by the small L1 cache and the power consumed by address generation, we propose an ECP (Early Cache hit Predictor) scheme, which predicts whether the L1 cache holds the requested data by combining fast address generation with L1 cache hit prediction. To reduce the high energy cost of accessing the L2 data cache under the heavy write-through traffic passing through the write buffer placed between the two cache levels, we propose a one-way write scheme. In simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed 2-level data cache architecture shows average improvements of 3.6% in overall system performance and 50% in data cache energy consumption.
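
A minimal sketch of the ECP idea follows. The carry-free XOR used as a fast address approximation, the table organization, and all field widths are assumptions for illustration; the paper's predictor may be structured quite differently.

```python
# Assumed widths and table organization; the XOR "fast sum" is a standard
# carry-free approximation, used here only to illustrate the idea.

LINE_BITS = 5          # 32-byte cache lines (assumed)
TABLE_ENTRIES = 64     # predictor table size (assumed)

class EarlyCacheHitPredictor:
    def __init__(self):
        self.table = [None] * TABLE_ENTRIES

    def _key(self, base, offset):
        # Carry-free approximation of (base + offset), available early.
        return (base ^ offset) >> LINE_BITS

    def predict_hit(self, base, offset):
        k = self._key(base, offset)
        return self.table[k % TABLE_ENTRIES] == k

    def update(self, base, offset, real_l1_hit):
        k = self._key(base, offset)
        self.table[k % TABLE_ENTRIES] = k if real_l1_hit else None

ecp = EarlyCacheHitPredictor()
base, offset = 0x1000, 0x24
ecp.update(base, offset, real_l1_hit=True)  # train on an actual L1 hit
print(ecp.predict_hit(base, offset))        # True: likely L1 hit
print(ecp.predict_hit(base, 0x400))         # False: could start L2 early
```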

Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing

  • Huangfu, Yijie; Zhang, Wei
    • Journal of Computing Science and Engineering / v.11 no.2 / pp.69-77 / 2017
  • Recent GPUs have adopted cache memory to benefit general-purpose GPU (GPGPU) programs. However, unlike CPU programs, GPGPU programs typically exhibit considerably less temporal/spatial locality. Moreover, the L1 data cache is shared by many threads whose combined data footprint is typically much larger than the L1 cache, making it critical to bypass the L1 data cache intelligently to enhance GPU cache performance. In this paper, we examine GPU cache access behavior and propose a simple hardware-based GPU cache bypassing method that can be applied to GPU applications without recompiling them. Moreover, we introduce a hybrid method that integrates static profiling information with hardware-based bypassing to further enhance performance. Our experimental results reveal that hardware-based cache bypassing boosts performance for most benchmarks, and that the hybrid method achieves performance comparable to state-of-the-art compiler-based bypassing at considerably lower profiling cost.
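
One plausible shape for such a hardware bypassing mechanism is sketched below: a table of saturating counters indexed by load PC learns which loads see little L1 reuse and routes them straight to L2. The table size, counter width, and threshold are illustrative, and in the hybrid variant static profiling could pre-set the counters.

```python
BYPASS_THRESHOLD = 3      # counter value above which we bypass (assumed)
COUNTER_MAX = 7           # 3-bit saturating counters (assumed)

class BypassPredictor:
    def __init__(self, entries=256):
        self.counters = [0] * entries
        self.entries = entries

    def should_bypass(self, pc):
        return self.counters[pc % self.entries] > BYPASS_THRESHOLD

    def update(self, pc, l1_hit):
        i = pc % self.entries
        if l1_hit:   # reuse observed: lean toward caching in L1
            self.counters[i] = max(0, self.counters[i] - 1)
        else:        # no reuse: lean toward bypassing L1
            self.counters[i] = min(COUNTER_MAX, self.counters[i] + 1)

bp = BypassPredictor()
streaming_load_pc = 0x80
for _ in range(6):                           # streaming load keeps missing
    bp.update(streaming_load_pc, l1_hit=False)
print(bp.should_bypass(streaming_load_pc))   # True: send to L2 directly
```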

Bounding Worst-Case Data Cache Performance by Using Stack Distance

  • Liu, Yu; Zhang, Wei
    • Journal of Computing Science and Engineering / v.3 no.4 / pp.195-215 / 2009
  • Worst-case execution time (WCET) analysis is critical for hard real-time systems to ensure that different tasks can meet their respective deadlines. While significant progress has been made for WCET analysis of instruction caches, the data cache timing analysis, especially for set-associative data caches, is rather limited. This paper proposes an approach to safely and tightly bounding data cache performance by computing the worst-case stack distance of data cache accesses. Our approach can not only be applied to direct-mapped caches, but also be used for set-associative or even fully-associative caches without increasing the complexity of analysis. Moreover, the proposed approach can statically categorize worst-case data cache misses into cold, conflict, and capacity misses, which can provide useful insights for designers to enhance the worst-case data cache performance. Our evaluation shows that the proposed data cache timing analysis technique can safely and accurately estimate the worst-case data cache performance, and the overestimation as compared to the observed worst-case data cache misses is within 1% on average.
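
The stack-distance machinery underlying the analysis can be illustrated on a dynamic trace. For a fully-associative LRU cache with associativity A, an access hits iff its stack distance (the number of distinct lines touched since the previous access to the same line) is less than A; first touches have infinite distance and are cold misses. The sketch below lumps capacity and conflict misses together; separating them, and bounding all of this statically for the worst case, is the paper's contribution and beyond this snippet.

```python
def classify(trace, assoc):
    stack, results = [], []     # stack holds lines in MRU-first order
    for line in trace:
        if line not in stack:
            dist = float("inf")             # first touch: cold miss
            kind = "cold miss"
        else:
            dist = stack.index(line)        # distinct lines since last use
            stack.remove(line)
            kind = "hit" if dist < assoc else "capacity/conflict miss"
        stack.insert(0, line)               # move/insert at MRU position
        results.append((line, dist, kind))
    return results

trace = ["a", "b", "c", "a", "d", "e", "a"]
for line, dist, kind in classify(trace, assoc=4):
    print(f"{line}: distance={dist}, {kind}")
```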

Impacts of multiple cache block sizes on system performance (다양한 cache block크기에 의한 시스템의 성능 변화)

  • 이성환; 김준성
    • Proceedings of the IEEK Conference / 2003.07d / pp.1347-1350 / 2003
  • In this paper, we examine how varying the block sizes of the instruction and data caches affects overall system performance in a system whose L1 cache is split into separate instruction and data caches. To this end, we performed simulations with SimpleScalar using SPEC CPU benchmark programs as input. Our results show that choosing cache block sizes suited to the distinct characteristics of instructions and data improves overall system performance compared to using a uniform cache block size.
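
The core of the experiment can be miniaturized as follows: replay an address trace against direct-mapped caches of different block sizes and compare miss rates. The traces and cache geometry below are toy values chosen to show why a sequential instruction stream rewards larger blocks while a strided data stream may not; the paper's numbers come from SimpleScalar with SPEC CPU inputs.

```python
def miss_rate(addresses, cache_size=1024, block_size=16):
    num_blocks = cache_size // block_size
    tags = [None] * num_blocks              # direct-mapped cache
    misses = 0
    for addr in addresses:
        blk = addr // block_size
        idx = blk % num_blocks
        if tags[idx] != blk:
            misses += 1
            tags[idx] = blk
    return misses / len(addresses)

inst_trace = list(range(0, 4096, 4))            # sequential instruction fetch
data_trace = [i * 256 for i in range(64)] * 4   # strided data accesses

for bs in (8, 16, 32, 64):
    print(f"block={bs:>2}B  inst={miss_rate(inst_trace, block_size=bs):.2f}"
          f"  data={miss_rate(data_trace, block_size=bs):.2f}")
```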

WWW Cache Replacement Algorithm Based on the Network-distance

  • Kamizato, Masaru; Nagata, Tomokazu; Taniguchi, Yuji; Tamaki, Shiro
    • Proceedings of the IEEK Conference / 2002.07a / pp.238-241 / 2002
  • With the growing popularity of the Internet, the amount of data on the network has increased rapidly, so the degradation of response time from WWW servers, caused by network traffic and server load, has become more of an issue. The problem is aggravated by redundancy: many people request the same pages. To reduce this redundancy, a WWW cache server is commonly used to store page data and reuse it. However, unlike CPU and disk caches, WWW caches are known to be difficult to improve in terms of hit rate; it is hard to choose, from all the data flowing through the WWW cache server, which data is worth storing. On the other hand, there is room for improvement in the cache replacement algorithms commonly used by WWW cache servers. In our study, we aim to realize a WWW cache server that emphasizes improving response time. To this end, we propose a new cache replacement algorithm that exploits the network distance from the WWW cache server to the WWW server holding the page data requested by the user.
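
One way to realize such a network-distance-aware replacement policy is sketched below, with an invented cost function: when the cache is full, the page that is cheapest to re-fetch (nearby origin server, little recent popularity) is evicted first. The paper's actual metric and bookkeeping may differ.

```python
class DistanceAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # url -> (rtt_ms to origin server, hit count)

    def _refetch_cost(self, url):
        rtt, hits = self.entries[url]
        return rtt * (1 + hits)     # far and popular pages are worth keeping

    def insert(self, url, rtt_ms):
        if url in self.entries:
            rtt, hits = self.entries[url]
            self.entries[url] = (rtt, hits + 1)
            return
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self._refetch_cost)
            del self.entries[victim]    # cheapest page to re-fetch goes first
        self.entries[url] = (rtt_ms, 0)

cache = DistanceAwareCache(capacity=2)
cache.insert("http://near.example/a", rtt_ms=5)
cache.insert("http://far.example/b", rtt_ms=120)
cache.insert("http://far.example/c", rtt_ms=90)   # evicts the nearby page
print(sorted(cache.entries))
```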
