• Title/Summary/Keyword: Set-associative Cache


Performance Analysis of n-way Associative Cache and Fully Associative Cache (n-way Set Associative Cache와 Fully Associative Cache성능 분석)

  • Jo, Yong-Hun;Kim, Jeong-Seon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.3
    • /
    • pp.802-810
    • /
    • 1997
  • In this paper, the performance of direct-mapped caches, 2-, 4-, 8-, ..., 4096-way set-associative caches, and fully associative caches is analyzed by trace simulation to verify their effectiveness. In general, it is widely assumed that as n, the number of main memory lines that can be stored under one cache line number of a direct-mapped cache, increases, the performance of the cache memory should improve linearly. According to our analysis, however, this does not hold for all cache organizations. It is shown that as n increases, miss ratios decrease only when a small cache (less than 256K) with a large line size is used. It is also shown that fully associative mapping achieves high performance only when a small cache with a large line size is used.
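
The kind of trace simulation described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' simulator: the function name `miss_ratio`, its parameters, and the choice of LRU replacement are assumptions made here, since the abstract does not state the replacement policy or the trace format.

```python
from collections import OrderedDict

def miss_ratio(trace, cache_bytes, line_bytes, ways):
    """Return the miss ratio of a set-associative cache on a byte-address trace.

    Assumption: LRU replacement. ways=1 models a direct-mapped cache;
    ways equal to the number of lines models a fully associative cache.
    """
    num_lines = cache_bytes // line_bytes
    num_sets = num_lines // ways
    # One ordered dict per set: keys are tags, order runs from LRU to MRU.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes      # strip the line offset
        index = block % num_sets        # set index
        tag = block // num_sets         # remaining high-order bits
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
    return misses / len(trace)
```

For example, a hypothetical trace that alternates between addresses 0 and 1024 misses on every access in a 1 KB direct-mapped cache with 16-byte lines, because both addresses share one line. With 2 ways the same trace misses only on its first two accesses.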


Effective Algorithm for the Low-Power Set-Associative Cache Memory (저전력 집합연관 캐시를 위한 효과적인 알고리즘)

  • Jung, Bo-Sung;Lee, Jung-Hoon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.9 no.1
    • /
    • pp.25-32
    • /
    • 2014
  • In this paper, we propose a partial-way set-associative cache memory with an effective memory access time and low energy consumption. The proposed cache accesses only 2 of its 4 ways at a time. The ways to be accessed are chosen dynamically using the least significant two bits of the tag. The 2 chosen ways are accessed sequentially according to the way selection bits, which indicate the most recently referenced way. Each entry in a way therefore has one additional bit, the way selection bit. In addition, a simple 2-way replacement policy can be used instead of a 4-way LRU or FIFO algorithm. Simulation results show that the energy*delay product can be reduced by about 78%, 14%, 39%, and 15% compared with a 4-way set-associative cache, a sequential-way cache, a way-tracking cache, and a way cache, respectively.

Cache Architecture Design for the Performance Improvement of OpenRISC Core (OpenRISC 코어의 성능향상을 위한 캐쉬 구조 설계)

  • Jung, Hong-Kyun;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.1
    • /
    • pp.68-75
    • /
    • 2009
  • As microprocessor performance improves rapidly, the need for caches is growing because of the increasing access time of main memory. In a direct-mapped cache, every memory block maps to exactly one cache line. Although this mapping rule is simple, when different blocks map to the same cache line, conflicts make the miss ratio higher than in a set-associative cache. This paper proposes a 4-way set-associative cache to improve on the direct-mapped cache of OpenRISC. In the proposed cache, four blocks of main memory map to one cache line, so the miss ratio is lower than with the direct-mapped cache. A pseudo-LRU policy, one of the line replacement policies, is used to reduce the number of bits that store the LRU value. The OpenRISC core with the 4-way set-associative cache was verified by FPGA emulation. Performance measurements with a test program show that the OpenRISC core with the 4-way set-associative cache performs 50% better than the previous core and that the miss ratio decreases by more than 15%.
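
The abstract does not say which pseudo-LRU variant is used. A common one for a 4-way set is tree pseudo-LRU, which keeps 3 bits per set; exact LRU must encode one of 4! = 24 orderings and so needs at least 5 bits. The sketch below assumes the tree variant, and the class name `TreePLRU4` is hypothetical.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one 4-way set, kept in 3 bits.

    Assumed variant, not taken from the paper.
    bits[0] picks the half to evict from (0 = ways 0-1, 1 = ways 2-3);
    bits[1] picks the victim within ways 0-1; bits[2] within ways 2-3.
    """

    def __init__(self):
        self.bits = [0, 0, 0]

    def touch(self, way):
        """Record an access: point every bit on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1              # next victim comes from ways 2-3
            self.bits[1] = 1 - way        # point at the other way of the pair
        else:
            self.bits[0] = 0              # next victim comes from ways 0-1
            self.bits[2] = 1 - (way - 2)  # point at the other way of the pair

    def victim(self):
        """Follow the bits to the way that should be replaced next."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]
```

After accesses to ways 0, 1, 2, 3 in that order, `victim()` returns way 0, which matches exact LRU. If way 0 is then accessed again, `victim()` returns way 2 although way 1 is the true least recently used way. That is the approximation accepted in exchange for fewer state bits.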

Instruction Flow based Early Way Determination Technique for Low-power L1 Instruction Cache

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.9
    • /
    • pp.1-9
    • /
    • 2016
  • Recent embedded processors employ a set-associative L1 instruction cache to improve performance. The energy consumed by the set-associative L1 instruction cache accounts for a considerable portion of the energy consumed by the embedded processor. When the processor requests an instruction, all ways in the set-associative instruction cache are accessed in parallel. In this paper, we propose a technique that effectively reduces the energy consumption of the set-associative L1 instruction cache by accessing only one way. A gshare branch predictor is employed to predict the instruction flow and determine the way from which to fetch the instruction. When a branch is predicted not taken, the next sequential instruction can be fetched from the instruction cache by accessing only one way. According to our simulations with SPEC2006 benchmarks, the proposed technique requires negligible hardware overhead and achieves a 20% energy reduction on average in a 4-way L1 instruction cache.
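
For readers unfamiliar with gshare, the predictor indexes a table of 2-bit saturating counters with the XOR of the branch address and a global history register. The sketch below illustrates only that predictor. The table size, the initial counter value, and the `pc >> 2` alignment shift are assumptions, and the paper's way-determination logic is not reproduced because the abstract does not describe it in detail.

```python
class Gshare:
    """Minimal gshare branch predictor (illustrative parameters)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                         # global taken/not-taken history
        self.counters = [1] * (1 << index_bits)  # 2-bit counters, weakly not taken

    def _index(self, pc):
        # XOR the word-aligned branch address with the global history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        """Return True if the branch at `pc` is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

A fresh predictor predicts not taken. Once a branch has been taken often enough for the history to fill with ones, the counter it then selects saturates and the branch is predicted taken.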

Design of Cache Memory System for Next Generation CPU (차세대 CPU를 위한 캐시 메모리 시스템 설계)

  • Jo, Ok-Rae;Lee, Jung-Hoon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.11 no.6
    • /
    • pp.353-359
    • /
    • 2016
  • In this paper, we propose a high-performance L1 cache structure for high-clock-rate CPUs. The proposed cache memory consists of three parts: a direct-mapped cache to support fast access time, a two-way set-associative buffer to reduce the miss ratio, and a way-select table. The most recently accessed data is stored in the direct-mapped cache. If a data item has a high probability of repeated reference, it is stored in the two-way set-associative buffer when it is replaced from the direct-mapped cache. For high performance and fast access time, we propose that only one of the two ways of the set-associative buffer be selectively accessed, based on the way-select table (WST). According to simulation results, access time can be reduced by about 7% and 40% compared with a direct-mapped cache and an Intel i7-6700 with two times more space, respectively.

Energy-efficient Set-associative Cache Using Bi-mode Way-selector (에너지 효율이 높은 이중웨이선택형 연관사상캐시)

  • Lee, Sungjae;Kang, Jinku;Lee, Juho;Youn, Jiyong;Lee, Inhwan
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.1 no.1
    • /
    • pp.1-10
    • /
    • 2012
  • The way-lookup cache and the way-tracking cache are considered to be the most energy-efficient when used for level 1 and level 2 caches, respectively. This paper proposes an energy-efficient set-associative cache using the bi-mode way-selector that combines the way selecting techniques of the way-tracking cache and the way-lookup cache. The simulation results using an Alpha 21264-based system show that the bi-mode way-selecting L1 instruction cache consumes 27.57% of the energy consumed by the conventional set-associative cache and that it is as energy-efficient as the way-lookup cache when used for L1 instruction cache. The bi-mode way-selecting L1 data cache consumes 28.42% of the energy consumed by the conventional set-associative cache, which means that it is more energy-efficient than the way-lookup cache by 15.54% when used for L1 data cache. The bi-mode way-selecting L2 cache consumes 15.41% of the energy consumed by the conventional set-associative cache, which means that it is more energy-efficient than the way-tracking cache by 16.16% when used for unified L2 cache. These results show that the proposed cache can provide the best level of energy-efficiency regardless of the cache level.

High Performance Data Cache Memory Architecture (고성능 데이터 캐시 메모리 구조)

  • Kim, Hong-Sik;Kim, Cheong-Ghil
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.9 no.4
    • /
    • pp.945-951
    • /
    • 2008
  • In this paper, a new high-performance data cache scheme that improves exploitation of both spatial and temporal locality is proposed. The proposed data cache consists of a hardware prefetch unit and two sub-caches: a direct-mapped (DM) cache with a large block size and a fully associative buffer with a small block size. Spatial locality is exploited by fetching and storing large blocks into the direct-mapped cache, and is enhanced by prefetching a neighboring block when a DM cache hit occurs. Temporal locality is exploited by storing small blocks from the DM cache in the fully associative buffer, according to their activity in the DM cache, when they are replaced. Experimental results on SPEC2000 programs show that the proposed scheme can reduce the average miss ratio by $12.53\%\sim23.62\%$ and the AMAT by $14.67\%\sim18.60\%$ compared with previous schemes such as a direct-mapped cache, a 4-way set-associative cache, and the SMI (selective mode intelligent) cache [8].
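
AMAT here presumably denotes average memory access time. For a single cache level the standard definition is given below. The exact expression the authors use for their two-sub-cache organization is not stated in the abstract, and the numbers in the worked line are hypothetical, chosen only to show the arithmetic.

```latex
\[
\text{AMAT} = t_{\text{hit}} + m \cdot t_{\text{penalty}}
\]
% Hypothetical values: t_hit = 1 cycle, miss ratio m = 0.05, t_penalty = 100 cycles
\[
\text{AMAT} = 1 + 0.05 \times 100 = 6 \text{ cycles}
\]
```

Under this definition, lowering the miss ratio reduces AMAT only in proportion to the miss penalty term. This is one reason why a miss-ratio improvement and an AMAT improvement reported for the same scheme need not be the same percentage.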

Cache memory system for high performance CPU with 4GHz (4Ghz 고성능 CPU 위한 캐시 메모리 시스템)

  • Jung, Bo-Sung;Lee, Jung-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.2
    • /
    • pp.1-8
    • /
    • 2013
  • In this paper, we propose a high-performance L1 cache structure for a high-clock-rate CPU running at 4GHz. The proposed cache memory consists of three parts: a direct-mapped cache to support fast access time, a two-way set-associative buffer to exploit temporal locality, and a buffer-select table. The most recently accessed data is stored in the direct-mapped cache. If a data item has a high probability of repeated reference, it is selectively stored in the two-way set-associative buffer when it is replaced from the direct-mapped cache. For high performance and low power consumption, we propose that only one of the two ways of the set-associative buffer be selectively accessed, based on the buffer-select table (BST). According to simulation results, the energy*delay product can be improved by about 45%, 70%, and 75% compared with a direct-mapped cache, a four-way set-associative cache, and a victim cache with two times more space, respectively.

Performance Improvement and Power Consumption Reduction of an Embedded RISC Core

  • Jung, Hong-Kyun;Jin, Xianzhe;Ryoo, Kwang-Ki
    • Journal of information and communication convergence engineering
    • /
    • v.10 no.1
    • /
    • pp.78-84
    • /
    • 2012
  • This paper presents a branch prediction algorithm and a 4-way set-associative cache for improving the performance of an embedded RISC core, and a clock-gating algorithm with observability don’t care (ODC) operation for reducing the power consumption of the core. The branch prediction algorithm uses a branch target buffer (BTB), and the 4-way set-associative cache has a lower miss rate than a direct-mapped cache. A pseudo-least recently used (LRU) policy is used to reduce the number of LRU bits. The clock-gating algorithm reduces dynamic power consumption. Estimates of performance and dynamic power show that the performance of the OpenRISC core with the proposed architecture is improved by about 29% and that the dynamic power of the core, with the Chartered 0.18 ${\mu}m$ technology library, is reduced by 16%.

Bounding Worst-Case Data Cache Performance by Using Stack Distance

  • Liu, Yu;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.3 no.4
    • /
    • pp.195-215
    • /
    • 2009
  • Worst-case execution time (WCET) analysis is critical for hard real-time systems to ensure that different tasks can meet their respective deadlines. While significant progress has been made for WCET analysis of instruction caches, the data cache timing analysis, especially for set-associative data caches, is rather limited. This paper proposes an approach to safely and tightly bounding data cache performance by computing the worst-case stack distance of data cache accesses. Our approach can not only be applied to direct-mapped caches, but also be used for set-associative or even fully-associative caches without increasing the complexity of analysis. Moreover, the proposed approach can statically categorize worst-case data cache misses into cold, conflict, and capacity misses, which can provide useful insights for designers to enhance the worst-case data cache performance. Our evaluation shows that the proposed data cache timing analysis technique can safely and accurately estimate the worst-case data cache performance, and the overestimation as compared to the observed worst-case data cache misses is within 1% on average.
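
Stack distance, the quantity the abstract builds on, is the number of distinct blocks referenced since the previous access to the same block. The sketch below measures it dynamically on a given trace. It is not the paper's static worst-case analysis, and the function names are hypothetical. It relies on the well-known property that a fully associative LRU cache holding C blocks hits exactly when the stack distance is below C.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first touch)."""
    stack = []       # most recently used block sits at the end
    distances = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            # Blocks above `pos` were referenced since the last access to `block`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)   # cold (first) reference
        stack.append(block)
    return distances


def lru_misses(trace, capacity):
    """Count misses of a fully associative LRU cache holding `capacity` blocks."""
    return sum(1 for d in stack_distances(trace) if d is None or d >= capacity)
```

In this sketch a `None` distance corresponds to a cold miss, and a finite distance at or above the capacity corresponds to a capacity miss. For a set-associative cache the same computation could be applied per set, with the associativity as the capacity.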