Search | Korea Science

A Locality-Aware Write Filter Cache for Energy Reduction of STTRAM-Based L1 Data Cache

Kong, Joonho
- JSTS:Journal of Semiconductor Technology and Science
- /
- v.16 no.1
- /
- pp.80-90
- /
- 2016
Thanks to superior leakage energy efficiency compared to SRAM cells, STTRAM cells are considered as a promising alternative for a memory element in on-chip caches. However, the main disadvantage of STTRAM cells is high write energy and latency. In this paper, we propose a low-cost write filter (WF) cache which resides between the load/store queue and STTRAM-based L1 data cache. To maximize efficiency of the WF cache, the line allocation and access policies are optimized for reducing energy consumption of STTRAM-based L1 data cache. By efficiently filtering the write operations in the STTRAM-based L1 data cache, our proposed WF cache reduces energy consumption of the STTRAM-based L1 data cache by up to 43.0% compared to the case without the WF cache. In addition, thanks to the fast hit latency of the WF cache, it slightly improves performance by 0.2%.
https://doi.org/10.5573/JSTS.2016.16.1.080 인용 PDF KSCI

Energy-Performance Efficient 2-Level Data Cache Architecture for Embedded System (내장형 시스템을 위한 에너지-성능 측면에서 효율적인 2-레벨 데이터 캐쉬 구조의 설계)

Lee, Jong-Min;Kim, Soon-Tae
- Journal of KIISE:Computer Systems and Theory
- /
- v.37 no.5
- /
- pp.292-303
- /
- 2010
On-chip cache memories play an important role in both performance and energy consumption points of view in resource-constrained embedded systems by filtering many off-chip memory accesses. We propose a 2-level data cache architecture with a low energy-delay product tailored for the embedded systems. The L1 data cache is small and direct-mapped, and employs a write-through policy. In contrast, the L2 data cache is set-associative and adopts a write-back policy. Consequently, the L1 data cache is accessed in one cycle and is able to provide high cache bandwidth while the L2 data cache is effective in reducing global miss rate. To reduce the penalty of high miss rate caused by the small L1 cache and power consumption of address generation, we propose an ECP(Early Cache hit Predictor) scheme. The ECP predicts if the L1 cache has the requested data using both fast address generation and L1 cache hit prediction. To reduce high energy cost of accessing the L2 data cache due to heavy write-through traffic from the write buffer laid between the two cache levels, we propose a one-way write scheme. From our simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed 2-level data cache architecture shows average 3.6% and 50% improvements in overall system performance and the data cache energy consumption.
PDF KSCI

A Study on Design and Cache Replacement Policy for Cascaded Cache Based on Non-Volatile Memories (비휘발성 메모리 시스템을 위한 저전력 연쇄 캐시 구조 및 최적화된 캐시 교체 정책에 대한 연구)

Juhee Choi
- Journal of the Semiconductor & Display Technology
- /
- v.22 no.3
- /
- pp.106-111
- /
- 2023
The importance of load-to-use latency has been highlighted as state-of-the-art computing cores adopt deep pipelines and high clock frequencies. The cascaded cache was recently proposed to reduce the access cycle of the L1 cache by utilizing differences in latencies among banks of the cache structure. However, this study assumes the cache is comprised of SRAM, making it unsuitable for direct application to non-volatile memory-based systems. This paper proposes a novel mechanism and structure for lowering dynamic energy consumption. It inserts monitoring logic to keep track of swap operations and write counts. If the ratio of swap operations to total write counts surpasses a set threshold, the cache controller skips the swap of cache blocks, which leads to reducing write operations. To validate this approach, experiments are conducted on the non-volatile memory-based cascaded cache. The results show a reduction in write operations by an average of 16.7% with a negligible increase in latencies.
PDF

Design of an Asynchronous Data Cache with FIFO Buffer for Write Back Mode (Write Back 모드용 FIFO 버퍼 기능을 갖는 비동기식 데이터 캐시)

Park, Jong-Min;Kim, Seok-Man;Oh, Myeong-Hoon;Cho, Kyoung-Rok
- The Journal of the Korea Contents Association
- /
- v.10 no.6
- /
- pp.72-79
- /
- 2010
In this paper, we propose the data cache architecture with a write buffer for a 32bit asynchronous embedded processor. The data cache consists of CAM and data memory. It accelerates data up lood cycle between the processor and the main memory that improves processor performance. The proposed data cache has 8 KB cache memory. The cache uses the 4-way set associative mapping with line size of 4 words (16 bytes) and pseudo LRU replacement algorithm for data replacement in the memory. Dirty register and write buffer is used for write policy of the cache. The designed data cache is synthesized to a gate level design using $0.13-{\mu}m$ process. Its average hit rate is 94%. And the system performance has been improved by 46.53%. The proposed data cache with write buffer is very suitable for a 32-bit asynchronous processor.
https://doi.org/10.5392/JKCA.2010.10.6.072 인용 PDF KSCI

The Early Write Back Scheme For Write-Back Cache (라이트 백 캐쉬를 위한 빠른 라이트 백 기법)

Chung, Young-Jin;Lee, Kil-Whan;Lee, Yong-Surk
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.46 no.11
- /
- pp.101-109
- /
- 2009
Generally, depth cache and pixel cache of 3D graphics are designed by using write-back scheme for efficient use of memory bandwidth. Also, there are write after read operations of same address or only write operations are occurred frequently in 3D graphics cache. If a cache miss is detected, an access to the external memory for write back operation and another access to the memory for handling the cache miss are operated simultaneously. So on frequent cache miss situations, as the memory access bandwidth limited, the access time of the external memory will be increased due to memory bottleneck problem. As a result, the total performance of the processor or the IP will be decreased, also the problem will increase peak power consumption. So in this paper, we proposed a novel early write back cache architecture so as to solve the problems issued above. The proposed architecture controls the point when to access the external memory as to copy the valid data block. And this architecture can improve the cache performance with same hit ratio and same capacity cache. As a result, the proposed architecture can solve the memory bottleneck problem by preventing intensive memory accesses. We have evaluated the new proposed architecture on 3D graphics z cache and pixel cache on a SoC environment where ARM11, 3D graphic accelerator and various IPs are embedded. The simulation results indicated that there were maximum 75% of performance increase when using various simulation vectors.
PDF KSCI

A Study on Direct Cache-to-Cache Transfer for Hybrid Cache Architecture to Reduce Write Operations (쓰기 횟수 감소를 위한 하이브리드 캐시 구조에서의 캐시간 직접 전송 기법에 대한 연구)

Juhee Choi
- Journal of the Semiconductor & Display Technology
- /
- v.23 no.1
- /
- pp.65-70
- /
- 2024
Direct cache-to-cache transfer has been studied to reduce the latency and bandwidth consumption related to the shared data in multiprocessor system. Even though these studies lead to meaningful results, they assume that caches consist of SRAM. For example, if the system employs the non-volatile memory, the one of the most important parts to consider is to decrease the number of write operations. This paper proposes a hybrid write avoidance cache coherence protocol that considers the hybrid cache architecture. A new state is added to finely control what is stored in the non-volatile memory area, and experimental results showed that the number of writes was reduced by about 36% compared to the existing schemes.
PDF

Improving Energy Efficiency and Lifetime of Phase Change Memory using Delta Value Indicator

Choi, Ju Hee;Kwak, Jong Wook
- JSTS:Journal of Semiconductor Technology and Science
- /
- v.16 no.3
- /
- pp.330-338
- /
- 2016
Phase change memory (PCM) has been studied as an emerging memory technology for last-level cache (LLC) due to its extremely low leakage. However, it consumes high levels of energy in updating cells and its write endurance is limited. To relieve the write pressure of LLC, we propose a delta value indicator (DVI) by employing a small cache which stores the difference between the value currently stored and the value newly loaded. Since the write energy consumption of the small cache is less than the LLC, the energy consumption is reduced by access to the small cache instead of the LLC. In addition, the lifetime of the LLC is further extended because the number of write accesses to the LLC is decreased. To this end, a delta value indicator and controlling circuits are inserted into the LLC. The simulation results show a 26.8% saving of dynamic energy consumption and a 31.7% lifetime extension compared to a state-of-the-art scheme for PCM.
https://doi.org/10.5573/JSTS.2016.16.3.330 인용 PDF KSCI

MI-MESI Write-invalidate Snooping Cache Coherence Protocol (MI-MESI 쓰기-무효화 스누핑 캐쉬 일관성 유지 프로토콜)

Jang, Seong-Tae
- The Transactions of the Korea Information Processing Society
- /
- v.2 no.5
- /
- pp.757-767
- /
- 1995
In this paper, we present MI-MESI write-invalidate snooping cache coherence protocol which addresses several significant drawbacks of MESI and MI-MESI write -invalidate snooping cache coherence protocols under the split transaction bus based multiprocessor environment. In this protocol, each cache block maintains one of six cache states which represent Modified-shared, Invalid-by-other, Modified, Exclusive, Shared and Invalid states. By using these cache states, our protocol reduces both the access contention and unnecessary updates for the memory modules significantly, and thus providing the fast memory access time.
PDF

A Proposal for Hit Ratio Improvement of a Microprocessor's Cache Memory (마이크로프로세서 캐쉬메모리의 적중률 개선을 위한 제안)

조용훈;김정선
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.25 no.4B
- /
- pp.783-787
- /
- 2000
A microprocessor, which is used as a CPU for state-of-the-art personal computers, adopts 256KB or 512KB L2(Level 2) cache memory. This cache hires Direct Mapping Procedure, 32B Line Size, and no Write Allocation. In this cache architecture, we can expert about 2.5% hit ratio improvement by using 8-way Set Associative Mapping instead of Direct Mapping, 128B Line Size instead of 32B, and Write Allocation.
PDF

An Efficient Variable Rearrangement Technique for STT-RAM Based Hybrid Caches

Youn, Jonghee M.;Cho, Doosan
- IEMEK Journal of Embedded Systems and Applications
- /
- v.11 no.2
- /
- pp.67-78
- /
- 2016
The emerging Spin-Transfer Torque RAM (STT-RAM) is a promising component that can be used to improve the efficiency as a result of its high storage density and low leakage power. However, the state-of-the-art STT-RAM is not ready to replace SRAM technology due to the negative effect of its write operations. The write operations require longer latency and more power than the same operations in SRAM. Therefore, a hybrid cache with SRAM and STT-RAM technologies is proposed to obtain the benefits of STT-RAM while minimizing its negative effects by using SRAM. To efficiently use of the hybrid cache, it is important to place write intensive data onto the cache. Such data should be placed on SRAM to minimize the negative effect. Thus, we propose a technique that optimizes placement of data in main memory. It drives the proper combination of advantages and disadvantages for SRAM and STT-RAM in the hybrid cache. As a result of the proposed technique, write intensive data are loaded to SRAM and read intensive data are loaded to STT-RAM. In addition, our technique also optimizes temporal locality to minimize conflict misses. Therefore, it improves performance and energy consumption of the hybrid cache architecture in a certain range.
https://doi.org/10.14372/IEMEK.2016.11.2.67 인용 PDF KSCI

Search Result 89, Processing Time 0.027 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)