• Title/Summary/Keyword: 2-level data cache

Search Result 24, Processing Time 0.022 seconds

Energy-Performance Efficient 2-Level Data Cache Architecture for Embedded System (내장형 시스템을 위한 에너지-성능 측면에서 효율적인 2-레벨 데이터 캐쉬 구조의 설계)

  • Lee, Jong-Min;Kim, Soon-Tae
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.37 no.5
    • /
    • pp.292-303
    • /
    • 2010
  • On-chip cache memories play an important role in both performance and energy consumption points of view in resource-constrained embedded systems by filtering many off-chip memory accesses. We propose a 2-level data cache architecture with a low energy-delay product tailored for the embedded systems. The L1 data cache is small and direct-mapped, and employs a write-through policy. In contrast, the L2 data cache is set-associative and adopts a write-back policy. Consequently, the L1 data cache is accessed in one cycle and is able to provide high cache bandwidth while the L2 data cache is effective in reducing global miss rate. To reduce the penalty of high miss rate caused by the small L1 cache and power consumption of address generation, we propose an ECP(Early Cache hit Predictor) scheme. The ECP predicts if the L1 cache has the requested data using both fast address generation and L1 cache hit prediction. To reduce high energy cost of accessing the L2 data cache due to heavy write-through traffic from the write buffer laid between the two cache levels, we propose a one-way write scheme. From our simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed 2-level data cache architecture shows average 3.6% and 50% improvements in overall system performance and the data cache energy consumption.

New Two-Level L1 Data Cache Bypassing Technique for High Performance GPUs

  • Kim, Gwang Bok;Kim, Cheol Hong
    • Journal of Information Processing Systems
    • /
    • v.17 no.1
    • /
    • pp.51-62
    • /
    • 2021
  • On-chip caches of graphics processing units (GPUs) have contributed to improved GPU performance by reducing long memory access latency. However, cache efficiency remains low despite the facts that recent GPUs have considerably mitigated the bottleneck problem of L1 data cache. Although the cache miss rate is a reasonable metric for cache efficiency, it is not necessarily proportional to GPU performance. In this study, we introduce a second key determinant to overcome the problem of predicting the performance gains from L1 data cache based on the assumption that miss rate only is not accurate. The proposed technique estimates the benefits of the cache by measuring the balance between cache efficiency and throughput. The throughput of the cache is predicted based on the warp occupancy information in the warp pool. Then, the warp occupancy is used for a second bypass phase when workloads show an ambiguous miss rate. In our proposed architecture, the L1 data cache is turned off for a long period when the warp occupancy is not high. Our two-level bypassing technique can be applied to recent GPU models and improves the performance by 6% on average compared to the architecture without bypassing. Moreover, it outperforms the conventional bottleneck-based bypassing techniques.

Preventing Fast Wear-out of Flash Cache with An Admission Control Policy

  • Lee, Eunji;Bahn, Hyokyung
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.15 no.5
    • /
    • pp.546-553
    • /
    • 2015
  • Recently, flash cache is widely adopted as the performance accelerator of legacy storage systems. Unlike other cache media, flash cache should be carefully managed as it has peculiar characteristics such as long write latency and limited P/E cycles. In particular, we make two prominent observations that can be utilized in managing flash cache. First, a serious worn-out problem happens when the working-set of a system is beyond the capacity of flash cache due to excessively frequent cache replacement. Second, more than 50% of data has no hit in flash cache as it is a second level cache. Based on these observations, we propose a cache admission control policy that does not cache data when it is first accessed, and inserts it into the cache only after its second access occurs within a certain time window. This allows the filtering of data disruptive to flash cache in terms of endurance and performance. With this policy, we prolong the lifetime of flash cache 2.3 times without any performance degradations.

Energy-efficient Set-associative Cache Using Bi-mode Way-selector (에너지 효율이 높은 이중웨이선택형 연관사상캐시)

  • Lee, Sungjae;Kang, Jinku;Lee, Juho;Youn, Jiyong;Lee, Inhwan
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.1 no.1
    • /
    • pp.1-10
    • /
    • 2012
  • The way-lookup cache and the way-tracking cache are considered to be the most energy-efficient when used for level 1 and level 2 caches, respectively. This paper proposes an energy-efficient set-associative cache using the bi-mode way-selector that combines the way selecting techniques of the way-tracking cache and the way-lookup cache. The simulation results using an Alpha 21264-based system show that the bi-mode way-selecting L1 instruction cache consumes 27.57% of the energy consumed by the conventional set-associative cache and that it is as energy-efficient as the way-lookup cache when used for L1 instruction cache. The bi-mode way-selecting L1 data cache consumes 28.42% of the energy consumed by the conventional set-associative cache, which means that it is more energy-efficient than the way-lookup cache by 15.54% when used for L1 data cache. The bi-mode way-selecting L2 cache consumes 15.41% of the energy consumed by the conventional set-associative cache, which means that it is more energy-efficient than the way-tracking cache by 16.16% when used for unified L2 cache. These results show that the proposed cache can provide the best level of energy-efficiency regardless of the cache level.

Effect of ASLR on Memory Duplicate Ratio in Cache-based Virtual Machine Live Migration

  • Piao, Guangyong;Oh, Youngsup;Sung, Baegjae;Park, Chanik
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.9 no.4
    • /
    • pp.205-210
    • /
    • 2014
  • Cache based live migration method utilizes a cache, which is accessible to both side (remote and local), to reduce the virtual machine migration time, by transferring only irredundant data. However, address space layout randomization (ASLR) is proved to reduce the memory duplicate ratio between targeted migration memory and the migration cache. In this pager, we analyzed the behavior of ASLR to find out how it changes the physical memory contents of virtual machines. We found that among six virtual memory regions, only the modification to stack influences the page-level memory duplicate ratio. Experiments showed that: (1) the ASLR does not shift the heap region in sub-page level; (2) the stack reduces the duplicate page size among VMs which performed input replay around 40MB, when ASLR was enabled; (3) the size of memory pages, which can be reconstructed from the fresh booted up state, also reduces by about 60MB by ASLR. With those observations, when applying cache-based migration method, we can omit the stack region. While for other five regions, even a coarse page-level redundancy data detecting method can figure out most of the duplicate memory contents.

Program Cache Busy Time Control Method for Reducing Peak Current Consumption of NAND Flash Memory in SSD Applications

  • Park, Se-Chun;Kim, You-Sung;Cho, Ho-Youb;Choi, Sung-Dae;Yoon, Mi-Sun;Kim, Tae-Yun;Park, Kun-Woo;Park, Jongsun;Kim, Soo-Won
    • ETRI Journal
    • /
    • v.36 no.5
    • /
    • pp.876-879
    • /
    • 2014
  • In current NAND flash design, one of the most challenging issues is reducing peak current consumption (peak ICC), as it leads to peak power drop, which can cause malfunctions in NAND flash memory. This paper presents an efficient approach for reducing the peak ICC of the cache program in NAND flash memory - namely, a program Cache Busy Time (tPCBSY) control method. The proposed tPCBSY control method is based on the interesting observation that the array program current (ICC2) is mainly decided by the bit-line bias condition. In the proposed approach, when peak ICC2 becomes larger than a threshold value, which is determined by a cache loop number, cache data cannot be loaded to the cache buffer (CB). On the other hand, when peak ICC2 is smaller than the threshold level, cache data can be loaded to the CB. As a result, the peak ICC of the cache program is reduced by 32% at the least significant bit page and by 15% at the most significant bit page. In addition, the program throughput reaches 20 MB/s in multiplane cache program operation, without restrictions caused by a drop in peak power due to cache program operations in a solid-state drive.

Low-Power Data Cache Architecture and Microarchitecture-level Management Policy for Multimedia Application (멀티미디어 응용을 위한 저전력 데이터 캐쉬 구조 및 마이크로 아키텍쳐 수준 관리기법)

  • Yang Hoon-Mo;Kim Cheong-Gil;Park Gi-Ho;Kim Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.13A no.3 s.100
    • /
    • pp.191-198
    • /
    • 2006
  • Today's portable electric consumer devices, which are operated by battery, tend to integrate more multimedia processing capabilities. In the multimedia processing devices, multimedia system-on-chips can handle specific algorithms which need intensive processing capabilities and significant power consumption. As a result, the power-efficiency of multimedia processing devices becomes important increasingly. In this paper, we propose a reconfigurable data caching architecture, in which data allocation is constrained by software support, and evaluate its performance and power efficiency. Comparing with conventional cache architectures, power consumption can be reduced significantly, while miss rate of the proposed architecture is very similar to that of the conventional caches. The reduction of power consumption for the reconfigurable data cache architecture shows 33.2%, 53.3%, and 70.4%, when compared with direct-mapped, 2-way, and 4-way caches respectively.

Fault Tolerant Cache for Soft Error (소프트에러 결함 허용 캐쉬)

  • Lee, Jong-Ho;Cho, Jun-Dong;Pyo, Jung-Yul;Park, Gi-Ho
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.57 no.1
    • /
    • pp.128-136
    • /
    • 2008
  • In this paper, we propose a new cache structure for effective error correction of soft error. We added check bit and SEEB(soft error evaluation block) to evaluate the status of cache line. The SEEB stores result of parity check into the two-bit shit register and set the check bit to '1' when parity check fails twice in the same cache line. In this case the line where parity check fails twice is treated as a vulnerable to soft error. When the data is filled into the cache, the new replacement algorithm is suggested that it can only use the valid block determined by SEEB. This structure prohibits the vulnerable line from being used and contributes to efficient use of cache by the reuse of line where parity check fails only once can be reused. We tried to minimize the side effect of the proposed cache and the experimental results, using SPEC2000 benchmark, showed 3% degradation in hit rate, 15% timing overhead because of parity logic and 2.7% area overhead. But it can be considered as trivial for SEEB because almost tolerant design inevitably adopt this parity method even if there are some overhead. And if only parity logic is used then it can have $5%{\sim}10%$ advantage than ECC logic. By using this proposed cache, the system will be protected from the threat of soft error in cache and the hit rate can be maintained to the level without soft error in the cache.

Hybrid Scheme of Data Cache Design for Reducing Energy Consumption in High Performance Embedded Processor (고성능 내장형 프로세서의 에너지 소비 감소를 위한 데이타 캐쉬 통합 설계 방법)

  • Shim, Sung-Hoon;Kim, Cheol-Hong;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.3
    • /
    • pp.166-177
    • /
    • 2006
  • The cache size tends to grow in the embedded processor as technology scales to smaller transistors and lower supply voltages. However, larger cache size demands more energy. Accordingly, the ratio of the cache energy consumption to the total processor energy is growing. Many cache energy schemes have been proposed for reducing the cache energy consumption. However, these previous schemes are concerned with one side for reducing the cache energy consumption, dynamic cache energy only, or static cache energy only. In this paper, we propose a hybrid scheme for reducing dynamic and static cache energy, simultaneously. for this hybrid scheme, we adopt two existing techniques to reduce static cache energy consumption, drowsy cache technique, and to reduce dynamic cache energy consumption, way-prediction technique. Additionally, we propose a early wake-up technique based on program counter to reduce penalty caused by applying drowsy cache technique. We focus on level 1 data cache. The hybrid scheme can reduce static and dynamic cache energy consumption simultaneously, furthermore our early wake-up scheme can reduce extra program execution cycles caused by applying the hybrid scheme.

A new warp scheduling technique for improving the performance of GPUs by utilizing MSHR information (GPU 성능 향상을 위한 MSHR 정보 기반 워프 스케줄링 기법)

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.13 no.3
    • /
    • pp.72-83
    • /
    • 2017
  • GPUs can provide high throughput with latency hiding by executing many warps in parallel. MSHR(Miss Status Holding Registers) for L1 data cache tracks cache miss requests until required data is serviced from lower level memory. In recent GPUs, excessive requests for cache resources cause underutilization problem of GPU resources due to cache resource reservation fails. In this paper, we propose a new warp scheduling technique to reduce stall cycles under MSHR resource shortage. Cache miss rates for each warp is predicted based on the observation that each warp shows similar cache miss rates for long period. The warps showing low miss rates or computation-intensive warps are given high priority to be issued when MSHR is full status. Our proposal improves GPU performance by utilizing cache resource more efficiently based on cache miss rate prediction and monitoring the MSHR entries. According to our experimental results, reservation fail cycles can be reduced by 25.7% and IPC is increased by 6.2% with the proposed scheduling technique compared to loose round robin scheduler.