• Title/Summary/Keyword: cache miss

An Extended R-Tree Indexing Method using Prefetching in Main Memory (메인 메모리에서 선반입을 사용한 확장된 R-Tree 색인 기법)

  • Kang, Hong-Koo; Kim, Dong-O; Hong, Dong-Sook; Han, Ki-Joon
    • Journal of Korea Spatial Information System Society / v.6 no.1 s.11 / pp.19-29 / 2004
  • Recently, studies have been performed to improve the cache performance of the R-Tree in main memory. A general method to improve the cache performance of the R-Tree is to reduce the size of an entry so that a node can store more entries and its fanout can increase. However, this method generally requires additional processing to compress entry information and does not support incremental updates. In addition, a cache miss always occurs when moving between a parent node and a child node. To solve these problems efficiently, this paper proposes and evaluates the PR-Tree, an extended R-Tree indexing method that uses prefetching in main memory. The PR-Tree can produce a wider node to optimize prefetching without additional modifications to the R-Tree. Moreover, the PR-Tree reduces the cache misses that occur when moving between a parent node and a child node. In our simulation, the search performance, the update performance, and the node split performance of the PR-Tree improve by up to 38%, 30%, and 67%, respectively, compared with the original R-Tree.
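A minimal C sketch of the prefetch-before-descent idea described above, assuming GCC's __builtin_prefetch; the node layout, fanout, and cache-line size are illustrative assumptions, not values taken from the paper:

```c
#include <stddef.h>

#define CACHE_LINE 64                     /* assumed cache-line size in bytes */
#define FANOUT     20                     /* illustrative fanout              */

struct rect { float xmin, ymin, xmax, ymax; };

struct node {
    int          is_leaf;
    int          count;
    struct rect  mbr[FANOUT];             /* entry bounding rectangles */
    struct node *child[FANOUT];           /* child pointers            */
};

static int overlaps(const struct rect *a, const struct rect *b)
{
    return a->xmin <= b->xmax && b->xmin <= a->xmax &&
           a->ymin <= b->ymax && b->ymin <= a->ymax;
}

/* Prefetch every cache line of a (widened) node before it is visited. */
static void prefetch_node(const struct node *n)
{
    const char *p = (const char *)n;
    size_t lines  = (sizeof(struct node) + CACHE_LINE - 1) / CACHE_LINE;
    for (size_t i = 0; i < lines; i++)
        __builtin_prefetch(p + i * CACHE_LINE, 0 /* read */, 3 /* keep */);
}

/* Descend the tree; the child chosen next is prefetched so that the
 * parent-to-child move no longer stalls on a cache miss.             */
void search(const struct node *n, const struct rect *query)
{
    for (int i = 0; i < n->count; i++) {
        if (!overlaps(&n->mbr[i], query))
            continue;
        if (!n->is_leaf) {
            prefetch_node(n->child[i]);
            search(n->child[i], query);
        }
        /* leaf-entry reporting omitted */
    }
}
```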

An Active Prefetch Filtering Schemes using Exclusive Prefetch Cache (선인출 전용 캐시를 이용한 적극적 선인출 필터링 기법)

  • Chon Young-Suk; Kim Suk-il; Jeon Joong-nam
    • The KIPS Transactions: Part A / v.12A no.1 s.91 / pp.41-52 / 2005
  • The latency of memory reference instructions that miss in the cache is a critical factor that limits the processing power of a processor. Cache prefetching is an effective way to reduce the latency due to memory access. However, excessively aggressive prefetching leads to cache pollution and eventually cancels out the advantage of prefetching. In this study, an active prefetch filtering scheme is introduced which dynamically decides whether to commence prefetching after consulting a filtering table, in order to reduce the cache pollution caused by unnecessary prefetches. For precise filtering, an evicted-address referencing scheme is proposed in which the filter directly compares the current prefetch address with previous unnecessary prefetch addresses stored in the filtering table. Moreover, a small exclusive prefetch cache is introduced to increase the eviction of unnecessarily prefetched addresses and thereby enhance the accuracy of dynamic filtering. The exclusive prefetch cache also prevents useful demand data from being pushed out by prefetched data, while the evicted-address direct referencing scheme enables the prefetch cache to keep most of the useful prefetch data within its small size. Experimental results from commonly used general and multimedia benchmarks show that the average cache miss ratio decreases by 13.3% by virtue of the enhanced filtering accuracy, compared with conventional schemes.
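A minimal C sketch of the filtering-table idea: addresses of prefetched blocks that were evicted unused are recorded, and a later prefetch to the same block is suppressed. The table size, hashing, and block size below are assumptions for illustration, not the paper's configuration:

```c
#include <stdint.h>
#include <stdbool.h>

#define FILTER_ENTRIES 256                /* hypothetical filter-table size */
#define BLOCK_SHIFT      6                /* assumed 64-byte cache blocks   */

/* Block addresses of prefetches that were evicted without ever being used. */
static uint64_t filter_table[FILTER_ENTRIES];

static unsigned filter_index(uint64_t block)
{
    return (unsigned)(block & (FILTER_ENTRIES - 1));
}

/* Called when the exclusive prefetch cache evicts a block that was never
 * demanded: remember it as a useless prefetch.                            */
void record_useless_prefetch(uint64_t addr)
{
    uint64_t block = addr >> BLOCK_SHIFT;
    filter_table[filter_index(block)] = block;
}

/* Called before issuing a new prefetch: suppress it if the same block was
 * recently prefetched and evicted unused.                                  */
bool should_prefetch(uint64_t addr)
{
    uint64_t block = addr >> BLOCK_SHIFT;
    return filter_table[filter_index(block)] != block;
}
```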

A New Cache Replacement Policy for Improving Last Level Cache Performance (라스트 레벨 캐쉬 성능 향상을 위한 캐쉬 교체 기법 연구)

  • Do, Cong Thuan; Son, Dong Oh; Kim, Jong Myon; Kim, Cheol Hong
    • Journal of KIISE / v.41 no.11 / pp.871-877 / 2014
  • Cache replacement algorithms have been developed in order to reduce miss counts. In modern processors, the performance gap between the processor and main memory has been increasing, giving cache replacement policies an even more important role. The Least Recently Used (LRU) policy is one of the most common policies used in modern processors. However, recent research has shown that the performance gap between LRU and the theoretical optimal replacement algorithm (OPT) is large. Although LRU replacement has repeatedly been shown to be adequate, the OPT/LRU performance gap keeps widening as cache associativity grows. In this study, we observed that there is still an opportunity to improve cache performance based on the existing LRU mechanism. We propose a method that enhances the LRU replacement algorithm based on the proportion of accesses among the lines in a cache set during the period between two successive replacements, which informs the final replacement decision. Our experimental results reveal that the proposed method reduced the average miss rate of the baseline 512 KB L2 cache by 15 percent compared to conventional LRU. In addition, the performance of a processor using the proposed cache replacement policy improved by 4.7 percent over LRU, on average.
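The abstract does not spell out the exact policy, so the C sketch below is only one plausible reading of "access proportion between two successive replacements": per-line access counters are kept since the last replacement, and the victim is the least recently used line among those with a below-average access share. The way count and tie-breaking are assumptions:

```c
#include <stdint.h>

#define WAYS 16                           /* illustrative set associativity */

struct line {
    uint64_t tag;
    uint32_t lru_stamp;                   /* larger = more recently used     */
    uint32_t accesses;                    /* hits since the last replacement */
};

/* Among lines whose access share since the previous replacement is below
 * average, evict the least recently used one; with uniform activity this
 * degenerates to plain LRU.                                               */
int pick_victim(struct line set[WAYS])
{
    uint32_t total = 0;
    for (int w = 0; w < WAYS; w++)
        total += set[w].accesses;

    int victim = 0;
    uint32_t oldest = UINT32_MAX;
    for (int w = 0; w < WAYS; w++) {
        int cold = set[w].accesses * WAYS <= total;    /* below-average share */
        if (cold && set[w].lru_stamp < oldest) {
            oldest = set[w].lru_stamp;
            victim = w;
        }
    }

    for (int w = 0; w < WAYS; w++)        /* start a new observation period */
        set[w].accesses = 0;
    return victim;
}
```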

Performance Improvement and Power Consumption Reduction of an Embedded RISC Core

  • Jung, Hong-Kyun; Jin, Xianzhe; Ryoo, Kwang-Ki
    • Journal of information and communication convergence engineering / v.10 no.1 / pp.78-84 / 2012
  • This paper presents a branch prediction algorithm and a 4-way set-associative cache for performance improvement of an embedded RISC core, together with a clock-gating algorithm using observability don't care (ODC) operations to reduce the power consumption of the core. The branch predictor is built around a branch target buffer (BTB), and the 4-way set-associative cache has a lower miss rate than a direct-mapped cache. A pseudo-least-recently-used (pseudo-LRU) policy is used to reduce the number of LRU bits. The clock-gating algorithm reduces dynamic power consumption. As a result of estimating the performance and the dynamic power, the performance of the OpenRISC core with the proposed architecture is improved by about 29%, and the dynamic power of the core, evaluated with the Chartered 0.18 μm technology library, is reduced by 16%.
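Pseudo-LRU for a 4-way set is commonly implemented as a 3-bit binary tree per set instead of full LRU ordering. A minimal C sketch of that standard tree-PLRU scheme (the bit layout is one conventional encoding, not necessarily the one used in the paper):

```c
#include <stdint.h>

/* Tree pseudo-LRU for a 4-way set: bit0 selects the older pair,
 * bit1/bit2 select the older way inside the left/right pair.      */

/* Update the PLRU bits so they point *away* from the way just used. */
uint8_t plru_touch(uint8_t bits, int way)
{
    if (way < 2) {                        /* left pair (ways 0,1) used  */
        bits |= 1u;                       /* right pair is now older     */
        bits = (way == 0) ? (bits | 2u) : (bits & ~2u);
    } else {                              /* right pair (ways 2,3) used */
        bits &= ~1u;                      /* left pair is now older      */
        bits = (way == 2) ? (bits | 4u) : (bits & ~4u);
    }
    return bits;
}

/* Follow the bits to the (approximately) least recently used way. */
int plru_victim(uint8_t bits)
{
    if (bits & 1u)                        /* right pair is older */
        return (bits & 4u) ? 3 : 2;
    return (bits & 2u) ? 1 : 0;           /* left pair is older  */
}
```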

WARP: Memory Subsystem Effective for Wrapping Bursts of a Cache

  • Jang, Wooyoung
    • ETRI Journal / v.39 no.3 / pp.428-436 / 2017
  • State-of-the-art processors require increasingly complicated memory services for high performance and low power consumption. In particular, they request the transfers within a burst in a wrap-around order to minimize the miss penalty of a cache. However, synchronous dynamic random access memories (SDRAMs) do not always generate transfers in the wrap-around order required by the processors. Thus, a memory subsystem rearranges the SDRAM transfers into the wrap-around order, but the rearrangement process may increase memory latency and waste the bandwidth of on-chip interconnects. In this paper, we present a memory subsystem that is effective for the wrapping bursts of a cache. The proposed memory subsystem makes SDRAMs generate transfers in an intermediate order in which the transfers can be rearranged into the wrap-around order with minimal penalty. The transfers are then delivered with priority, depending on the program's spatial locality. Experimental results showed that the proposed memory subsystem minimizes the memory performance loss resulting from wrapping bursts and thus improves program execution time.
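For reference, the wrap-around (critical-word-first) burst order the processor expects can be enumerated as below; this C sketch only illustrates the ordering itself, not the paper's memory subsystem, and the beat size and burst length in the example are assumed power-of-two values:

```c
#include <stdint.h>
#include <stdio.h>

/* Enumerate a wrap-around burst: the beat containing the missed word comes
 * first, then the remaining beats of the line wrap around modulo the burst
 * length.  beat_bytes * burst_len must be a power of two.                  */
void wrap_order(uint64_t miss_addr, unsigned beat_bytes, unsigned burst_len,
                uint64_t out[])
{
    uint64_t line_base  = miss_addr & ~((uint64_t)beat_bytes * burst_len - 1);
    unsigned first_beat = (unsigned)((miss_addr - line_base) / beat_bytes);

    for (unsigned i = 0; i < burst_len; i++)
        out[i] = line_base + ((first_beat + i) % burst_len) * beat_bytes;
}

int main(void)
{
    uint64_t order[8];
    /* e.g. a 64-byte line fetched as 8 beats of 8 bytes, miss at 0x1028 */
    wrap_order(0x1028, 8, 8, order);
    for (int i = 0; i < 8; i++)
        printf("beat %d: 0x%llx\n", i, (unsigned long long)order[i]);
    return 0;
}
```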

MSHR-Aware Dynamic Warp Scheduler for High Performance GPUs (GPU 성능 향상을 위한 MSHR 활용률 기반 동적 워프 스케줄러)

  • Kim, Gwang Bok; Kim, Jong Myon; Kim, Cheol Hong
    • KIPS Transactions on Computer and Communication Systems / v.8 no.5 / pp.111-118 / 2019
  • Recent graphics processing units (GPUs) provide high throughput by using powerful hardware resources. However, massive memory accesses cause GPU performance degradation due to cache inefficiency. Therefore, the performance of the GPU can be improved by reducing thread parallelism when the cache suffers from memory contention. In this paper, we propose a dynamic warp scheduler which controls thread parallelism according to the degree of cache contention. Usually, the greedy-then-oldest (GTO) warp issue policy shows lower parallelism than the loose round-robin (LRR) policy. Therefore, the proposed warp scheduler employs the LRR scheduling policy when Miss Status Holding Register (MSHR) utilization is low. On the other hand, the GTO policy is employed in order to reduce thread parallelism when MSHR utilization is high. The proposed technique shows better performance than either LRR or GTO alone, since it selects the more efficient scheduling policy dynamically. According to our experimental results, the proposed technique provides IPC improvements of 12.8% and 3.5% over LRR and GTO on average, respectively.
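A minimal C sketch of the policy-selection step the abstract describes: the scheduler checks MSHR occupancy and switches between LRR and GTO. The threshold value and the idea of evaluating it per cycle are assumptions for illustration; the paper's exact switching condition is not given in the abstract:

```c
enum policy { POLICY_LRR, POLICY_GTO };

/* Hypothetical threshold: switch to GTO when more than 70% of the MSHR
 * entries are occupied, i.e. the cache is congested with outstanding misses. */
#define MSHR_HIGH_UTIL 0.70

enum policy choose_policy(unsigned mshr_used, unsigned mshr_total)
{
    double util = (double)mshr_used / (double)mshr_total;
    return (util > MSHR_HIGH_UTIL) ? POLICY_GTO    /* throttle parallelism */
                                   : POLICY_LRR;   /* exploit parallelism  */
}
```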

Research on Web Cache Infection Methods and Countermeasures (웹 캐시 감염 방법 및 대응책 연구)

  • Hong, Sunghyuck; Han, Kun-Hee
    • Journal of Convergence for Information Technology / v.9 no.2 / pp.17-22 / 2019
  • Caching is a technique that improves the client's response time and reduces bandwidth use, which makes it effective. However, the caching technique, like other techniques, has vulnerabilities. Web caching is convenient, but it can be exploited by hacking and cause problems. Web cache problems are mainly caused by cache misses and excessive cache line fetches. If cache misses are frequent and excessive, the cache becomes a vulnerability, causing errors such as corruption of secure data and creating problems for both the client and the user's system. If the user is aware of cache infections and the countermeasures against such errors, the user will no longer experience the cache errors or the problems caused by infection. Therefore, this study proposes countermeasures against four kinds of cache infections and errors as defenses against web cache infection.

A Cache Consistency Algorithm for Client Caching Data Management Systems (클라이언트 캐슁 데이터 관리 시스템을 위한 캐쉬 일관성 알고리즘)

  • Kim Chi-Yeon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2006.05a / pp.1043-1046 / 2006
  • Management of the data cached at clients is required to guarantee the correctness of client applications. There are two categories of cache consistency algorithms: detection-based and avoidance-based. Detection-based schemes allow stale data access and then check the validity of any cached data before transactions can be allowed to commit. In contrast, under avoidance-based algorithms, transactions never have the opportunity to access stale data. In this paper, we propose a new avoidance-based cache consistency algorithm that makes use of versions. The proposed method maintains two versions at clients and servers, so it needs no callback messages and can reduce the transaction abort ratio compared with single-versioned algorithms. In addition, the proposed method can decrease cache misses by using mixed invalidation and propagation for remote update actions.
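The abstract gives only the outline of the scheme, so the C sketch below is merely one possible reading of the two-version and mixed invalidation/propagation ideas; the data structures, validity rule, and threshold are assumptions, not the paper's protocol:

```c
#include <stdint.h>
#include <stdbool.h>

/* Per-object version bookkeeping (illustrative): the server keeps the two
 * most recent versions, and a client read is considered valid if its cached
 * version is still one of them, avoiding a callback round-trip.            */
struct object {
    uint64_t current_version;
    uint64_t previous_version;
};

bool read_is_valid(const struct object *o, uint64_t cached_version)
{
    return cached_version == o->current_version ||
           cached_version == o->previous_version;
}

/* On a remote update, either push the new value to a client (propagation)
 * or just invalidate its copy, depending on how actively the client has
 * used the object; the threshold is an assumed heuristic.                 */
enum action { INVALIDATE, PROPAGATE };

enum action on_remote_update(unsigned recent_accesses)
{
    return recent_accesses > 4 ? PROPAGATE : INVALIDATE;
}
```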

Cache Sensitive T-tree Main Memory Index for Range Query Search (범위질의 검색을 위한 캐시적응 T-트리 주기억장치 색인구조)

  • Choi, Sang-Jun; Lee, Jong-Hak
    • Journal of Korea Multimedia Society / v.12 no.10 / pp.1374-1385 / 2009
  • Recently, advances in CPU speed have far out-paced advances in memory speed. Main-memory access is increasingly a performance bottleneck for main-memory database systems. To reduce memory access latency, cache memories have been incorporated into the memory subsystem. However, cache memories can hide memory latency only when the requested data is found in the cache. We propose a new cache-sensitive T-tree index structure, called the CST*-tree, for range query search. The CST*-tree reduces the number of cache misses by loading reduced internal nodes that do not hold index entries. It also supports sequential access of index entries for range queries by connecting adjacent terminal nodes and internal index nodes. For performance evaluation, we developed a cost model and used it to compare our CST*-tree with the existing CST-tree, the conventional cache-sensitive T-tree, and the T*-tree, the conventional T-tree for range query search. The results indicate that the cache misses of the CST*-tree are decreased by 20~30% over those of the CST-tree in a single-value search, and by 10~20% over those of the T*-tree in a range query search.
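A minimal C sketch of the two structural ideas named in the abstract: key-only internal nodes (so more separators fit per cache line than in nodes that also store entries) and sibling-linked terminal nodes for sequential range scans. The field counts and key type are illustrative assumptions, not the paper's layout:

```c
#include <stdint.h>

/* Internal node: separator keys only, no record pointers, so a probe
 * touches fewer cache lines per node (illustrative sizing).            */
struct internal_node {
    int32_t  nkeys;
    int32_t  keys[7];
    void    *child[8];
};

/* Terminal node: holds the actual index entries plus a sibling link so a
 * range query can scan leaves sequentially without re-descending.       */
struct terminal_node {
    int32_t  nkeys;
    int32_t  keys[6];
    void    *records[6];
    struct terminal_node *next;
};

/* Range scan: given the first terminal node whose keys may fall in
 * [low, high] (search code omitted), follow sibling links and collect
 * matching records until the range is exhausted.                        */
int range_scan(struct terminal_node *leaf, int32_t low, int32_t high,
               void *out[], int max_out)
{
    int n = 0;
    for (; leaf && n < max_out; leaf = leaf->next)
        for (int i = 0; i < leaf->nkeys && n < max_out; i++) {
            if (leaf->keys[i] > high)
                return n;                  /* keys are sorted: done */
            if (leaf->keys[i] >= low)
                out[n++] = leaf->records[i];
        }
    return n;
}
```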

Determination of a Grain Size for Reducing Cache Miss Rate of Direct-Mapped Caches (직접 사상 캐쉬의 캐쉬 실패율을 감소시키기 위한 성김도 정책)

  • Jung, In-Bum; Kong, Ki-Sok; Lee, Joon-Won
    • Journal of KIISE: Computer Systems and Theory / v.27 no.7 / pp.665-674 / 2000
  • In data-parallel programs with high cache locality, the choice of grain size affects cache performance. Although the chosen grain sizes may provide fair load balance among processors, grain sizes that ignore the underlying caching effect result in address interference between the grains allocated to a processor. This address interference has a negative impact on cache locality, since it results in cache conflict misses. To address this problem, we propose a best grain size derived from the cache size and the number of processors, based on the characteristics of a direct-mapped cache. Since the proposed method does not map grains to the same location in the cache, cache conflict misses are reduced. Simulation results show that the proposed best grain size substantially improves the performance of the tested data-parallel programs by reducing cache misses on direct-mapped caches.
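The following C snippet illustrates the interference effect the abstract refers to, not the paper's grain-size formula: with a block-cyclic distribution, processor p works on grains p, p+P, p+2P, ..., and when P times the grain size is a multiple of a direct-mapped cache's size, every grain of a processor starts at the same cache index and conflicts. The cache size, processor count, and grain sizes below are assumed example values:

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t cache_size = 64 * 1024;   /* 64 KB direct-mapped cache */
    const unsigned nproc      = 4;
    const uint64_t grain_bad  = 16 * 1024;   /* nproc * grain == cache size  */
    const uint64_t grain_good = 12 * 1024;   /* nproc * grain not a multiple */

    /* Cache index of the start of each grain assigned to processor 0. */
    for (unsigned g = 0; g < 4; g++) {
        uint64_t start_bad  = (uint64_t)g * nproc * grain_bad;
        uint64_t start_good = (uint64_t)g * nproc * grain_good;
        printf("grain %u: conflicting index %6llu, spread index %6llu\n", g,
               (unsigned long long)(start_bad  % cache_size),
               (unsigned long long)(start_good % cache_size));
    }
    return 0;
}
```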
