• Title/Summary/Keyword: cache performance

Design of an Asynchronous Instruction Cache based on a Mixed Delay Model (혼합 지연 모델에 기반한 비동기 명령어 캐시 설계)

  • Jeon, Kwang-Bae; Kim, Seok-Man; Lee, Je-Hoon; Oh, Myeong-Hoon; Cho, Kyoung-Rok
    • The Journal of the Korea Contents Association / v.10 no.3 / pp.64-71 / 2010
  • Recently, to achieve high processor performance, the cache is physically split into two parts, one for instructions and one for data. This paper proposes an asynchronous instruction cache architecture based on a mixed delay model: a DI (delay-insensitive) model for cache hits and a bundled-delay model for cache misses. We synthesized the instruction cache at gate level and constructed a test platform with a 32-bit embedded EISC processor to evaluate performance. The cache communicates with the main memory and the CPU using a 4-phase handshake protocol. It is an 8-KB, 4-way set-associative memory that employs a pseudo-LRU replacement algorithm. As a result, the designed cache shows a 99% cache hit ratio and reduces latency to 68%, as tested on the platform with MiBench benchmark programs.
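
For context on the pseudo-LRU replacement mentioned in the abstract above, the following is a minimal sketch of tree-based pseudo-LRU for a single 4-way set, the usual hardware-friendly approximation of LRU. The class and field names are illustrative only and are not taken from the paper.

```python
class PseudoLRU4Way:
    """Tree-based pseudo-LRU state for one 4-way cache set.

    Three bits form a binary tree: b0 selects between the pairs {0,1} and
    {2,3}, b1 selects within {0,1}, b2 selects within {2,3}.
    """

    def __init__(self):
        self.b0 = self.b1 = self.b2 = 0   # each bit points toward the colder side

    def touch(self, way):
        """On a hit or fill, repoint the bits away from the accessed way."""
        if way in (0, 1):
            self.b0 = 1          # next victim search goes to the right pair
            self.b1 = 1 - way    # within the left pair, point away from `way`
        else:
            self.b0 = 0          # next victim search goes to the left pair
            self.b2 = 3 - way    # within the right pair, point away from `way`

    def victim(self):
        """On a miss, follow the bits to the approximately least-recently-used way."""
        if self.b0 == 0:
            return 0 if self.b1 == 0 else 1
        return 2 if self.b2 == 0 else 3
```

A real 4-way cache would keep one such 3-bit state per set, consult victim() only on a miss, and call touch() on every hit and fill.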

Keeping-ownership Cache Replacement Policies for Remote Access Caches of NUMA System (NUMA 시스템에서 소유권에 근거한 원격 캐시 교체 정책)

  • 신숭현; 곽종욱; 장성태; 전주식
    • Journal of KIISE: Computer Systems and Theory / v.31 no.8 / pp.473-486 / 2004
  • NUMA systems have a remote access cache (RAC) in each local node to reduce the overhead of repeated remote memory accesses. With this RAC, memory latency and network traffic can be reduced and the performance of the multiprocessor system can be improved. Several cache replacement policies have been proposed in recent years, including replacement policies for multiprocessor systems. In this paper, we propose a cache replacement policy based on cache line coherence information. In this policy, cache lines that do not hold ownership are replaced before cache lines that do. In this way, the overhead of transferring ownership is avoided and memory latency can be decreased. We also propose the "Keeping-Ownership replacement policy with MRU (KOM)" and the "Keeping-Ownership replacement policy with Reference Bit (KORB)" to reduce the frequent replacement penalty of ownership-lacking cache lines. We compare and analyze these against LRU and pseudo-LRU (PLRU). The simulation shows that KOM outperforms PLRU by 25%, and KORB outperforms PLRU by 13%. Although the hardware cost of KOM is very small, its performance is nearly equal to that of LRU.
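
As a rough illustration of the keeping-ownership idea described above (victims are preferred among lines that do not hold ownership), here is a minimal Python sketch of victim selection in one RAC set. The simple LRU tie-break and the data layout are simplified stand-ins, not the paper's exact KOM/KORB hardware.

```python
from dataclasses import dataclass

@dataclass
class Line:
    tag: int
    owned: bool      # True if this node holds ownership of the cache line
    last_use: int    # timestamp of the most recent access

def choose_victim(cache_set):
    """Prefer evicting a line without ownership; break ties with LRU order.

    Replacing an owned line would force an ownership transfer later, so owned
    lines are kept unless every line in the set is owned.
    """
    not_owned = [line for line in cache_set if not line.owned]
    candidates = not_owned if not_owned else cache_set
    return min(candidates, key=lambda line: line.last_use)

# Example: the LRU line is owned, so the policy evicts the oldest non-owned line.
ways = [
    Line(tag=0x10, owned=True,  last_use=1),   # plain LRU would evict this
    Line(tag=0x22, owned=False, last_use=2),   # keeping-ownership evicts this
    Line(tag=0x31, owned=True,  last_use=3),
    Line(tag=0x47, owned=False, last_use=4),
]
print(hex(choose_victim(ways).tag))            # 0x22
```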

Advanced Victim Cache with Processor Reuse Information (프로세서의 재사용 정보를 이용하는 개선된 고성능 희생 캐쉬)

  • Kwak Jong Wook; Lee Hyunbae; Jhang Seong Tae; Jhon Chu Shik
    • Journal of KIISE: Computer Systems and Theory / v.31 no.12 / pp.704-715 / 2004
  • Recently, single- and multi-processor systems use a hierarchical memory structure to reduce the time gap between the processor clock rate and memory access time. In particular, a cache memory system includes two or three levels of caches to reduce this gap. Moreover, one of the most important factors in the hierarchical memory system is the hit rate in the level 1 cache, because the level 1 cache interfaces directly with the processor; a high level 1 hit rate is therefore critical for system performance. A victim cache, another high-level cache, is also important for assisting the level 1 cache by reducing conflict misses. In this paper, we propose an advanced high-level cache management scheme based on processor reuse information. This technique is a cache replacement policy that uses the frequency of the processor's memory accesses and makes cache locations with higher access frequency reside longer in the cache than those with lower frequency. We simulate our policy using Augmint, an event-driven simulator, and analyze the results. The simulation results show that the modified processor reuse information scheme (LIVMR) outperforms the level 1 cache with a simple victim cache (LIV) by 6.7% at maximum and 0.5% on average, and the performance benefit grows as the number of processors increases.
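
To make the reuse-information idea concrete, the sketch below keeps a small access-frequency counter per victim-cache entry and evicts the least-reused block first. The structure, capacity, and counter handling are assumptions for illustration, not the paper's LIVMR design.

```python
class VictimCache:
    """Small fully associative victim cache that favors frequently reused blocks."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = {}          # block address -> reuse counter

    def insert(self, addr, reuse_count=0):
        """Insert a block evicted from L1, evicting the least-reused entry if full."""
        if addr in self.entries:
            return
        if len(self.entries) >= self.capacity:
            coldest = min(self.entries, key=self.entries.get)
            del self.entries[coldest]
        self.entries[addr] = reuse_count

    def lookup(self, addr):
        """On an L1 miss, a victim-cache hit bumps the block's reuse counter."""
        if addr in self.entries:
            self.entries[addr] += 1
            return True
        return False
```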

Using Outermost-Zone Tracks as a Cache to Boost Disk Write Performance (디스크 쓰기 성능 향상을 위한 가장자리 영역 트랙의 이용)

  • U, Jong-Jeong; Hong, Chun-Pyo
    • The Transactions of the Korea Information Processing Society / v.6 no.11 / pp.3116-3123 / 1999
  • Current disk systems are generally designed to reduce read traffic effectively. Hence, the write traffic of the I/O workload can become a bottleneck of disk system performance. To overcome this problem at little cost, this paper proposes using the outermost-zone tracks of a multi-zoned recording disk as a secondary disk cache. The proposed disk cache improves disk system performance by exploiting the speed difference between block transfer and track transfer, the difference in transfer rate between outermost-zone tracks and inner tracks, the reduction in seek time obtained by decreasing the number of disk cache tracks, and idle periods during burst accesses. In addition, it does not waste disk space because it allocates the caching space in cylinder units. The simulation results show that the proposed system achieves 2.54 to 3.11 times better average response time for write operations than existing disk systems.
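
The following toy model sketches the basic mechanism described above: incoming writes are first staged on a region of outermost-zone tracks (which transfer faster) and destaged to their home locations during idle time. The class and its names are invented for illustration and are not the paper's simulator.

```python
class OuterZoneWriteCache:
    """Toy model: stage writes on fast outermost-zone tracks, destage when idle."""

    def __init__(self, cache_blocks=1024):
        self.capacity = cache_blocks      # blocks reserved on the outermost tracks
        self.staged = {}                  # home block address -> staged data

    def write(self, addr, data):
        """Absorb the write quickly; fall back to an in-place write if full."""
        if addr in self.staged or len(self.staged) < self.capacity:
            self.staged[addr] = data       # fast write into the outer-zone area
            return "staged"
        return "written-in-place"          # slower write to the home location

    def read(self, addr, read_from_disk):
        """Reads must check the staged data first to stay consistent."""
        if addr in self.staged:
            return self.staged[addr]
        return read_from_disk(addr)

    def destage(self):
        """During idle periods, copy staged blocks back to their home tracks."""
        flushed, self.staged = self.staged, {}
        return flushed                     # caller writes these to home locations
```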

Warp-Based Load/Store Reordering to Improve GPU Time Predictability

  • Huangfu, Yijie; Zhang, Wei
    • Journal of Computing Science and Engineering / v.11 no.2 / pp.58-68 / 2017
  • While graphics processing units (GPUs) can be used to improve the performance of real-time embedded applications that require high throughput, it is challenging to estimate the worst-case execution time (WCET) of GPU programs, because modern GPUs are designed to improve average-case performance rather than time predictability. In this paper, a reordering framework is proposed to regulate access to the GPU data cache, which helps to improve the accuracy of the estimated GPU L1 data cache miss rate with low performance overhead. With the improved cache miss rate estimation, tighter WCET estimates can be achieved for GPU programs.
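
As a very rough software analogy to the reordering framework above, the sketch below buffers memory requests per warp and releases them warp by warp in a fixed order, so the access stream seen by the data cache (and hence the miss count) becomes deterministic and easier to bound. The warp IDs and the tiny direct-mapped cache model are assumptions for illustration only, not the paper's mechanism.

```python
from collections import defaultdict

def reorder_by_warp(requests):
    """requests: list of (warp_id, address) in arrival order.

    Release requests warp by warp, preserving program order within each warp,
    so the cache access sequence no longer depends on runtime warp scheduling."""
    per_warp = defaultdict(list)
    for warp_id, addr in requests:
        per_warp[warp_id].append(addr)
    ordered = []
    for warp_id in sorted(per_warp):
        ordered.extend(per_warp[warp_id])
    return ordered

def direct_mapped_misses(addresses, num_lines=64, line_size=128):
    """Count misses of the ordered stream in a tiny direct-mapped cache model."""
    lines = {}
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if lines.get(index) != block:
            misses += 1
            lines[index] = block
    return misses
```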

A Web Caching Algorithm with Divided Cache Scope (분할 영역 기반의 웹 캐싱 알고리즘)

  • Ko, Il-Suk; Na, Yun-Ji; Leem, Chun-Seong
    • Proceedings of the Korea Information Processing Society Conference / 2003.11b / pp.1007-1010 / 2003
  • Efficient use of the web cache is becoming an important factor in determining system management efficiency in web-based systems. Cache performance depends heavily on the replacement algorithm, which dynamically selects a suitable subset of objects for caching in a finite space. However, replacement algorithms for web caches differ in many respects from traditional replacement algorithms. In this paper, a web caching algorithm is proposed for efficient operation of web-based systems. The algorithm is designed around a divided cache scope that considers the size and reference characteristics of web objects and their heterogeneity. The performance of the algorithm is analyzed experimentally. The experimental results compare the algorithm with previous replacement algorithms and confirm an improvement in response speed.
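
A minimal sketch of the divided-scope idea above: the cache space is split into independent scopes by object size class, each with its own LRU list, so a few large objects cannot crowd out many small ones. The two-scope split, the capacities, and the size threshold are illustrative assumptions, not the paper's exact scheme.

```python
from collections import OrderedDict

class DividedScopeWebCache:
    """Web cache split into per-size-class scopes, each managed by LRU."""

    def __init__(self, small_capacity=512 * 1024, large_capacity=4 * 1024 * 1024,
                 size_threshold=64 * 1024):
        self.threshold = size_threshold
        self.scopes = {
            "small": {"cap": small_capacity, "used": 0, "objs": OrderedDict()},
            "large": {"cap": large_capacity, "used": 0, "objs": OrderedDict()},
        }

    def _scope(self, size):
        return self.scopes["small" if size < self.threshold else "large"]

    def get(self, url):
        for scope in self.scopes.values():
            if url in scope["objs"]:
                scope["objs"].move_to_end(url)      # refresh LRU position
                return scope["objs"][url]
        return None

    def put(self, url, obj):
        scope = self._scope(len(obj))
        if url in scope["objs"]:                     # replace an existing copy
            scope["used"] -= len(scope["objs"].pop(url))
        while scope["used"] + len(obj) > scope["cap"] and scope["objs"]:
            _, evicted = scope["objs"].popitem(last=False)   # evict LRU object
            scope["used"] -= len(evicted)
        if scope["used"] + len(obj) <= scope["cap"]:
            scope["objs"][url] = obj
            scope["used"] += len(obj)
```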

High-Performance FFT Using Data Reorganization (데이터 재구성 기법을 이용한 고성능 FFT)

  • Park Neungsoo; Choi Yungho
    • The KIPS Transactions: Part A / v.12A no.3 s.93 / pp.215-222 / 2005
  • The efficient utilization of cache memories is a key factor in achieving high performance when computing large signal transforms. Non-unit-stride access in the computation of large DFTs causes cache conflict misses, resulting in poor cache performance and a severe degradation in overall performance. In this paper, we propose a dynamic data layout approach that takes the memory hierarchy into account. In our approach, data reorganization is performed between computation stages to reduce the number of cache misses. We also develop an efficient search algorithm to determine the factorization tree with the minimum execution time among the possible trees, considering the size of the DFTs and the data access stride. Our approach is applied to compute the fast Fourier transform (FFT). Experiments were performed on the Pentium 4, Athlon 64, Alpha 21264, and UltraSPARC III. The results show that our FFT achieves performance improvements of up to 3.37 times over previous FFT packages.
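
To show what reorganizing data between FFT stages looks like in practice, here is a minimal NumPy sketch of the classic four-step (Bailey) factorization, where explicit transposes turn strided column FFTs into FFTs over contiguous rows. This is a textbook illustration of the general idea, not the paper's search algorithm or tuned package.

```python
import numpy as np

def four_step_fft(x, n1, n2):
    """Length n1*n2 DFT computed as two passes of small FFTs, with explicit
    transposes between stages so each pass reads contiguous rows instead of
    strided columns."""
    n = n1 * n2
    a = x.reshape(n1, n2)
    a = np.ascontiguousarray(a.T)          # reorganize: columns become rows
    a = np.fft.fft(a, axis=1)              # length-n1 FFTs over contiguous data
    twiddle = np.exp(-2j * np.pi * np.outer(np.arange(n2), np.arange(n1)) / n)
    a = a * twiddle                        # twiddle factors W_N^(n2*k1)
    a = np.ascontiguousarray(a.T)          # reorganize again for the second pass
    a = np.fft.fft(a, axis=1)              # length-n2 FFTs over contiguous data
    return np.ascontiguousarray(a.T).reshape(-1)   # output in natural order

x = np.random.rand(1024) + 1j * np.random.rand(1024)
assert np.allclose(four_step_fft(x, 32, 32), np.fft.fft(x))
```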

Core-aware Cache Replacement Policy for Reconfigurable Last Level Cache (재구성 가능한 라스트 레벨 캐쉬 구조를 위한 코어 인지 캐쉬 교체 기법)

  • Son, Dong-Oh; Choi, Hong-Jun; Kim, Jong-Myon; Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information / v.18 no.11 / pp.1-12 / 2013
  • In multi-core processors, the last level cache (LLC) reduces the speed gap between the memory and the cores. For this reason, the LLC has a big impact on processor performance. An LLC can be organized as a shared cache or as private caches. In the computer architecture community, most researchers have focused on management techniques for shared caches, while management techniques for private caches have not been widely researched. In a conventional private LLC, capacity is statically assigned to each core, resulting in serious performance degradation when the workloads are not fairly distributed. To overcome this problem, this paper proposes a replacement policy for managing the private LLC efficiently. Since the proposed core-aware cache replacement policy can reconfigure the LLC dynamically, the LLC hit rate increases considerably. Moreover, the proposed policy uses 2-bit saturating counters to improve performance. According to our simulation results, the proposed method improves the hit rate by 9.23% and reduces the access time by 12.85% compared to the conventional method.
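
The abstract above mentions 2-bit saturating counters; the sketch below shows one plausible way such counters could drive victim selection in a private LLC whose ways can be lent between cores. The way-ownership bookkeeping and the preference order are illustrative assumptions, not the paper's exact mechanism.

```python
class SaturatingCounter2Bit:
    """2-bit saturating counter: value stays within 0..3."""

    def __init__(self, value=0):
        self.value = value

    def increment(self):          # called when the line is reused
        self.value = min(self.value + 1, 3)

    def decrement(self):          # called periodically to age the line
        self.value = max(self.value - 1, 0)

def choose_victim(lines, requesting_core):
    """Pick a victim in one LLC set.

    Each line is a dict with an owner core and a saturating counter. Lines
    whose counter has decayed to 0 are preferred; among those, lines borrowed
    from other cores go first, so capacity drifts back toward the core that
    is actually using it.
    """
    cold = [l for l in lines if l["counter"].value == 0]
    candidates = cold if cold else lines
    borrowed = [l for l in candidates if l["owner"] != requesting_core]
    pool = borrowed if borrowed else candidates
    return min(pool, key=lambda l: l["counter"].value)
```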

A Review of Web Cache Prefetching

  • Deng, YuFeng; Manoharan, Sathiamoorthy
    • Journal of information and communication convergence engineering / v.12 no.3 / pp.161-167 / 2014
  • Web caches help to reduce latencies arising from slow networks through storing and reusing what was used before. Repeat access to a cached resource does not incur network latencies. However, resources that have never been used will not be found in the cache. Cache prefetching is a technique that helps to fill a cache with still-unused resources in anticipation that these resources will be used in the near future. Typically these unused resources are related to the resources that have been accessed in the recent past. While web caching exploits temporal locality, prefetching attempts to exploit spatial locality. Access to the prefetched resources will be cache hits, and therefore reduces the latency as perceived by the user. This paper reviews the cache infrastructure supported by the hypertext transfer protocol and discusses web cache prefetching in general, including Mozilla's prefetching infrastructure. It then classifies and reviews some prefetching techniques.
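
As a small illustration of the kind of prefetching the review surveys, the sketch below records which resources tend to follow each request and prefetches the most likely successors. It is a generic Markov-style predictor written for this listing, not a technique attributed to the paper or to Mozilla's infrastructure.

```python
from collections import defaultdict, Counter

class SuccessorPrefetcher:
    """Predicts the next web resources from previously observed request pairs."""

    def __init__(self, top_k=2):
        self.successors = defaultdict(Counter)   # url -> Counter of following urls
        self.prev = None
        self.top_k = top_k

    def record_access(self, url):
        """Learn from the observed request stream."""
        if self.prev is not None:
            self.successors[self.prev][url] += 1
        self.prev = url

    def predict(self, url):
        """Return up to top_k resources worth prefetching after `url`."""
        return [u for u, _ in self.successors[url].most_common(self.top_k)]

# Example: after index.html is repeatedly followed by style.css,
# the prefetcher suggests fetching style.css ahead of time.
p = SuccessorPrefetcher()
for u in ["index.html", "style.css", "index.html", "style.css", "index.html"]:
    p.record_access(u)
print(p.predict("index.html"))   # ['style.css']
```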

Meta Data Caching Mechanism in Distributed Directory Database Systems (분산 디렉토리 데이터베이스 시스템에서의 메타 데이터 캐싱 기법)

  • Lee, Kang-Woo; Koh, Jin-Gwang
    • The Transactions of the Korea Information Processing Society / v.7 no.6 / pp.1746-1752 / 2000
  • In this paper, a cache mechanism is proposed to improve the speed of query processing in distributed directory database systems. To decrease the search time for requested objects and the query processing time, query requests and results for objects at a remote site are stored in the cache of the local site. The cache system architecture is designed according to the classified information, and cache schemas are designed for each kind of cache information. Operational algorithms are developed for the meta data cache, which maintains a meta data tree. This tree improves the speed of query processing by reducing the search space. Finally, a performance evaluation is carried out by comparing the proposed cache mechanism with X.500.
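
The sketch below shows one plausible shape for the meta data tree mentioned above: cached entries are stored in a trie keyed by directory name components, so a lookup walks only the matching branch instead of scanning all cached results. The node layout and names are assumptions for illustration, not the paper's design.

```python
class MetaDataNode:
    """One node of a meta data tree keyed by directory name components."""

    def __init__(self):
        self.children = {}     # component name -> MetaDataNode
        self.entry = None      # cached result for this distinguished name, if any

class MetaDataCache:
    def __init__(self):
        self.root = MetaDataNode()

    def put(self, dn_components, entry):
        """Cache the result for a name such as ['kr', 'ac', 'univ', 'cs']."""
        node = self.root
        for comp in dn_components:
            node = node.children.setdefault(comp, MetaDataNode())
        node.entry = entry

    def get(self, dn_components):
        """Walk only the matching branch; stop as soon as a component is absent."""
        node = self.root
        for comp in dn_components:
            node = node.children.get(comp)
            if node is None:
                return None
        return node.entry

cache = MetaDataCache()
cache.put(["kr", "ac", "univ", "cs"], {"mail": "cs@univ.example"})
print(cache.get(["kr", "ac", "univ", "cs"]))   # cached hit, no remote query needed
```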
