• Title/Summary/Keyword: Cache Simulator

Cache Simulator Design for Optimizing Write Operations of Nonvolatile Memory Based Caches (비휘발성 메모리 기반 캐시의 쓰기 작업 최적화를 위한 캐시 시뮬레이터 설계)

  • Joo, Yongsoo;Kim, Myeung-Heo;Han, In-Kyu;Lim, Sung-Soo
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.2 / pp.87-95 / 2016
  • Nonvolatile memory (NVM) is being considered as an alternative to traditional memory devices such as SRAM and DRAM, which suffer from various limitations due to the technology scaling of modern integrated circuits. Although NVMs have advantages including nonvolatility, low leakage current, and high density, their inferior write performance in terms of energy and endurance is a major challenge to the successful design of NVM-based memory systems. To overcome this drawback, extensive research is required to develop energy- and endurance-aware optimization techniques for NVM-based memory systems. However, researchers have had difficulty finding a suitable simulation tool for prototyping and evaluating new NVM optimization schemes, because existing simulation tools do not consider the features of NVM devices. In this article, we introduce an NVM-based cache simulator to support rapid prototyping and evaluation of NVM-based caches, as well as energy- and endurance-aware NVM cache optimization schemes. We demonstrate that the proposed NVM cache simulator can easily prototype a PRAM cache and a PRAM+STT-RAM hybrid cache, as well as evaluate various write traffic reduction schemes and wear leveling schemes.
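To give a rough feel for the metrics such a tool tracks, the following is a minimal Python sketch (not the authors' simulator) of a direct-mapped NVM cache model that counts per-line write traffic and applies a naive wear-leveling remap once a line's write count exceeds a threshold. All class names, sizes, and thresholds are hypothetical.

```python
# Minimal sketch of an NVM cache write model (hypothetical, not the paper's tool).
# It counts writes per cache line and remaps a "hot" line to the least-written
# line once a wear threshold is exceeded, illustrating write-traffic and
# wear-leveling metrics for an NVM-based cache.

class NVMCacheModel:
    def __init__(self, num_lines=1024, line_size=64, wear_threshold=1000):
        self.num_lines = num_lines
        self.line_size = line_size
        self.wear_threshold = wear_threshold
        self.tags = [None] * num_lines          # tag stored in each physical line
        self.write_counts = [0] * num_lines     # per-line write counter (endurance proxy)
        self.remap = list(range(num_lines))     # logical index -> physical line
        self.total_writes = 0

    def _physical(self, addr):
        logical = (addr // self.line_size) % self.num_lines
        tag = addr // (self.line_size * self.num_lines)
        return self.remap[logical], tag

    def write(self, addr):
        line, tag = self._physical(addr)
        self.tags[line] = tag
        self.write_counts[line] += 1
        self.total_writes += 1
        if self.write_counts[line] >= self.wear_threshold:
            self._wear_level(line)

    def _wear_level(self, hot_line):
        # Swap the hot physical line with the least-written one.
        cold_line = min(range(self.num_lines), key=lambda i: self.write_counts[i])
        i = self.remap.index(hot_line)
        j = self.remap.index(cold_line)
        self.remap[i], self.remap[j] = self.remap[j], self.remap[i]

if __name__ == "__main__":
    cache = NVMCacheModel()
    for addr in [0, 64, 0, 128, 0, 64] * 300:   # synthetic write-heavy trace
        cache.write(addr)
    print("total writes:", cache.total_writes)
    print("max per-line wear:", max(cache.write_counts))
```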

BLOCS: Block Correlation Aware Sequential Pattern Mining based Caching Algorithm for Hybrid Storages (BLOCS: 블록 상관관계를 인지하는 시퀀스 패턴 마이닝 기반 하이브리드 스토리지 캐슁 알고리즘)

  • Lee, Seongjin;Won, Youjip
    • Journal of the Korea Society of Computer and Information / v.19 no.7 / pp.113-130 / 2014
  • In this paper, we propose the BLOCS algorithm to find the sequence of data that should be kept in the cache device of a hybrid storage system that uses an SSD as a cache. The BLOCS algorithm, which uses a sequence pattern mining scheme, creates a set of frequently requested sectors with respect to the order in which sectors are requested. To compare the performance of the proposed scheme, we introduce a Distance (DIST) based scheme, a Request Frequency (FREQ) based scheme, and a Frequency times Size (F-S) based scheme. We measure the hit ratio and I/O latency of the different caching schemes using a hybrid storage caching simulator. We acquired a booting workload along with ten application-launch scenarios and used these workloads as input to the cache simulator. Experiments with the booting workload show that the BLOCS scheme gives a hit ratio of 61%, about 15% higher than the least performing DIST scheme.
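For orientation only, here is a minimal Python sketch of the kind of frequency-based (FREQ-style) baseline the paper compares against: it admits the most frequently requested sectors into a fixed-size SSD cache and reports the hit ratio over a sector trace. This is a hypothetical illustration, not the BLOCS implementation, and the trace and cache size are made up.

```python
from collections import Counter

def freq_cache_hit_ratio(trace, cache_sectors):
    """Replay a sector trace twice: first to rank sectors by request
    frequency (FREQ-style admission), then to measure the hit ratio
    against a cache holding the top-ranked sectors."""
    freq = Counter(trace)
    cached = set(sector for sector, _ in freq.most_common(cache_sectors))
    hits = sum(1 for sector in trace if sector in cached)
    return hits / len(trace)

if __name__ == "__main__":
    # Synthetic "boot-like" trace: a hot set of sectors plus a cold scan.
    trace = [s % 100 for s in range(10_000)] + list(range(1_000, 1_500))
    print(f"hit ratio: {freq_cache_hit_ratio(trace, cache_sectors=100):.2%}")
```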

A New Hybrid Architecture for Cooperative Web Caching

  • Baek, Jin-Suk;Kaur, Gurpreet;Yang, Jung-Hoon
    • Journal of Ubiquitous Convergence Technology / v.2 no.1 / pp.1-11 / 2008
  • An effective solution to the problems caused by the explosive growth of the World Wide Web is web caching, which employs an additional server, called a proxy cache, between the clients and the main server to cache popular web objects near the clients. However, a single proxy cache can easily become a bottleneck. Deploying groups of cooperative caches provides scalability and robustness by eliminating the limitations of a single proxy cache. Two common architectures for implementing cooperative caching are hierarchical and distributed caching systems. Unfortunately, both architectures suffer from performance limitations. We propose an efficient hybrid caching architecture that eliminates these limitations by using both hierarchical and same-level caches. Our performance evaluation with the simulator we developed shows that the proposed architecture offers the best of both existing architectures in terms of cache hit rate, the number of query messages from clients, and response time.
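A minimal sketch of the hybrid lookup idea (local cache, then same-level siblings, then the parent, then the origin server) follows; the proxy names, message counting, and admission policy are hypothetical and do not reproduce the paper's actual protocol.

```python
# Hypothetical sketch of a hybrid cooperative cache lookup: try the local
# proxy, then same-level sibling caches, then the parent, and finally the
# origin server, counting query messages along the way.

class Proxy:
    def __init__(self, name, siblings=None, parent=None):
        self.name = name
        self.store = {}                 # url -> cached object
        self.siblings = siblings or []
        self.parent = parent
        self.queries_sent = 0

    def get(self, url, origin):
        if url in self.store:                        # local hit
            return self.store[url]
        for sib in self.siblings:                    # same-level lookup
            self.queries_sent += 1
            if url in sib.store:
                obj = sib.store[url]
                self.store[url] = obj
                return obj
        self.queries_sent += 1
        if self.parent is not None:                  # hierarchical lookup
            obj = self.parent.get(url, origin)
        else:
            obj = origin[url]                        # fetch from origin server
        self.store[url] = obj
        return obj

if __name__ == "__main__":
    origin = {"/index.html": "<html>...</html>"}
    root = Proxy("root")
    a, b = Proxy("a", parent=root), Proxy("b", parent=root)
    a.siblings, b.siblings = [b], [a]
    a.get("/index.html", origin)      # misses everywhere -> origin via root
    b.get("/index.html", origin)      # sibling hit at proxy a
    print("queries:", a.queries_sent, b.queries_sent)
```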

An L1 Cache Prefetching Scheme using Excessively Aggressive Prefetching and a Small Direct-mapped Filtering Cache (공격적인 선인출 및 직접 사상 필터링을 이용한 L1 캐시 선인출 기법)

  • Chon, Young-Suk
    • Journal of KIISE: Computer Systems and Theory / v.33 no.11 / pp.836-852 / 2006
  • This paper proposes an L1 cache prefetch scheme using an excessively aggressive hardware prefetcher and a hardware prefetch filter with a small direct-mapped filtering cache. A quantitative analysis method is introduced and applied to analyze the non-ideal effects of aggressive cache prefetching. From those analysis results, the structure and algorithm of a prefetch filter are derived and simulated, and the overall system performance is measured using a cycle-by-cycle cache simulator. Experimental results show that the proposed scheme improves overall system performance by 18% on average over several benchmarks.
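As a rough illustration of the filtering idea only (not the paper's exact structure or algorithm), the sketch below pairs an aggressive sequential prefetcher with a small direct-mapped table that drops prefetches for blocks it has seen recently, so redundant requests never reach the L1 cache. The table size, prefetch degree, and miss stream are assumptions.

```python
# Hypothetical sketch of prefetch filtering with a small direct-mapped table.
# An aggressive next-N-block prefetcher proposes many block addresses; the
# filter rejects any block whose tag is already recorded in its table.

class DirectMappedFilter:
    def __init__(self, entries=64):
        self.entries = entries
        self.tags = [None] * entries

    def allow(self, block_addr):
        idx = block_addr % self.entries
        tag = block_addr // self.entries
        if self.tags[idx] == tag:       # recently seen -> filter it out
            return False
        self.tags[idx] = tag            # remember this block
        return True

def aggressive_prefetch(miss_block, degree=8):
    """Excessively aggressive sequential prefetcher: the next `degree` blocks."""
    return [miss_block + i for i in range(1, degree + 1)]

if __name__ == "__main__":
    filt = DirectMappedFilter()
    issued = 0
    misses = [100, 101, 102, 200, 201]            # synthetic L1 miss stream
    for miss in misses:
        for blk in aggressive_prefetch(miss):
            if filt.allow(blk):
                issued += 1
    print(f"issued {issued}/{len(misses) * 8} prefetches after filtering")
```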

The Effects of Cache Memory on the System Bus Traffic (캐쉬 메모리가 버스 트래픽에 끼치는 영향)

  • 조용훈;김정선
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.1 / pp.224-240 / 1996
  • It is now standard for at least one level of cache memory to be used in computer systems. In this paper, the impact of the internal cache memory organization on computer performance is investigated using a simulator program, written by the authors and run on a SUN SPARC workstation, with several real execution trace files. 280 cache organizations were simulated using n-way set-associative mapping and the LRU (Least Recently Used) replacement algorithm with a write-allocate policy. As a result, a 16-way set-associative cache is the best configuration; when we select a 256 KB cache memory and a 64-byte line size, the bus traffic ratio is reduced compared to that of the cacheless system, so that a single bus can support almost 7 processors without any delay or degradation, with a high hit ratio of 99.21%. The smaller the line size we choose, the slightly lower the hit ratio we get, but the more processors a single bus can support (up to 18 processors). Therefore, a proper cache memory organization can enable a single bus structure to support multiple processors without any performance degradation.
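For orientation, here is a minimal Python sketch (not the authors' SPARC-based simulator) of an n-way set-associative cache with LRU replacement and a write-allocate policy, driven by a synthetic trace and reporting the hit ratio and the number of bus transfers; the cache parameters mirror the 256 KB / 64-byte / 16-way configuration mentioned above, but the trace and bus-traffic accounting are illustrative assumptions.

```python
from collections import OrderedDict

class SetAssocCache:
    """n-way set-associative cache, LRU replacement, write-allocate.
    Counts hits and bus transfers (one per miss fill, one per dirty eviction)."""

    def __init__(self, size_bytes=256 * 1024, line_size=64, ways=16):
        self.line_size = line_size
        self.ways = ways
        self.num_sets = size_bytes // (line_size * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]  # tag -> dirty bit
        self.hits = self.accesses = self.bus_transfers = 0

    def access(self, addr, is_write):
        self.accesses += 1
        block = addr // self.line_size
        idx, tag = block % self.num_sets, block // self.num_sets
        s = self.sets[idx]
        if tag in s:
            self.hits += 1
            s.move_to_end(tag)                        # refresh LRU order
            s[tag] = s[tag] or is_write               # mark dirty on a write hit
            return
        self.bus_transfers += 1                       # miss: fetch line over the bus
        if len(s) >= self.ways:
            _, dirty = s.popitem(last=False)          # evict the LRU way
            if dirty:
                self.bus_transfers += 1               # write back a dirty line
        s[tag] = is_write                             # write-allocate on a write miss

if __name__ == "__main__":
    cache = SetAssocCache()
    for i in range(200_000):                          # synthetic looping trace
        cache.access((i * 64) % (128 * 1024), is_write=(i % 4 == 0))
    print(f"hit ratio: {cache.hits / cache.accesses:.2%}, "
          f"bus transfers: {cache.bus_transfers}")
```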

Performance Analysis of Parity Cache enabled RAID Level 5 for DDR Memory Storage Device (패리티 캐시를 이용한 DDR 메모리 저장 장치용 RAID 레벨 5의 성능 분석)

  • Gu, Bon-Gen;Kwak, Yun-Sik;Cheong, Seung-Kook;Hwang, Jung-Yeon
    • Journal of Advanced Navigation Technology / v.14 no.6 / pp.916-927 / 2010
  • In this paper, we analyze the performance of parity-cache-enabled RAID level 5 via simulation. This RAID system consists of DDR memory-based storage devices. To do this, we develop a simulation model and define the basic performance data we want to obtain from the simulation. We then implement and run a simulator based on this model. From the simulation results, we expect that parity-cache-enabled RAID level 5 configured with DDR memory-based storage devices has a positive effect on storage system performance if the storage access patterns of applications are tuned.
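To make the parity-cache idea concrete, here is a hypothetical Python sketch of the RAID level-5 small-write path in which recently used parity blocks are kept in a small parity cache, so a write to a cached stripe skips the read of the old parity block; the device layout, cache size, and eviction policy are illustrative assumptions, not the paper's model.

```python
# Hypothetical sketch of a RAID level-5 small-write path with a parity cache.
# A small write normally reads the old data and old parity, computes
# new_parity = old_parity ^ old_data ^ new_data, then writes data and parity.
# Caching recently used parity blocks skips the old-parity read on a hit.

class Raid5ParityCache:
    def __init__(self, cache_entries=128):
        self.cache_entries = cache_entries
        self.data = {}           # (stripe, disk) -> block value
        self.parity = {}         # stripe -> parity value on the parity device
        self.parity_cache = {}   # stripe -> cached parity value (bounded)
        self.device_reads = self.device_writes = 0

    def write(self, stripe, disk, value):
        old_data = self.data.get((stripe, disk), 0)
        self.device_reads += 1                      # read the old data block
        if stripe in self.parity_cache:
            old_parity = self.parity_cache[stripe]  # parity cache hit: no device read
        else:
            old_parity = self.parity.get(stripe, 0)
            self.device_reads += 1                  # read the old parity block
        new_parity = old_parity ^ old_data ^ value
        self.data[(stripe, disk)] = value
        self.parity[stripe] = new_parity
        self.device_writes += 2                     # write data block and parity block
        if len(self.parity_cache) >= self.cache_entries:
            self.parity_cache.pop(next(iter(self.parity_cache)))  # naive FIFO eviction
        self.parity_cache[stripe] = new_parity

if __name__ == "__main__":
    raid = Raid5ParityCache()
    for i in range(1_000):
        raid.write(stripe=i % 64, disk=i % 4, value=i)   # stripe-local write pattern
    print("device reads:", raid.device_reads, "device writes:", raid.device_writes)
```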

Filter Cache Predictor Using Mode Selection Bit (모드 선택 비트를 사용한 필터 캐시 예측기)

  • Kwak, Jong-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.5 / pp.1-13 / 2009
  • The filter cache has been introduced as one solution for reducing cache power consumption. A filter cache yields more than 50% power reduction, but it compromises more than 20% of performance. To minimize this performance degradation, predictive filter caches have been proposed. In this paper, we review previous filter cache predictors and analyze their problems. As a result, we identify the main causes of prediction misses in previous filter cache schemes and, to resolve them, propose a new prediction policy. In our scheme, reference bit entries called MSBs (mode selection bits) are inserted into the filter cache and the BTB to adaptively control filter cache access. In the simulations, we use a modified SimpleScalar simulator with MiBench benchmark programs to verify the proposed filter cache. The simulation results show, on average, a 5% performance improvement compared to previous schemes.
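As a purely illustrative sketch (the paper's MSB placement in the BTB and its update rules are not reproduced here), the code below keeps a one-bit mode flag per fetch block that predicts whether the next access will hit in the tiny filter cache or should bypass it and go to L1 directly, and trains that bit with the actual filter-cache outcome. All structures and sizes are assumptions.

```python
# Hypothetical sketch of a predictive filter cache. A per-block mode bit
# predicts whether the next fetch will hit in the tiny filter cache (serve
# from it) or should bypass it and access L1 directly; the bit is trained
# by the actual filter-cache outcome.

class PredictiveFilterCache:
    def __init__(self, filter_lines=16, line_size=32):
        self.filter_lines = filter_lines
        self.line_size = line_size
        self.filter = [None] * filter_lines       # direct-mapped filter cache tags
        self.mode_bit = {}                        # fetch block -> predicted filter hit
        self.filter_hits = self.l1_accesses = 0

    def fetch(self, pc):
        block = pc // self.line_size
        idx, tag = block % self.filter_lines, block // self.filter_lines
        use_filter = self.mode_bit.get(block, True)   # default: try the filter
        hit = self.filter[idx] == tag
        if use_filter and hit:
            self.filter_hits += 1                 # low-energy filter cache access
        else:
            self.l1_accesses += 1                 # bypass or filter miss -> L1 access
            self.filter[idx] = tag                # fill the filter cache line
        self.mode_bit[block] = hit                # train the prediction bit

if __name__ == "__main__":
    fc = PredictiveFilterCache()
    for pc in list(range(0, 256, 4)) * 100:       # tight instruction loop
        fc.fetch(pc)
    print("filter hits:", fc.filter_hits, "L1 accesses:", fc.l1_accesses)
```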

Energy-aware Instruction Cache Design using Partitioning (분할 기법을 이용한 저전력 명령어 캐쉬 설계)

  • Kim, Jong-Myon;Jung, Jae-Wook;Kim, Cheol-Hong
    • Journal of KIISE: Computing Practices and Letters / v.13 no.5 / pp.241-251 / 2007
  • Energy consumption in the instruction cache accounts for a significant portion of total processor energy consumption. Therefore, reducing energy consumption in the instruction cache is important in designing embedded processors. This paper proposes a method for reducing dynamic energy consumption in the instruction cache by partitioning it into smaller (less energy-consuming) sub-caches. When a request comes into the proposed cache, only one sub-cache is accessed by exploiting the locality of applications; the other sub-caches are not accessed, reducing dynamic energy. In addition, the proposed cache reduces dynamic energy consumption by eliminating the energy consumed in tag matching. We evaluated the energy efficiency by running the cycle-accurate simulator SimpleScalar with power parameters obtained from CACTI. Simulation results show that the proposed cache reduces dynamic energy consumption by 37%~60% compared to a traditional direct-mapped instruction cache.
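A minimal sketch of the partitioning idea, with entirely hypothetical parameters: a few address bits select exactly one small sub-cache per fetch, and only that sub-cache's (smaller) per-access energy is charged, compared against a monolithic cache baseline. This is not the paper's cache organization or its CACTI-derived energy numbers.

```python
# Hypothetical sketch of a partitioned instruction cache. Address bits pick
# exactly one small sub-cache per fetch, so only that sub-cache is accessed;
# the energy estimate charges each access with the smaller per-access energy
# of a sub-cache instead of a monolithic cache (illustrative numbers).

class PartitionedICache:
    def __init__(self, num_subcaches=4, lines_per_sub=128, line_size=32,
                 sub_access_energy=0.3, full_access_energy=1.0):
        self.num_subcaches = num_subcaches
        self.lines_per_sub = lines_per_sub
        self.line_size = line_size
        self.subs = [dict() for _ in range(num_subcaches)]   # index -> tag
        self.sub_energy = sub_access_energy                  # nJ, illustrative
        self.full_energy = full_access_energy                # nJ, illustrative
        self.hits = self.accesses = 0
        self.energy = self.baseline_energy = 0.0

    def fetch(self, pc):
        self.accesses += 1
        block = pc // self.line_size
        sub = block % self.num_subcaches          # selection bits pick one sub-cache
        idx = (block // self.num_subcaches) % self.lines_per_sub
        tag = block // (self.num_subcaches * self.lines_per_sub)
        self.energy += self.sub_energy            # only one sub-cache is activated
        self.baseline_energy += self.full_energy  # monolithic cache for comparison
        if self.subs[sub].get(idx) == tag:
            self.hits += 1
        else:
            self.subs[sub][idx] = tag

if __name__ == "__main__":
    icache = PartitionedICache()
    for pc in list(range(0, 4096, 4)) * 50:       # looping instruction stream
        icache.fetch(pc)
    print(f"hit ratio: {icache.hits / icache.accesses:.2%}")
    print(f"dynamic energy vs. monolithic: {icache.energy / icache.baseline_energy:.0%}")
```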

Characterization of Small Embedded Programs (소형 임베디드 프로그램의 실행 속도와 특성분석)

  • Chung, Sae-Am;Yi, Jong-Su;Kim, Jun-Seong
    • Proceedings of the IEEK Conference / 2008.06a / pp.771-772 / 2008
  • In this paper, we analyze the characteristics of MiBench, an embedded system benchmark suite, using the SimpleScalar simulator. The experimental results show that MiBench programs generally consist largely of integer and memory-access instructions. In particular, the IPC of rijndael decoding is strongly affected by cache size, whereas the IPC of CRC32 is hardly affected by cache size or the branch prediction algorithm.

Performance Evaluation and Analysis of Symmetric Multiprocessor using Multi-Program Benchmarks (Multi-Program 벤치마크를 이용한 대칭구조 Multiprocessor의 성능평가와 분석)

  • Jeong Tai-Kyeong
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.4 / pp.645-651 / 2006
  • This paper discusses computer system performance evaluation and analysis using a simulator that can execute a symmetric multiprocessor in a machine simulation environment. We also analyze the multiprocessor system using SPLASH-2, a suite of multi-program benchmarks for multiprocessors, to study the behavior of the symmetric multiprocessor OS kernel, IRIX 5.3. To validate the scalability of the symmetric multiprocessor system, we describe the structure and evaluation methods for symmetric multiprocessors as well as a functionality-based software simulator, SimOS. In this paper, we examine cache miss counts and stall times on the symmetric multiprocessor for local instructions and local data, using multi-program benchmarks such as the RADIX sorting algorithm and Cholesky factorization.