• Title/Summary/Keyword: Level 1 Cache


Improving Performance of Internet by Using Hierarchical Proxy Cache (계층적 프록시 캐쉬를 이용한 인터넷 성능 향상 기법)

  • 이효일;김종현
    • Journal of the Korea Society for Simulation / v.9 no.2 / pp.1-14 / 2000
  • Recently, as the construction of information infrastructure, including high-speed communication networks, has expanded remarkably, an increasingly wide variety of information services has been provided. The number of internet users has therefore grown rapidly, resulting in heavy load on Web servers and heavier traffic on networks. These phenomena lengthen response times, which means a lower quality of service. To solve these problems, much effort has been devoted to easing the bottleneck at the Web server, reducing network traffic, and shortening response times by caching frequently accessed information at proxy servers located near clients. Internet performance can be improved further by allowing clients to share the information stored in proxy caches. In this paper, we simulate hierarchical proxy caches with a 3-level 4-ary tree structure using real web traces, and we analyze the cache hit ratio for various cache replacement policies and cache sizes when the delayed-store scheme is applied. According to the simulation results, the delayed-store scheme increases the remote cache hit ratio, which improves the quality of service by shortening the service response time.
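
A minimal sketch of the setup this abstract describes: a leaf-to-root path of LRU proxy caches, with a delayed-store policy that admits an object into the hierarchy only on its second reference. The two-reference threshold and the LRU replacement are assumptions for illustration; the paper evaluates several replacement policies, and its delayed-store definition may differ.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity LRU cache keyed by URL."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def lookup(self, url):
        if url in self.store:
            self.store.move_to_end(url)        # refresh recency on a hit
            return True
        return False

    def insert(self, url):
        if url in self.store:
            self.store.move_to_end(url)
            return
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)     # evict least recently used
        self.store[url] = True

def request(url, path, ref_count):
    """Serve one request along `path`, the list of caches from the
    client's leaf proxy up to the root of the 3-level 4-ary tree."""
    for level, cache in enumerate(path):
        if cache.lookup(url):
            return "local hit" if level == 0 else "remote hit"
    # Complete miss: the object comes from the origin Web server.
    ref_count[url] = ref_count.get(url, 0) + 1
    if ref_count[url] >= 2:                    # delayed store (assumed):
        for cache in path:                     # cache only on re-reference
            cache.insert(url)
    return "miss"
```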

An Interference Matrix Based Approach to Bounding Worst-Case Inter-Thread Cache Interferences and WCET for Multi-Core Processors

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering / v.5 no.2 / pp.131-140 / 2011
  • Different cores typically share the last-level cache in a multi-core processor, so threads running on different cores may interfere with each other. A multi-core worst-case execution time (WCET) analyzer must therefore be able to safely and accurately estimate the worst-case inter-thread cache interference, which is not supported by current WCET analysis techniques that mainly focus on single-thread analysis. This paper presents a novel approach to analyzing worst-case cache interference and bounding the WCET for threads running on multi-core processors with shared L2 instruction caches. We propose an interference matrix to model inter-thread interference, from which the worst-case inter-thread cache interference can be calculated. Our experiments indicate that the proposed approach can give a worst-case bound with less than 1% overestimation, as in the benchmark fib-call, and a 16.4% overestimation on average, for threads running on a dual-core processor with a shared L2 cache. Our approach reduces the WCET overestimation by 20.0% on average compared to prior work.
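
The abstract's interference matrix can be read as a per-thread, per-set count of conflicting L2 lines. The sketch below builds such a matrix and derives a simple per-set upper bound on interference misses; the bounding rule and all names are illustrative guesses, not the paper's exact formulation.

```python
def build_interference_matrix(thread_lines, num_sets):
    """thread_lines[t] = set of L2 line addresses thread t may touch.
    Matrix entry m[t][s] = number of distinct lines of thread t that
    map to cache set s (set index = line address mod num_sets)."""
    m = [[0] * num_sets for _ in thread_lines]
    for t, lines in enumerate(thread_lines):
        for line in lines:
            m[t][line % num_sets] += 1
    return m

def worst_case_interferences(m, victim, assoc):
    """Upper-bound the extra misses co-runners can inflict on `victim`:
    in each set, the victim can lose at most as many lines as it holds
    there, co-runners can displace at most as many lines as they map
    into that set, and a set holds at most `assoc` lines."""
    total = 0
    for s in range(len(m[victim])):
        intruder_lines = sum(row[s] for t, row in enumerate(m) if t != victim)
        total += min(m[victim][s], intruder_lines, assoc)
    return total
```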

An Efficient Caching Algorithm to Minimize Duplicated Disk Blocks in 2-level Disk Cache System (2-레벨 디스크 캐쉬 시스템에서 디스크 블록 중복 저장을 최소화하는 효율적인 캐싱 알고리즘)

  • 류갑상;정수목
    • Journal of the Korea Computer Industry Society / v.5 no.1 / pp.57-64 / 2004
  • The speed gap between processors and disks is a serious problem, so the I/O subsystem limits the performance of a computer system. To bridge this speed gap, caches have been used in computer systems: by using a cache, the access time to disk blocks can be reduced and the performance of the computer system improved. In this paper, we propose an efficient cache management algorithm for computer systems that have both a buffer cache and a disk cache. The proposed algorithm minimizes the blocks duplicated between the buffer cache and the disk cache. We evaluate the proposed algorithm by trace-driven simulation, and the simulation results show that it reduces the mean access time to disk blocks.
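
Minimizing duplicated blocks amounts to keeping the two levels mutually exclusive: a block lives either in the buffer cache or in the disk cache, never in both. A toy sketch under that reading, with LRU assumed at both levels (the paper's actual algorithm may manage the levels differently):

```python
from collections import OrderedDict

class ExclusiveTwoLevelCache:
    """Buffer cache (level 1) and disk cache (level 2) kept mutually
    exclusive, so no capacity is wasted on duplicate copies."""
    def __init__(self, buf_capacity, disk_capacity):
        self.buf = OrderedDict()
        self.disk = OrderedDict()
        self.buf_cap = buf_capacity
        self.disk_cap = disk_capacity

    def access(self, block):
        if block in self.buf:                  # level-1 hit
            self.buf.move_to_end(block)
            return "buffer hit"
        if block in self.disk:                 # level-2 hit: promote
            del self.disk[block]               # remove the only lower copy
            self._put_buf(block)
            return "disk-cache hit"
        self._put_buf(block)                   # miss: load from disk
        return "miss"

    def _put_buf(self, block):
        if len(self.buf) >= self.buf_cap:
            victim, _ = self.buf.popitem(last=False)
            self._put_disk(victim)             # demote instead of discard
        self.buf[block] = True

    def _put_disk(self, block):
        if len(self.disk) >= self.disk_cap:
            self.disk.popitem(last=False)
        self.disk[block] = True
```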

Hybrid Scheme of Data Cache Design for Reducing Energy Consumption in High Performance Embedded Processor (고성능 내장형 프로세서의 에너지 소비 감소를 위한 데이타 캐쉬 통합 설계 방법)

  • Shim, Sung-Hoon;Kim, Cheol-Hong;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of KIISE: Computer Systems and Theory / v.33 no.3 / pp.166-177 / 2006
  • Cache sizes tend to grow in embedded processors as technology scales to smaller transistors and lower supply voltages. However, a larger cache demands more energy, so the ratio of cache energy consumption to total processor energy is growing. Many schemes have been proposed for reducing cache energy consumption, but each previous scheme addresses only one side of the problem: either dynamic cache energy or static cache energy. In this paper, we propose a hybrid scheme that reduces dynamic and static cache energy simultaneously. For this hybrid scheme, we adopt two existing techniques: the drowsy cache technique to reduce static cache energy consumption and the way-prediction technique to reduce dynamic cache energy consumption. Additionally, we propose an early wake-up technique based on the program counter to reduce the penalty caused by the drowsy cache technique. We focus on the level 1 data cache. The hybrid scheme reduces static and dynamic cache energy consumption simultaneously, and our early wake-up scheme reduces the extra program execution cycles caused by applying the hybrid scheme.
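
A rough per-access energy model makes the two halves of the hybrid concrete: way prediction cuts dynamic energy by probing one way first, and drowsy mode cuts static energy at the cost of a wake-up penalty, which the PC-based early wake-up is meant to hide. All constants below are hypothetical placeholders, not figures from the paper.

```python
# Illustrative relative energy constants (hypothetical values).
E_WAY_PROBE   = 1.0    # dynamic energy to probe one cache way
E_WAKE        = 0.3    # energy to wake a drowsy line
P_LEAK_AWAKE  = 1.0    # static leakage of an awake line, per cycle
P_LEAK_DROWSY = 0.1    # static leakage of a drowsy line, per cycle

def access_energy(assoc, prediction_correct, line_was_drowsy):
    """Dynamic energy of one L1 data-cache access: a correct way
    prediction probes a single way; a misprediction must re-probe
    all ways.  Waking a drowsy line adds a fixed cost (which early
    wake-up hides in time, not in energy)."""
    probes = 1 if prediction_correct else assoc
    wake = E_WAKE if line_was_drowsy else 0.0
    return probes * E_WAY_PROBE + wake

def leakage_energy(lines_awake, lines_drowsy, cycles):
    """Static energy over a window: drowsy lines leak far less."""
    return cycles * (lines_awake * P_LEAK_AWAKE
                     + lines_drowsy * P_LEAK_DROWSY)
```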

A User-Level File System for Streaming Media Caching (스트리밍 미디어 캐슁을 위한 사용자 수준 화일 시스템)

  • Oh, Jae-Hak;Cha, Ho-Jeong
    • Journal of KIISE: Computer Systems and Theory / v.29 no.8 / pp.472-483 / 2002
  • This paper presents the design and implementation of a cache file system, umcFS, specifically designed for efficient caching and transmission of streaming media. The proposed file system is based on the concept of a file disk and is implemented at the application level on top of a general-purpose file system. The file disk favors the continuity of cached media and provides an efficient I/O mechanism for the cache server. umcFS statically allocates control blocks as well as media cache blocks; these blocks are referenced through a single-level indirect management structure. Because the file system is designed at the application level, it is easy to develop and port to other systems. Measurements of the implemented system show that umcFS performs about 13% better than the native file system when randomly accessing 1024 KB cache blocks.
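
The file-disk idea can be pictured as one large preallocated file carved into fixed-size cache blocks, with a single-level indirect index from media objects to block numbers. A sketch under assumed sizes: the 1 MB block size echoes the 1024 KB blocks of the evaluation, and everything else is invented for illustration.

```python
BLOCK_SIZE = 1024 * 1024        # 1 MB cache blocks (assumed)
NUM_BLOCKS = 256                # file-disk capacity in blocks (assumed)

class FileDisk:
    """A 'file disk': one large statically allocated file on the native
    file system, carved into fixed-size cache blocks so cached media
    stays contiguous and each block costs one seek + one read."""
    def __init__(self, path):
        self.f = open(path, "w+b")
        self.f.truncate(BLOCK_SIZE * NUM_BLOCKS)   # static allocation
        self.free = list(range(NUM_BLOCKS))
        # single-level indirect index: object id -> list of block numbers
        self.index = {}

    def write_object(self, obj_id, data):
        """Cache a media object across as many blocks as it needs."""
        blocks = []
        for off in range(0, len(data), BLOCK_SIZE):
            blk = self.free.pop()
            self.f.seek(blk * BLOCK_SIZE)
            self.f.write(data[off:off + BLOCK_SIZE])
            blocks.append(blk)
        self.index[obj_id] = blocks

    def read_block(self, obj_id, i):
        """Read the i-th cache block of an object via the index."""
        blk = self.index[obj_id][i]
        self.f.seek(blk * BLOCK_SIZE)
        return self.f.read(BLOCK_SIZE)
```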

Radiation-Induced Soft Error Detection Method for High Speed SRAM Instruction Cache (고속 정적 RAM 명령어 캐시를 위한 방사선 소프트오류 검출 기법)

  • Kwon, Soon-Gyu;Choi, Hyun-Suk;Park, Jong-Kang;Kim, Jong-Tae
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.6B / pp.948-953 / 2010
  • In this paper, we propose a multi-bit soft error detection method that can be used in the instruction cache of a superscalar CPU architecture. The proposed method targets the high-speed static RAM used for the instruction cache. Using 1D parity and interleaving, it has less memory overhead and detects more multi-bit errors than comparable methods. It only detects the occurrence of soft errors in the static RAM; error correction is treated like a cache miss: when soft errors are detected by the 1D parity, the instruction cache simply fetches the affected words from lower-level memory again. The method can detect multi-bit errors within at most a 4×4 window.
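
A bit-level sketch of 1D parity with 4-way interleaving: adjacent bits are assigned to different parity groups, so a burst of neighboring upsets flips at most one bit per group and remains detectable. This simplification works on a flat bit vector; the paper's interleaving is over the physical 4×4 SRAM layout, which this only approximates.

```python
INTERLEAVE = 4   # physically adjacent bits belong to 4 different groups

def parity_bits(word_bits):
    """1D parity with 4-way interleaving: bit i contributes to parity
    group (i % 4), so a burst of adjacent upset bits lands in distinct
    groups and each group sees at most one flipped bit."""
    groups = [0] * INTERLEAVE
    for i, b in enumerate(word_bits):
        groups[i % INTERLEAVE] ^= b
    return groups

def check(word_bits, stored_parity):
    """Detection only: on mismatch the line is treated as a cache miss
    and re-fetched from lower-level memory (no in-place correction)."""
    return parity_bits(word_bits) == stored_parity
```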

Analysis and Advice on Cache Algorithms of SSD FTL (SSD FTL 캐시 알고리즘 분석 및 제언)

  • Lee, Hyung Bong;Chung, Tae Yun
    • KIPS Transactions on Computer and Communication Systems / v.12 no.1 / pp.1-8 / 2023
  • It is impossible to overwrite an already allocated page in an SSD, so whenever a write operation occurs, the page must be replaced with a clean page. To resolve this problem, SSDs have an internal flash translation layer, called the FTL, that maps the logical pages managed by the operating system's file system to the currently allocated physical pages. SSD pages discarded due to write operations must be recycled through initialization, but since the number of initializations is limited, the FTL provides a caching function to reduce the number of writes in addition to its core page-mapping function. In this study, we focus on FTL cache methodologies that reduce the number of page writes, analyze the related algorithms, and propose a write-only cache strategy. In experiments with a simulator, the write-only cache showed an improvement of up to 29%.
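
The write-only cache strategy can be sketched as a cache that admits pages only on writes, so every later write hit absorbs a rewrite that would otherwise consume a flash page. Reads are served from the cache if the page happens to be there but never populate it. The LRU eviction below is an assumption for illustration, not necessarily the paper's choice.

```python
from collections import OrderedDict

class WriteOnlyFTLCache:
    """Write-only FTL cache sketch: only written pages are admitted,
    reads never allocate, and evictions flush dirty pages to flash."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()      # logical page number -> data
        self.flash_writes = 0           # pages actually written to flash

    def read(self, lpn):
        if lpn in self.pages:           # serve the freshest copy
            return self.pages[lpn]
        return None                     # read from flash; do not cache

    def write(self, lpn, data):
        if lpn in self.pages:           # write hit: absorb the rewrite
            self.pages.move_to_end(lpn)
            self.pages[lpn] = data
            return
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)
            self.flash_writes += 1      # flush the evicted dirty page
        self.pages[lpn] = data
```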

An Energy-Delay Efficient System with Adaptive Victim Caches (선택적 희생 캐쉬를 이용한 저전력 고성능 시스템 설계 방안)

  • Kim, Cheol Hong;Shim, Sunghoon;Jhon, Chu Shik;Jhang, Seong Tae
    • Journal of KIISE: Computer Systems and Theory / v.32 no.11_12 / pp.663-674 / 2005
  • We propose a system aimed at achieving high energy-delay efficiency by using adaptive victim caches. In particular, we investigate methods to improve the hit rate in the first level of the memory hierarchy, which reduces the number of accesses to more power-consuming memory structures such as the L2 cache. A victim cache is a memory element for reducing conflict misses in a direct-mapped L1 cache. We present two techniques to fill the victim cache with the blocks that have a higher probability of being re-requested by the processor. A hit-based victim cache is filled with blocks that were referenced frequently by the processor. A replacement-based victim cache is filled with blocks that were evicted from sets where block replacements happened frequently. According to our simulations, the replacement-based victim cache scheme outperforms the conventional victim cache scheme by about 2% on average and reduces power consumption by up to 8%.
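
Of the two fill policies, the replacement-based one is sketched below: a per-set replacement counter identifies conflict-heavy L1 sets, and only blocks evicted from those sets are admitted into the victim cache. The threshold value and the eviction order are assumptions for illustration.

```python
from collections import OrderedDict, defaultdict

class ReplacementBasedVictimCache:
    """Victim cache that admits only blocks evicted from sets where
    replacements happen frequently, so one-off evictions from quiet
    sets do not pollute its small capacity."""
    def __init__(self, capacity, hot_threshold=4):
        self.entries = OrderedDict()           # victim block addresses
        self.capacity = capacity
        self.repl_count = defaultdict(int)     # per-set replacement count
        self.hot = hot_threshold               # assumed threshold

    def on_l1_eviction(self, block, set_idx):
        self.repl_count[set_idx] += 1
        if self.repl_count[set_idx] < self.hot:
            return                             # quiet set: do not admit
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict oldest victim
        self.entries[block] = True

    def lookup(self, block):
        if block in self.entries:
            del self.entries[block]            # swap back into L1
            return True
        return False
```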

A new warp scheduling technique for improving the performance of GPUs by utilizing MSHR information (GPU 성능 향상을 위한 MSHR 정보 기반 워프 스케줄링 기법)

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.3 / pp.72-83 / 2017
  • GPUs can provide high throughput and hide latency by executing many warps in parallel. The MSHRs (Miss Status Holding Registers) of the L1 data cache track cache miss requests until the required data is serviced from lower-level memory. In recent GPUs, excessive demand for cache resources causes underutilization of GPU resources due to cache resource reservation failures. In this paper, we propose a new warp scheduling technique to reduce stall cycles under MSHR resource shortage. The cache miss rate of each warp is predicted based on the observation that each warp shows a similar cache miss rate over a long period. Warps showing low miss rates, or computation-intensive warps, are given high priority to issue when the MSHR is full. Our proposal improves GPU performance by utilizing cache resources more efficiently, based on cache miss rate prediction and monitoring of the MSHR entries. According to our experimental results, reservation fail cycles are reduced by 25.7% and IPC is increased by 6.2% with the proposed scheduling technique compared to a loose round-robin scheduler.
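
The scheduling decision condenses to one choice: behave like loose round-robin while MSHR entries are plentiful, and prefer warps with low predicted miss rates when entries run short. The watermark below is a hypothetical threshold; per the abstract, the predictor simply reuses each warp's recently observed miss rate.

```python
def pick_warp(ready_warps, mshr_free, miss_rate):
    """Choose the next warp to issue (illustrative sketch).

    ready_warps: warp ids in round-robin order.
    mshr_free:   number of free MSHR entries in the L1 data cache.
    miss_rate:   per-warp miss rate observed over a recent window,
                 used as the prediction for the near future."""
    MSHR_LOW_WATERMARK = 2          # hypothetical scarcity threshold
    if mshr_free > MSHR_LOW_WATERMARK:
        return ready_warps[0]       # loose round-robin baseline
    # MSHRs are scarce: prioritize warps unlikely to miss, so they
    # neither stall on a reservation failure nor consume the last
    # MSHR entries needed by in-flight misses.
    return min(ready_warps, key=lambda w: miss_rate[w])
```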

Performance Analysis of Caching Instructions on SVLIW Processor and VLIW Processor (SVLIW 프로세서와 VLIW 프로세서의 명령어 캐싱에 따른 성능 분석)

  • Ji, Sung-Hyun;Park, No-Kwang;Kim, Suk-Il
    • Journal of IKEEE / v.1 no.1 s.1 / pp.101-110 / 1997
  • An SVLIW processor architecture can resolve resource collisions and data dependencies between instructions while scheduling VLIW instructions at run time. As a result, long NOP word instructions can be removed from the object code produced for the processor. Thus, cache misses occur less often on an SVLIW processor than on a VLIW processor with the same cache size. Less frequent cache misses incur less frequent memory accesses, so the total execution cycles needed to complete an application are shortened compared with the VLIW processor. This feature eventually compensates for the effect of the SVLIW processor's longer instruction pipeline stages. In this paper, we formulate and compare execution cycle models of the two architectures. Simulation results show that the longer the memory access takes when a cache miss occurs, the shorter the total execution cycles of the SVLIW processor become relative to those of the VLIW processor.
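
One way to read the comparison is as a first-order cycle model: removing NOP words shrinks the instruction stream and its cache footprint, while the SVLIW pipeline adds depth. The model and every number below are an illustrative reading under stated assumptions, not the paper's formulation.

```python
def total_cycles(instr_words, cpi, misses, miss_penalty, extra_stages=0):
    """First-order model: issue cycles plus cache-miss stall cycles
    plus a one-time cost for extra pipeline stages (assumed form)."""
    return instr_words * cpi + misses * miss_penalty + extra_stages

# Hypothetical workload: the SVLIW code has no NOP words, so it has
# fewer instruction words and fewer instruction-cache misses, at the
# price of a slightly deeper pipeline.
for penalty in (10, 20, 40):
    vliw  = total_cycles(instr_words=1000, cpi=1,
                         misses=80, miss_penalty=penalty)
    svliw = total_cycles(instr_words=700, cpi=1,
                         misses=50, miss_penalty=penalty, extra_stages=2)
    # The longer the miss penalty, the larger SVLIW's advantage grows.
    print(penalty, vliw, svliw)
```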
