• Title/Summary/Keyword: Cache-miss

Radiation-Induced Soft Error Detection Method for High Speed SRAM Instruction Cache (고속 정적 RAM 명령어 캐시를 위한 방사선 소프트오류 검출 기법)

  • Kwon, Soon-Gyu;Choi, Hyun-Suk;Park, Jong-Kang;Kim, Jong-Tae
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.6B / pp.948-953 / 2010
  • In this paper, we propose a multi-bit soft error detection method for the instruction cache of a superscalar CPU architecture. The proposed method is applied to the high-speed static RAM used as the instruction cache. Using 1D parity and interleaving, it incurs less memory overhead and detects more multi-bit errors than other methods. It only detects the occurrence of soft errors in the static RAM; correction is handled like a cache miss. When soft errors occur, they are detected by the 1D parity, and the instruction cache simply fetches the affected words from lower-level memory to correct them. This method can detect multi-bit errors within a maximum 4×4 window.
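
A minimal sketch of the detection path described in the abstract, assuming one even-parity bit per 32-bit word and a four-word line; the bit interleaving that spreads a clustered upset across parity groups is a physical-layout property and is only noted in a comment. All names and sizes are illustrative:

```c
#include <stdint.h>
#include <stdio.h>

#define WORDS_PER_LINE 4

/* One even-parity bit per 32-bit word (1D parity).  The physical bit
 * interleaving assumed by the paper places adjacent SRAM cells in
 * different parity groups, which is why a clustered multi-bit upset up
 * to a 4x4 window is still caught; only the logical check is modeled. */
typedef struct {
    uint32_t word[WORDS_PER_LINE];
    uint8_t  parity[WORDS_PER_LINE];
} cache_line_t;

static uint8_t even_parity32(uint32_t v)
{
    v ^= v >> 16; v ^= v >> 8; v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return (uint8_t)(v & 1u);
}

/* Normal line fill: record the parity of every word. */
static void line_fill(cache_line_t *ln, const uint32_t *mem)
{
    for (int i = 0; i < WORDS_PER_LINE; i++) {
        ln->word[i]   = mem[i];
        ln->parity[i] = even_parity32(mem[i]);
    }
}

/* Read path: a parity mismatch is reported as a miss rather than
 * corrected in place, which makes the cache refetch the line. */
static int line_read(const cache_line_t *ln, int i, uint32_t *out)
{
    if (even_parity32(ln->word[i]) != ln->parity[i])
        return 0;                     /* soft error detected -> miss */
    *out = ln->word[i];
    return 1;                         /* hit */
}

int main(void)
{
    uint32_t mem[WORDS_PER_LINE] = { 0xDEADBEEF, 1, 2, 3 };
    cache_line_t ln;
    uint32_t v = 0;

    line_fill(&ln, mem);
    ln.word[0] ^= 0x10;               /* inject a bit flip */
    if (!line_read(&ln, 0, &v)) {     /* detected: handle as a cache miss */
        line_fill(&ln, mem);          /* refetch from lower-level memory */
        line_read(&ln, 0, &v);
    }
    printf("word0 = 0x%08X\n", v);
    return 0;
}
```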

Improving Hit Ratio and Hybrid Branch Prediction Performance with Victim BTB (Victim BTB를 활용한 히트율 개선과 효율적인 통합 분기 예측)

  • Joo, Young-Sang;Cho, Kyung-San
    • The Transactions of the Korea Information Processing Society / v.5 no.10 / pp.2676-2685 / 1998
  • In order to improve branch prediction accuracy and to reduce the BTB miss rate, this paper proposes a two-level BTB structure that adds a small victim BTB to the conventional BTB. At small cost, the two-level BTB can reduce the BTB miss rate as well as improve the prediction accuracy of a hybrid branch prediction strategy that combines dynamic and static prediction. Through trace-driven simulation of four benchmark programs, the performance improvement of the proposed two-level BTB structure is analyzed and validated. The proposed BTB structure improves the BTB miss rate by 26.5% and the misprediction rate by 26.75%.
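
A hedged sketch of the two-level lookup, assuming a direct-mapped main BTB, a small FIFO-replaced victim BTB, and a swap-on-victim-hit policy; none of these parameters are taken from the paper:

```c
#include <stdint.h>
#include <stdio.h>

#define MAIN_SETS   256   /* direct-mapped main BTB (assumed size)  */
#define VICTIM_WAYS 8     /* small fully associative victim buffer  */

typedef struct { uint32_t pc, target; int valid; } btb_entry_t;

static btb_entry_t main_btb[MAIN_SETS];
static btb_entry_t victim_btb[VICTIM_WAYS];
static int victim_next;   /* FIFO replacement inside the victim BTB */

/* Look up a branch PC in the main BTB first, then in the victim BTB;
 * a victim hit swaps the entry back so the hot branch is promoted. */
static btb_entry_t *btb_lookup(uint32_t pc)
{
    btb_entry_t *m = &main_btb[pc % MAIN_SETS];
    if (m->valid && m->pc == pc)
        return m;                                  /* main-BTB hit  */
    for (int i = 0; i < VICTIM_WAYS; i++) {
        if (victim_btb[i].valid && victim_btb[i].pc == pc) {
            btb_entry_t tmp = *m;                  /* victim hit    */
            *m = victim_btb[i];
            victim_btb[i] = tmp;
            return m;
        }
    }
    return NULL;                                   /* overall miss  */
}

/* Insert a resolved branch; the displaced entry is kept in the
 * victim BTB instead of being dropped. */
static void btb_insert(uint32_t pc, uint32_t target)
{
    btb_entry_t *m = &main_btb[pc % MAIN_SETS];
    if (m->valid && m->pc != pc) {
        victim_btb[victim_next] = *m;
        victim_next = (victim_next + 1) % VICTIM_WAYS;
    }
    m->pc = pc; m->target = target; m->valid = 1;
}

int main(void)
{
    btb_insert(0x1000, 0xA000);
    btb_insert(0x1100, 0xB000);          /* same set: 0x1000 moves to the victim */
    btb_entry_t *e = btb_lookup(0x1000); /* victim hit, not a BTB miss */
    printf("target = 0x%X\n", e ? e->target : 0u);
    return 0;
}
```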

Increasing a Mobile Client's Cache Reusability in Wireless Client - Server Environments (무선 클라이언트-서버 환경에서 이동 클라이언트의 캐시 데이타 재사용율 향상기법)

  • Yi Song-Yi
    • Journal of KIISE:Computer Systems and Theory / v.33 no.5 / pp.282-296 / 2006
  • In a wireless client-server environment, data broadcasting is an efficient data dissemination method: a server broadcasts data, and some of the broadcast data are cached in a mobile client's cache to save the narrow communication bandwidth, limited resources, and data access time. A server also broadcasts invalidation reports (IRs) to maintain consistency between server data and a client's cached data. Most existing work on the cache consistency problem simply purges the entire cache when the disconnection is long enough to miss a certain number (the window size) of IRs. This paper presents a cache invalidation method that increases a mobile client's cache reusability after a long disconnection. Instead of simply dropping the entire cache regardless of its consistency, the client compares the cost of purging all the data with the cost of a selective purge. If dropping the entire cache is more expensive, the client keeps the cache and selectively purges inconsistent data, using uplink bandwidth for validation requests. Simulation results show that this scheme increases cache reusability because it effectively considers the update rates and broadcast frequencies of cached data when estimating the cost of cache maintenance.
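
The paper's exact cost model is not reproduced in the abstract, so the sketch below only illustrates the decision it describes: weigh the cost of re-fetching every cached item after a full purge against an uplink validation request plus re-fetching only the items likely to be stale. All parameter names and values are assumptions:

```c
#include <stdio.h>

#define N_ITEMS 4

typedef struct {
    double update_rate;     /* estimated probability the server updated it */
    double broadcast_freq;  /* how often the item appears on the broadcast */
} cached_item_t;

/* Expected wait to recover one item from the broadcast channel:
 * rarely broadcast items are expensive to re-fetch. */
static double refetch_cost(const cached_item_t *it)
{
    return 1.0 / it->broadcast_freq;
}

/* Full purge costs a re-fetch of everything; selective purge costs an
 * uplink validation request plus re-fetching only the likely-stale
 * items, weighted by their update rates. */
static int should_purge_all(const cached_item_t *c, int n, double uplink_cost)
{
    double full = 0.0, selective = uplink_cost;
    for (int i = 0; i < n; i++) {
        full      += refetch_cost(&c[i]);
        selective += c[i].update_rate * refetch_cost(&c[i]);
    }
    return full <= selective;
}

int main(void)
{
    cached_item_t cache[N_ITEMS] = {
        { 0.05, 0.5 }, { 0.10, 0.25 }, { 0.02, 1.0 }, { 0.20, 0.125 }
    };
    printf(should_purge_all(cache, N_ITEMS, 3.0)
           ? "drop the entire cache\n" : "keep it, validate selectively\n");
    return 0;
}
```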

Analysis of Barrier Waiting Times in Data Parallel Programs (데이터 병렬 프로그램에서 배리어 대기시간의 분석)

  • Jung, In-Bum
    • Journal of Industrial Technology / v.21 no.A / pp.73-80 / 2001
  • Barriers are widely used for synchronization in parallel programs. Since processes that arrive earlier than others must wait at the barrier, total processor utilization decreases. In this paper, to find the sources of barrier waiting time, parallel programs are executed at various grain sizes through execution-driven simulations. In these simulation studies, we found that even if approximately equal amounts of work are distributed to each processor, all processes may not arrive at a barrier at the same time: differing numbers of cache misses and instructions within the partitioned grains cause the processors to arrive at the barrier at different times.
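
A small measurement harness (compile with -pthread) illustrating the effect the paper analyzes: threads given unequal grains arrive at a barrier at different times, and early arrivals accumulate waiting time. The sleep here merely stands in for the cache-miss and instruction-count imbalance:

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define NTHREADS 4

static pthread_barrier_t barrier;

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Each thread does an unequal amount of "work", timestamps its
 * arrival, and reports how long it stalled before the barrier opened
 * (i.e., until the last thread arrived). */
static void *worker(void *arg)
{
    long id = (long)arg;
    usleep((useconds_t)(10000 * (id + 1)));   /* 10..40 ms of work */
    double arrival = now_ms();
    pthread_barrier_wait(&barrier);
    printf("thread %ld waited %.1f ms\n", id, now_ms() - arrival);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```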

Efficient Algorithm for Query Processing of Aggregate functions in ROLAP Environment (ROLAP 환경에서 집단함수 질의처리를 위한 효율적인 알고리즘)

  • 김인식;김종겸;정순기
    • Journal of the Korea Society of Computer and Information / v.8 no.3 / pp.40-46 / 2003
  • High-performance processors have recently employed sophisticated techniques to overlap and simultaneously execute multiple computation and memory operations. For query processing in database management systems, these hardware characteristics are an important research issue. Recent work shows that the cache miss penalty between main memory and the CPU has become a new bottleneck, and that branch misprediction causes serious waste of resources. This paper proposes an efficient algorithm for query processing of aggregate functions that takes these hardware characteristics into account.
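
The abstract does not detail the algorithm itself, so the following only illustrates the hardware-conscious style it refers to: a selective aggregate rewritten without a data-dependent branch, so unpredictable selection conditions cannot cause branch mispredictions:

```c
#include <stdint.h>
#include <stdio.h>

#define N 8

/* Branchy form: the `if` on unpredictable data mispredicts often. */
static int64_t sum_if(const int32_t *val, const int32_t *key, int n, int32_t k)
{
    int64_t s = 0;
    for (int i = 0; i < n; i++)
        if (key[i] == k)
            s += val[i];
    return s;
}

/* Predicated form: the comparison becomes a 0/1 factor, compiling to
 * straight-line code with no data-dependent branch to mispredict. */
static int64_t sum_pred(const int32_t *val, const int32_t *key, int n, int32_t k)
{
    int64_t s = 0;
    for (int i = 0; i < n; i++)
        s += (int64_t)val[i] * (key[i] == k);
    return s;
}

int main(void)
{
    int32_t val[N] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int32_t key[N] = { 0, 1, 0, 1, 1, 0, 1, 0 };
    printf("%lld %lld\n", (long long)sum_if(val, key, N, 1),
                          (long long)sum_pred(val, key, N, 1));
    return 0;
}
```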

David II: A new architecture for parallel rendering processors with effective memory system (David II: 효과적인 메모리 시스템을 가지는 병렬 렌더링 프로세서)

  • Lee, Kil-Whan;Park, Woo-Chan;Kim, Il-San;Han, Tack-Don
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.1655-1658 / 2004
  • Current rendering processors are organized mainly to process a triangle as fast as possible, and parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high performance in processing triangles, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture, called DAVID II, that resolves this consistency problem effectively. Moreover, the proposed architecture significantly reduces the latency due to pixel cache misses. The experimental results show that DAVID II achieves almost linear speedup in the best case, even with sixteen rasterizers.
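
The abstract does not describe DAVID II's actual consistency mechanism; purely as a generic illustration of the problem, the sketch below statically interleaves pixel ownership across rasterizers so that no two local pixel caches can ever hold the same address:

```c
#include <stdio.h>

#define NRAST 4   /* number of rasterizers (illustrative)     */
#define TILE  2   /* pixels per ownership tile edge (assumed) */

/* Map a pixel to the single rasterizer that owns it; because the
 * mapping is a pure function of the address, two local pixel caches
 * can never hold the same pixel, so the shared-address consistency
 * conflict cannot arise by construction. */
static int owner(int x, int y)
{
    return ((x / TILE) + (y / TILE) * 2) % NRAST;
}

int main(void)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 8; x++)
            printf("%d ", owner(x, y));   /* ownership pattern */
        printf("\n");
    }
    return 0;
}
```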

Design of a Load/store Unit for ARM-SMI Microprocessors (ARM-SMI용 Load/store Unit(LSU) 설계)

  • 김재억;이용석
    • Proceedings of the IEEK Conference / 2003.07d / pp.1387-1390 / 2003
  • Superscalar architectures have recently shown limits in performance improvement, while the SMT (Simultaneous Multi-Threading) architecture is attracting attention. The purpose of the SMT architecture is to improve the performance of superscalar microprocessors by executing multiple threads at the same time. In this paper, a load/store unit (LSU) suitable for ARM-compatible SMT microprocessors is presented. The LSU supports the load and store instructions of the ARM ISA, and it prevents cache misses from degrading SMT performance.
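
A schematic model (not the paper's design) of the property claimed for the LSU: a cache miss blocks only the issuing hardware thread, while the other threads keep issuing memory operations:

```c
#include <stdio.h>

#define NTHREADS 4

static int blocked[NTHREADS];   /* thread has an outstanding miss */

/* Pretend data cache: even addresses hit, odd addresses miss. */
static int cache_hit(unsigned addr) { return (addr & 1u) == 0; }

/* Issue a load for one hardware thread.  A miss parks only that
 * thread (think: an MSHR entry plus a per-thread block flag); the
 * other threads' loads and stores keep flowing through the LSU. */
static int issue_load(int tid, unsigned addr)
{
    if (blocked[tid])
        return 0;               /* still waiting for its fill        */
    if (!cache_hit(addr)) {
        blocked[tid] = 1;       /* record the miss, stall one thread */
        return 0;
    }
    return 1;                   /* hit: data forwarded immediately   */
}

int main(void)
{
    unsigned addr[NTHREADS] = { 0x10, 0x21, 0x30, 0x43 };
    for (int t = 0; t < NTHREADS; t++)
        printf("thread %d: %s\n", t,
               issue_load(t, addr[t]) ? "load done" : "miss pending");
    return 0;
}
```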

Cache Replacement Policy Based on Dynamic Counter for High Performance Processor (고성능 프로세서를 위한 카운터 기반의 캐시 교체 알고리즘)

  • Jung, Do Young;Lee, Yong Surk
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.4 / pp.52-58 / 2013
  • The replacement policy is one of the key factors determining the effectiveness of a cache. The LRU replacement policy has remained the standard for caches for many years. However, traditional LRU performs poorly on workloads dominated by zero-reuse lines, although it performs well on workloads with high temporal locality. To address this problem, we propose a new replacement policy: the DCR (Dynamic Counter based Replacement) policy. The temporal locality of a workload changes dynamically over time, and the DCR policy is based on detecting these changes. The DCR policy improves the cache miss rate over traditional LRU by as much as 2.7% at maximum and by 0.47% on average.
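
The abstract gives no implementation details for DCR, so this sketch only shows one plausible counter-based mechanism in the same spirit: a saturating per-set counter tracks how often evicted lines were never reused and switches between MRU and LRU insertion accordingly. All thresholds and sizes are assumptions:

```c
#include <stdio.h>

#define WAYS 4

typedef struct { unsigned tag; int reused, valid; } line_t;

static line_t set[WAYS];    /* index 0 = MRU ... WAYS-1 = LRU   */
static int zero_reuse_ctr;  /* dynamic counter, saturating at 8 */

/* Promote a hit line to MRU and mark it as reused. */
static void touch(int way)
{
    line_t t = set[way];
    t.reused = 1;
    for (int i = way; i > 0; i--) set[i] = set[i - 1];
    set[0] = t;
}

/* On a fill, learn from the victim: evicting never-reused lines bumps
 * the counter; once it crosses the (assumed) threshold, new lines are
 * inserted at LRU so streams cannot displace reusable data. */
static void fill(unsigned tag)
{
    line_t *victim = &set[WAYS - 1];
    if (victim->valid && !victim->reused) {
        if (zero_reuse_ctr < 8) zero_reuse_ctr++;
    } else if (zero_reuse_ctr > 0) {
        zero_reuse_ctr--;
    }
    line_t nl = { tag, 0, 1 };
    if (zero_reuse_ctr >= 4) {
        *victim = nl;                              /* LRU insertion */
    } else {
        for (int i = WAYS - 1; i > 0; i--) set[i] = set[i - 1];
        set[0] = nl;                               /* MRU insertion */
    }
}

static int cache_access(unsigned tag)
{
    for (int i = 0; i < WAYS; i++)
        if (set[i].valid && set[i].tag == tag) { touch(i); return 1; }
    fill(tag);
    return 0;
}

int main(void)
{
    int hits = 0;
    /* one hot line (tag 1) interleaved with a zero-reuse stream */
    for (unsigned i = 0; i < 64; i++) {
        hits += cache_access(1);
        cache_access(100 + i);
    }
    printf("hot-line hits: %d/64, counter = %d\n", hits, zero_reuse_ctr);
    return 0;
}
```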

Performance Analysis of Caching Instructions on SVLIW Processor and VLIW Processor (SVLIW 프로세서와 VLIW 프로세서의 명령어 캐싱에 따른 성능 분석)

  • Ji, Sung-Hyun;Park, No-Kwang;Kim, Suk-Il
    • Journal of IKEEE / v.1 no.1 s.1 / pp.101-110 / 1997
  • SVLIW processor architectures can resolve resource collisions and data dependencies between instructions while scheduling VLIW instructions at run time. As a result, long NOP word instructions can be removed from the object code produced for the processor. Thus, cache misses occur less often on the SVLIW processor than on a VLIW processor with the same cache size. Less frequent cache misses on the SVLIW processor mean less frequent memory accesses, so the total execution cycles to complete an application are shortened compared with the VLIW processor. This feature eventually compensates for the SVLIW processor's longer instruction pipeline. In this paper, we formulate and compare execution cycle models of the two architectures. Simulation results show that the longer the memory access takes on a cache miss, the shorter the total execution cycles of the SVLIW processor become relative to those of the VLIW processor.
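
A toy version of the execution cycle comparison the paper formulates: both machines do the same work, but the SVLIW code contains no long-NOP words, hence a smaller instruction footprint and fewer I-cache misses, at the price of a deeper pipeline. Every number below is invented purely to show the trend as the miss penalty grows:

```c
#include <stdio.h>

/* Toy cycle model.  work_cycles is identical for both machines because
 * SVLIW still stalls where the VLIW code held NOP words; only the
 * fetched footprint, miss behavior and pipeline depth differ. */
typedef struct {
    double work_cycles;    /* cycles spent on the operations themselves */
    double words;          /* instruction words fetched from the I-cache */
    double miss_rate;      /* I-cache misses per fetched word */
    double pipe_overhead;  /* extra cycles from the deeper SVLIW pipeline */
} machine_t;

static double total_cycles(machine_t m, double miss_penalty)
{
    return m.work_cycles + m.words * m.miss_rate * miss_penalty
         + m.pipe_overhead;
}

int main(void)
{
    machine_t vliw  = { 1e6, 1.0e6, 0.020, 0.0 };  /* NOP words inflate code */
    machine_t svliw = { 1e6, 0.7e6, 0.012, 2e3 };  /* denser, deeper pipe    */

    for (int p = 10; p <= 80; p += 10)
        printf("penalty=%2d  VLIW=%9.0f  SVLIW=%9.0f\n",
               p, total_cycles(vliw, p), total_cycles(svliw, p));
    return 0;
}
```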

Low-Power Data Cache Architecture and Microarchitecture-level Management Policy for Multimedia Application (멀티미디어 응용을 위한 저전력 데이터 캐쉬 구조 및 마이크로 아키텍쳐 수준 관리기법)

  • Yang Hoon-Mo;Kim Cheong-Gil;Park Gi-Ho;Kim Shin-Dug
    • The KIPS Transactions:PartA / v.13A no.3 s.100 / pp.191-198 / 2006
  • Today's battery-operated portable consumer electronics devices tend to integrate more and more multimedia processing capability. In such devices, multimedia systems-on-chip handle specific algorithms that require intensive processing and consume significant power. As a result, the power efficiency of multimedia processing devices is becoming increasingly important. In this paper, we propose a reconfigurable data cache architecture in which data allocation is constrained with software support, and we evaluate its performance and power efficiency. Compared with conventional cache architectures, power consumption is reduced significantly, while the miss rate of the proposed architecture remains very close to that of conventional caches. The reconfigurable data cache reduces power consumption by 33.2%, 53.3%, and 70.4% compared with direct-mapped, 2-way, and 4-way caches, respectively.
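
The abstract does not give the reconfiguration scheme, so the sketch below only shows the mechanism behind the power saving: when software constrains where a datum may be allocated, a lookup activates only the ways of that partition, so lookup energy scales with the partition size rather than the total associativity. The partitioning itself is an assumption:

```c
#include <stdio.h>

#define WAYS 4

enum region { REGION_SCALAR, REGION_STREAM };

/* Software-chosen partition: ways 0..2 hold scalar data with reuse,
 * way 3 holds streaming multimedia data with little reuse. */
static const int way_lo[] = { 0, 3 };
static const int way_hi[] = { 2, 3 };

static unsigned tags[WAYS];
static long ways_activated;   /* proxy for dynamic lookup energy */

/* Probe only the ways of the region the access is constrained to. */
static int lookup(unsigned tag, enum region r)
{
    for (int w = way_lo[r]; w <= way_hi[r]; w++) {
        ways_activated++;
        if (tags[w] == tag)
            return 1;                 /* hit */
    }
    tags[way_lo[r]] = tag;            /* simplistic fill on miss */
    return 0;
}

int main(void)
{
    for (unsigned i = 0; i < 1000; i++)
        lookup(i + 1, REGION_STREAM); /* the stream touches one way */
    printf("ways activated: %ld (a full %d-way probe would be %ld)\n",
           ways_activated, WAYS, 1000L * WAYS);
    return 0;
}
```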