• Title/Summary/Keyword: cache performance model

Analytical Models and their Performance Analysis of Superscalar Processors (수퍼스칼라 프로세서의 해석적 모델 및 성능 분석)

  • Kim, Hak-Jun;Kim, Seon-Mo;Choe, Sang-Bang
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.7
    • /
    • pp.847-862
    • /
    • 1999
  • This paper proposes a new analytical model that predicts the instruction execution rate of superscalar processors using a finite-buffered, synchronous queueing model. The model takes into account architectural parameters such as instruction-level parallelism, branch frequency, branch prediction accuracy, and cache misses, and it can also analyze the relationship between cache performance and pipeline performance. To validate the model, we ran extensive simulations and compared the results with the model's predictions; in most cases the two agreed within 10% error. The model can identify causes of performance bottlenecks that simulation alone does not reveal, which yields design data for improving performance. It also shows the effect of cache misses on the performance of out-of-order issue superscalar processors, which provides valuable information for designing a balanced system.

Performance Analysis of Parity Cache enabled RAID Level 5 for DDR Memory Storage Device (패리티 캐시를 이용한 DDR 메모리 저장 장치용 RAID 레벨 5의 성능 분석)

  • Gu, Bon-Gen;Kwak, Yun-Sik;Cheong, Seung-Kook;Hwang, Jung-Yeon
    • Journal of Advanced Navigation Technology
    • /
    • v.14 no.6
    • /
    • pp.916-927
    • /
    • 2010
  • In this paper, we analyze, through simulation, the performance of a parity-cache-enabled RAID level 5 system composed of DDR memory-based storage devices. To do this, we develop a simulation model and define the basic performance data to be collected through simulation. We then implement and run a simulator based on this model. The simulation results suggest that a parity-cache-enabled RAID level 5 built from DDR memory-based storage devices can improve storage system performance if the storage access patterns of applications are tuned.

A Cache Hoarding Method Using Collaborative Filtering in Mobile Computing Environments (모바일 컴퓨팅 환경에서 협업추천 모형을 이용한 캐시 적재 기법)

  • Jun, Sung-Hae;Jung, Sung-Won;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.6
    • /
    • pp.687-692
    • /
    • 2004
  • In this paper, we propose an efficient cache hoarding method for mobile computing environments using collaborative filtering. The method addresses a difficult problem in mobile computing: gaps in information service caused by low bandwidth, long delay, and frequent network disconnection. Many previous studies have applied cache hoarding to these problems of mobile clients, but approaches based on the client's history information did not satisfy all of the client's information requests. We propose a collaborative filtering model that uses both the history information and the location data of the mobile client. The proposed model efficiently supplies the items the client needs. To evaluate the model, we ran experiments on simulated data using SAS Enterprise Miner. An objective evaluation based on cache hit ratio shows that the model performs well.

Cache Algorithm in Reverse Connection Setup Protocol(CRCP) for effective Location Management in PCS Network (PCS 네트워크 상에서 효율적인 위치관리를 위한 역방향 호설정 캐쉬 알고리즘(CRCP)에 관한 연구)

  • Ahn, Yun-Shok;An, Seok;Bae, Yun-Jeong;Jo, Jea-Jun;Kim, Jae-Ha;Kim, Byung-Gi
    • Proceedings of the KIEE Conference
    • /
    • 1998.11b
    • /
    • pp.630-632
    • /
    • 1998
  • The basic user location strategies proposed for current PCS (Personal Communication Services) networks are two-level database strategies. These databases, which reside in the signalling network, always maintain each user's current location information, which is used in the call setup process to a mobile user. As the number of PCS users increases, these strategies cause problems such as signalling traffic concentrating on the databases and increased call setup delay. In this paper, we propose the RCP (Reverse Connection setup Protocol) model, which applies the RVC (Reverse Virtual Call setup) algorithm to the PCS reference model, and the CRCP (Cache algorithm in RCP) model, which adopts caching strategies in the RCP model. We found that when a cache miss occurs, the CRCP model incurs a smaller miss penalty than the PCS model. We also show that the proposed models are likely to yield better performance in terms of reduced location tracking delay.

Low-Power Cache Design by using Locality Buffer and Address Compression (지역 버퍼와 주소 압축을 통한 저전력 캐시 설계)

  • Kwak, Jong Wook
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.9
    • /
    • pp.11-19
    • /
    • 2013
  • Most modern computer systems employ caches to narrow the access time gap between the processor and the memory system. The power dissipated by the cache system has become a significant part of the total power dissipated by the whole microprocessor chip, so power reduction in the cache system is an important issue. The partial tag cache is a design aimed at low power consumption. Its power savings come mainly from matching small partial tags instead of full tags. In this paper, we first analyze previous regular partial tag cache systems and then propose a new address matching mechanism that uses a locality buffer and address compression. In simulation, the proposed model shows an 18% power reduction on average compared to a regular cache, while providing the same performance level.
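
As a rough illustration of partial tag matching in general, the sketch below compares only the low-order bits of each stored tag first and performs a full tag comparison only when those bits agree. It is not the mechanism proposed in the paper: the locality buffer and address compression are not modeled, and the function names, the 4-bit partial tag width, and the example tags are all assumptions made for illustration.

```python
def partial_tag(tag, bits=4):
    """Keep only the low-order `bits` bits of a tag (hypothetical width)."""
    return tag & ((1 << bits) - 1)


def lookup(ways, tag, bits=4):
    """Search one cache set for `tag`.

    `ways` is a list of stored full tags (None marks an empty way).
    Returns (hit_way_or_None, number_of_full_tag_comparisons).
    """
    full_compares = 0
    for way, stored in enumerate(ways):
        if stored is None:
            continue
        # Cheap partial comparison filters out most non-matching ways.
        if partial_tag(stored, bits) != partial_tag(tag, bits):
            continue
        # The costlier full comparison runs only after a partial match.
        full_compares += 1
        if stored == tag:
            return way, full_compares
    return None, full_compares


# Assumed example set with four ways, one of them empty.
example_set = [0x1A3, 0x2B3, 0x3C7, None]
hit = lookup(example_set, 0x2B3)
miss = lookup(example_set, 0x3C8)
```

In this example the miss is resolved without any full tag comparison, which is the kind of saving a partial tag scheme targets.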

Performance Evaluation of Deferred Locking for Maintaining Transactional Cache Consistency (트랜잭션 캐쉬 일관성을 유지하기 위한 지연 로킹 기법의 성능 평가)

  • Kwon, Hyeok-Min
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.8
    • /
    • pp.2310-2326
    • /
    • 2000
  • A client-server DBMS based on a data-shipping model can exploit client resources effectively by allowing inter-transaction caching. However, inter-transaction caching raises the need for a transactional cache consistency maintenance (TCCM) protocol, since each client is able to cache a portion of the database dynamically. Deferred locking (DL) is a new detection-based TCCM scheme designed on the basis of a primary copy locking algorithm. In DL, a number of lock requests and a data shipping request are combined into a single message packet to minimize the communication overhead required for consistency checking. Using a simulation model, the performance of the proposed scheme is compared with that of two representative detection-based schemes, adaptive optimistic concurrency control and caching two-phase locking. The performance results indicate that DL improves the overall system throughput with a reasonable transaction abort ratio over the other detection-based schemes.

2Q-CFP: A Client Cache Management Scheme for Broadcast-based Information Systems (2Q-CFP: 방송에 기초한 정보 시스템을 위한 클라이언트 캐쉬 관리 기법)

  • Kwon, Hyeok-Min
    • Journal of KIISE:Databases
    • /
    • v.30 no.6
    • /
    • pp.561-572
    • /
    • 2003
  • Broadcast-based data delivery has attracted a lot of attention as an efficient way of disseminating data to very large client populations. The main motivation for broadcast-based information systems (BBISs) is that the number of clients they serve can grow arbitrarily large without any effect on their performance. The performance of BBISs depends mainly on client caching strategies and on data broadcast scheduling mechanisms. This paper addresses the former issue and proposes a new client cache management scheme, named 2Q-CFP, that is suitable for BBISs. It also evaluates the performance of 2Q-CFP on the basis of a simulation model. The results indicate that 2Q-CFP outperforms GRAY, LRU, and CF in average response time.

Gated Recurrent Unit based Prefetching for Graph Processing (그래프 프로세싱을 위한 GRU 기반 프리페칭)

  • Shivani Jadhav;Farman Ullah;Jeong Eun Nah;Su-Kyung Yoon
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.2
    • /
    • pp.6-10
    • /
    • 2023
  • Data that is likely to be accessed can be predicted and stored in the cache to prevent cache misses, reducing the processor's request and wait times. As a result, the processor can work without stalling, hiding memory latency. A prefetcher exploits the temporal and spatial locality of memory accesses to predict the memory address that will be accessed next. We propose a prefetcher that applies the GRU model, which is well suited to time series data. The currently accessed address is represented in binary and used as training data, and the Gated Recurrent Unit model is trained on the difference (delta) between consecutive memory accesses. Finally, using the GRU model with the learned memory access patterns, the proposed data prefetcher predicts the memory address to be accessed next. Compared with a multi-layer perceptron, our prefetcher showed better results.
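
The delta-based preprocessing mentioned in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the GRU model itself is omitted, and the function names, the 16-bit two's-complement encoding of deltas, and the example trace are hypothetical.

```python
def address_deltas(addresses):
    """Differences between consecutive memory accesses."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


def to_bits(value, width=16):
    """Two's-complement bit list (most significant bit first).

    The 16-bit default width is an assumption made for this sketch.
    """
    masked = value & ((1 << width) - 1)
    return [(masked >> i) & 1 for i in reversed(range(width))]


def predict_next(last_address, predicted_delta):
    """Turn a predicted delta back into a prefetch address."""
    return last_address + predicted_delta


# Assumed example trace of accessed addresses.
trace = [0x1000, 0x1040, 0x1080, 0x1070]
deltas = address_deltas(trace)
features = [to_bits(d) for d in deltas]  # would be fed to a sequence model
```

A trained sequence model would output a delta, which `predict_next` converts into the address to prefetch.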

Cache Sensitive T-tree Main Memory Index for Range Query Search (범위질의 검색을 위한 캐시적응 T-트리 주기억장치 색인구조)

  • Choi, Sang-Jun;Lee, Jong-Hak
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.10
    • /
    • pp.1374-1385
    • /
    • 2009
  • Recently, advances in CPU speed have far outpaced advances in memory speed, and main-memory access is increasingly a performance bottleneck for main-memory database systems. To reduce memory access time, cache memories have been incorporated into the memory subsystem. However, cache memories reduce memory access time only when the requested data is found in the cache. We propose a new cache-sensitive T-tree index structure, called the $CST^*$-tree, for range query search. The $CST^*$-tree reduces the number of cache misses by loading reduced internal nodes that do not contain index entries. It also supports sequential access to index entries for range queries by connecting adjacent terminal nodes and internal index nodes. For performance evaluation, we developed a cost model and used it to compare the $CST^*$-tree with the existing CST-tree, the conventional cache-sensitive T-tree, and the $T^*$-tree, the conventional T-tree for range query search. The results indicate that the $CST^*$-tree incurs 20~30% fewer cache misses than the CST-tree in a single value search and 10~20% fewer than the $T^*$-tree in a range query search.

Multicore Real-Time Scheduling to Reduce Inter-Thread Cache Interferences

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.1
    • /
    • pp.67-80
    • /
    • 2013
  • The worst-case execution time (WCET) of each real-time task in multicore processors with shared caches can be significantly affected by inter-thread cache interferences. The worst-case inter-thread cache interferences are dependent on how tasks are scheduled to run on different cores. Therefore, there is a circular dependence between real-time task scheduling, the worst-case inter-thread cache interferences, and WCET in multicore processors, which is not the case for single-core processors. To address this challenging problem, we present an offline real-time scheduling approach for multicore processors by considering the worst-case inter-thread interferences on shared L2 caches. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and WCET. The experimental results demonstrate that the proposed approach can reduce the utilization of the resulting schedule by about 12% on average compared to the cyclic multicore scheduling approaches in our theoretical model. Our evaluation indicates that the enhanced scheduling approach is more likely to generate feasible and safe schedules with stricter timing constraints in multicore real-time systems.