• 제목/요약/키워드: Cache-miss

Search Result 99, Processing Time 0.041 seconds

An Analysis of Multi-processor System Performance Depending on the Input/Output Types (입출력 형태에 따른 다중처리기 시스템의 성능 분석)

  • Moon, Wonsik
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.12 no.4
    • /
    • pp.71-79
    • /
    • 2016
  • This study proposes a performance model of a shared bus multi-processor system and analyzes the effect of input/output types on system performance and overload of shared resources. This system performance model reflects the memory reference time in relation to the effect of input/output types on shared resources and the input/output processing time in relation to the input/output processor, disk buffer, and device standby places. In addition, it demonstrates the contribution of input/output types to system performance for comprehensive analysis of system performance. As the concept of workload in the probability theory and the presented model are utilized, the result of operating and analyzing the model in various conditions of processor capability, cache miss ratio, page fault ratio, disk buffer hit ratio (input/output processor and controller), memory access time, and input/output block size. A simulation is conducted to verify the analysis result.

A new direct-mapped cache with fully associative buffer for low power consumption by using bank-selection mechanism (저 전력을 위한 뱅크 선택 메커니즘과 새로운 동작 메커니즘을 이용한 직접사상 캐쉬 및 버퍼 시스템)

  • 이종성;이정훈;김신덕
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.223-225
    • /
    • 2003
  • 본 논문은 서로 다른 두 구조의 캐쉬와 새로운 뱅크선별기를 이용하여, 보다 효율적인 뱅크관리 메커니즘을 응용한 새로운 개념의 캐쉬 구조에 대한 설명을 한다. 크기가 작음에도 불구하고, 낮은 접근 실패율(Miss ratio)와 높은 저전력 효과는 기존의 일반적인 직접사상 캐쉬와 비교했을 때, 성능면에서 월등한 차이를 나타내고 있다. 이러한 결과의 원인은 직접사상 캐쉬와 완전연관 버퍼의 최적화된 구성과. 효과적인 뱅크선별기를 사용하여 적은 전력에도 높은 성능을 발휘하는 새로운 메커니즘을 사용하였기 때문이다. 제안한 구조의 성능은 다양한 크기의 직접사상 캐쉬와 비교하였으며, 접근 실패율, 평균 메모리 접근 시간, 전력소비, Energy * Delay Product 등 모두 4가지의 지표를 사용하였다.

  • PDF

Reconsidering Performance Measurement when Non-Volatile RAM is used in the Buffer Cache (차세대 비휘발성 메모리가 추가된 버퍼캐쉬에서 성능 측정 방법의 재조명)

  • Lee Kyuhyung;Choi Jongmoo;Lee Donghee;Noh SamH.
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.07a
    • /
    • pp.793-795
    • /
    • 2005
  • 영속적인 데이터 저장이 가능한 차세대 비휘발성 메모리를 휘발성 메모리와 혼용하여 버퍼캐처로 사용하면, 안정성과 성능향상의 효과를 얻을 수 있다. 본 연구에서는 기존의 연구에서 제시한 캐처관리 정책을 시뮬레이터를 이용하여 실험하고 실험 결과를 분석하여 비휘발성 메모리가 추가된 캐처의 새로운 특성을 밝혀냈다. 비휘발성 메모리가 캐쉬에 포함되면 읽기 쓰기의 요청의 종류, 미스(miss)되었을 경우 캐쉬될 블록의 더티(dirty)여부, 읽기 요청이 적중(hit)되었을 때, 적중된 블록의 메모리 종류에 따라 각각의 요청을 처리하기 위한 디스크 접근횟수가 달라지는 특성을 나타낸다. 이 특성 때문에 비휘발성 메모리가 추가된 버퍼캐처는 적중률(hit rate) 보다는 디스크 접근횟수를 측정하는 것이 정확한 성능측정을 가능하게 한다.

  • PDF

Efficient Cache Management Scheme in Database based on Block Classification (블록 분류에 기반한 데이타베이스의 효율적 캐쉬 관리 기법)

  • Sin, Il-Hoon;Koh, Kern
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.7
    • /
    • pp.369-376
    • /
    • 2002
  • Although LRU is not adequate for database that has non-uniform reference pattern, it has been adopted in most database systems due to the absence of the proper alternative. We analyze database block reference pattern with the realistic database trace. Based on this analysis, we propose a new cache replacement policy. Trace analysis shows that extremely non-popular blocks take up about 70 % of the entire blocks. The influence of recency on blocks' re-reference likelihood is at first strong due to temporal locality, however, it rapidly decreases and eventually becomes negligible as stack distance increases. Based on this observation, RCB(Reference Characteristic Based) cache replacement policy, which we propose in this paper, classifies the entire blocks into four block groups by blocks' recency and re-reference likelihood, and operates different priority evaluation methods for each block group. RCB policy evicts non-popular blocks more quickly than the others and evaluates the priority of the block by frequency that has not been referenced for a long time. In a trace-driven simulation, RCB delivers a better performance than the existing polices(LRU, 2Q, LRU-K, LRFU). Especially compared to LRU. It reduces miss count by 5~l2.7%. Time complexity of RCB is O(1), which is the same with LRU and 2Q and superior to LRU-K(O(log$_2$N)) and LRFU(O(l) ~ O(log$_2$N)).

An Efficient Caching Strategy in Data Broadcasting (데이터 방송 환경에서의 효율적인 캐슁 정책)

  • Kim, Su-Yeon;Choe, Yang-Hui
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.12
    • /
    • pp.1476-1484
    • /
    • 1999
  • TV 방송 분야에서 다양한 정보와 상호 작용성을 제공하기 위해서 최근 기존 방송 내용인 A/V 스트림 외 부가정보 방송이 시도되고 있다. 데이타 방송에 대한 기존 연구는 대부분 고정된 내용의 데이타를 방송하는 환경을 가정하고 있어서 그 결과가 방송 내용의 변화가 많은 환경에 부적합하다. 본 논문에서는 데이타에 대한 접근이 반복되지 않을 가능성이 높고 사용자 접근 확률을 예상하기 어려운 상황에서 응답 시간을 개선하는 방안으로 수신 데이타를 무조건 캐쉬에 반입하고 교체가 필요한 경우 다음 방송 시각이 가장 가까운 페이지를 축출하는 사용자 단말 시스템에서의 캐슁 정책을 제안하였다. 제안된 캐쉬 관리 정책은 평균적인 캐쉬 접근 실패 비용을 줄임으로써 사용자 응답 시간을 개선하며, 서로 다른 스케줄링 기법을 사용하는 다양한 방송 제공자가 공존하는 환경에서 보편적으로 효과를 가져올 수 있다.Abstract Recently, many television broadcasters have tried to disseminate digital multimedia data in addition to the traditional content (audio-visual stream). The broadcast data need to be cached by a client system, to provide a reasonable response time for a user request. Previous studies assumed the dissemination of a fixed set of items, and the results are not suitable when broadcast items are frequently changed. In this paper, we propose a novel cache management scheme that chooses the replacement victim based on the remaining time to the next broadcast instance. The proposed scheme reduces response time, where it is hard to predict the probability distribution of user accesses. The caching policy we present here significantly reduces expected response time by minimizing expected cache miss penalty, and can be applied without difficulty to different scheduling algorithms.

A architecture for parallel rendering processor with by effective memory organization (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 구조)

  • Kim, Kyung-Su;Yoon, Duk-Ki;Kim, Il-San;Park, Woo-Chan
    • Journal of Korea Game Society
    • /
    • v.5 no.3
    • /
    • pp.39-47
    • /
    • 2005
  • Current rendering processors are organized mainly to process a triangle as fast as possible and recently parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, begin to appear. For high performance in processing triangles, it is desirable for each rasterizer have its own local pixel cache. However, the consistency problem may occur in accessing the data at the same address simulaneously by more than one rasterizer. In this paper, we propose a parallel rendering processor architecture resolving such consistency problem effectively. Moreover, the proposed architecture reduces the latency due to a pixel cache miss significantly. The experimental results show that proposed architecture achieves almost linear speedup at best case even in sixteen rasterizer

  • PDF

The Instruction Flash memory system with the high performance dual buffer system (명령어 플래시 메모리를 위한 고성능 이중 버퍼 시스템 설계)

  • Jung, Bo-Sung;Lee, Jung-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.2
    • /
    • pp.1-8
    • /
    • 2011
  • NAND type Flash memory has performing much researches for a hard disk substitution due to its low power consumption, cheap prices and a large storage. Especially, the NAND type flash memory is using general buffer systems of a cache memory for improving overall system performance, but this has shown a tendency to emphasize in terms of data. So, our research is to design a high performance instruction NAND type flash memory structure by using a buffer system. The proposed buffer system in a NAND flash memory consists of two parts, i.e., a fully associative temporal buffer for branch instruction and a fully associative spatial buffer for spatial locality. The spatial buffer with a large fetching size turns out to be effective serial instructions, and the temporal buffer with a small fetching size can achieve effective branch instructions. According to the simulation results, we can reduce average miss ratios by around 77% and the average memory access time can achieve a similar performance compared with the 2-way, victim and fully associative buffer with two or four sizes.

Proposal of 3D Graphic Processor Using Multi-Access Memory System (Multi-Access Memory System을 이용한 3D 그래픽 프로세서 제안)

  • Lee, S-Ra-El;Kim, Jae-Hee;Ko, Kyung-Sik;Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.4
    • /
    • pp.119-128
    • /
    • 2019
  • Due to the nature of the 3D graphics processor system, many mathematical calculations are required and parallel processing research using GPU (Graphics Processing Unit) is being performed for high-speed processing. In this paper, we propose a 3D graphics processor using MAMS, a parallel processor that does not use cache memory, to solve the GPU problem of increasing bandwidth caused by cache memory miss and the problem that 3D shader processing speed is not constant. The 3D graphics processor using MAMS proposed in this paper designed Vertex shader, Pixel shader, Tiling and Rasterizing structure using DirectX command analysis, the FPGA(Xilinx Virtex6@100MHz) board for MAMS was constructed and designed using Verilog. We compared the processing time of the developed FPGA (100Mhz) and nVidia GeForce GTX 660 (980Mhz), the processing time using GTX 660 was not constant and suing MAMS was constant.

A Dynamic Prefetch Filtering Schemes to Enhance Usefulness Of Cache Memory (캐시 메모리의 유용성을 높이는 동적 선인출 필터링 기법)

  • Chon Young-Suk;Lee Byung-Kwon;Lee Chun-Hee;Kim Suk-Il;Jeon Joong-Nam
    • The KIPS Transactions:PartA
    • /
    • v.13A no.2 s.99
    • /
    • pp.123-136
    • /
    • 2006
  • The prefetching technique is an effective way to reduce the latency caused memory access. However, excessively aggressive prefetch not only leads to cache pollution so as to cancel out the benefits of prefetch but also increase bus traffic leading to overall performance degradation. In this thesis, a prefetch filtering scheme is proposed which dynamically decides whether to commence prefetching by referring a filtering table to reduce the cache pollution due to unnecessary prefetches In this thesis, First, prefetch hashing table 1bitSC filtering scheme(PHT1bSC) has been shown to analyze the lock problem of the conventional scheme, this scheme such as conventional scheme used to be N:1 mapping, but it has the two state to 1bit value of each entries. A complete block address table filtering scheme(CBAT) has been introduced to be used as a reference for the comparative study. A prefetch block address lookup table scheme(PBALT) has been proposed as the main idea of this paper which exhibits the most exact filtering performance. This scheme has a length of the table the same as the PHT1bSC scheme, the contents of each entry have the fields the same as CBAT scheme recently, never referenced data block address has been 1:1 mapping a entry of the filter table. On commonly used prefetch schemes and general benchmarks and multimedia programs simulates change cache parameters. The PBALT scheme compared with no filtering has shown enhanced the greatest 22%, the cache miss ratio has been decreased by 7.9% by virtue of enhanced filtering accuracy compared with conventional PHT2bSC. The MADT of the proposed PBALT scheme has been decreased by 6.1% compared with conventional schemes to reduce the total execution time.

Performance Improvement of a VLIW ARchitecture without Pipeline-Stall during Instruction Cache Miss (명령어 캐시미스중에서도 파이프라인의 고착을 피할 수 있는 VLIW 구조의 성능향상)

  • Ji, Seung-Hyeon;Park, No-Gwang;Kim, Seok-Il
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.3
    • /
    • pp.301-312
    • /
    • 1999
  • 본 논문에서는 명령어 수준의 병렬성을 다루는 세 가지 프로세서 모델을 정의하고 각 모델별로 명령어 파이프라인을 운용하는 방법에 다른 실행사이클의 변화를 연구하였다. 본 논문에서 고려한 세가지 모델은1) 긴 명령어 인출시 캐시미스가 발생하면 명령어 파이프라인이 정지되는 전통적인 VLIW 구조, 2) 전통적인 VLIW 구조와 같이 긴 명령어 인출시 캐시미스가 발생하면 명령어 파이프라인이 정지되나 실시간에 긴 명령어를 실행 유니트로 스케줄링할 수있으므로 목적 코드에서 LNOP를 제거할 수 있는 구조 및 3)2)의 구조에서 긴 명령어를 인출하는 과정에서 캐시미스가 발생하더라도 LNOP을 분석 유니트로 제공하여 명령어 파이프라인을 계속 진행시키는 구조의 세 가지이다. 연구결과, 세 번째 구조에서 발생되는 LNOP 의 수는 첫 번째 구조와 두 번째 구조에 비하여 적어서 동일한 응용 프로그램을 처리하는데 필요한 실행사이클의 수가 가장 짧았다. 여러 가지 벤치 마크들에 대한 모의 실험에서도 세 번째 구조가 다른 구조의 프로세서에 비하여 실행사이클의 수가 가장 짧음을 확인할 수 있었다.