• Title/Summary/Keyword: Cache-miss

Performance Evaluation and Analysis of Symmetric Multiprocessor using Multi-Program Benchmarks (Multi-Program 벤치마크를 이용한 대칭구조 Multiprocessor의 성능평가와 분석)

  • Jeong Tai-Kyeong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.4
    • /
    • pp.645-651
    • /
    • 2006
  • This paper discusses computer system performance evaluation and analysis using a simulator that can execute a symmetric multiprocessor in a machine simulation environment. We also analyze the multiprocessor system using SPLASH-2, a suite of multi-program benchmarks for multiprocessors, to study the behavior of the symmetric multiprocessor OS kernel, IRIX 5.3. To validate the scalability of the symmetric multiprocessor system, we present the structure and evaluation methods for a symmetric multiprocessor, as well as a functionality-based software simulator, SimOS. In this paper, we examine the cache miss counts and stall times of the symmetric multiprocessor for local instructions and local data, using multi-program benchmarks such as the RADIX sorting algorithm and Cholesky factorization.

An On-chip Cache and Main Memory Compression System Optimized by Considering the Compression Rate Distribution of Compressed Blocks (압축블록의 압축률 분포를 고려해 설계한 내장캐시 및 주 메모리 압축시스템)

  • Yim, Keun-Soo;Lee, Jang-Soo;Hong, In-Pyo;Kim, Ji-Hong;Kim, Shin-Dug;Lee, Yong-Surk;Koh, Kern
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.1_2
    • /
    • pp.125-134
    • /
    • 2004
  • Recently, an on-chip compressed cache system was presented to alleviate the processor-memory performance gap by reducing the on-chip cache miss rate and expanding memory bandwidth. This research presents an extended on-chip compressed cache system that also significantly expands main memory capacity. Several techniques are employed to expand main memory capacity, on-chip cache capacity, and memory bandwidth, as well as to reduce decompression time and metadata size (a sketch of the slot-sizing idea follows below). To evaluate the performance of the proposed system against existing systems, we use an execution-driven simulation method, modifying a superscalar microprocessor simulator; this methodology has higher accuracy than the previous trace-driven simulation method. The simulation results show that the proposed system reduces execution time by 4-23% compared with a conventional memory system, without counting the benefits obtained from main memory expansion. The expansion rates of the data and code areas of main memory are 57-120% and 27-36%, respectively.
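
The abstract does not give the design parameters, so the following is a minimal sketch of one idea behind compression-rate-aware design: pick a few storage slot classes from the distribution of compressed block sizes and place each block in the smallest slot that fits. The block size, slot classes, and function names are illustrative assumptions, not the paper's actual values.

```python
# Hypothetical sketch: choosing a storage slot size for a compressed block.
BLOCK_SIZE = 64            # uncompressed block size in bytes (assumption)
SLOT_SIZES = [16, 32, 64]  # slot classes chosen from a compression-rate histogram

def slot_for(compressed_len):
    """Return the smallest slot class that fits the compressed block."""
    for slot in SLOT_SIZES:
        if compressed_len <= slot:
            return slot
    return BLOCK_SIZE  # incompressible: store uncompressed

# A block that compresses to 20 bytes occupies a 32-byte slot, so two such
# blocks fit where one uncompressed block would.
print(slot_for(20))  # -> 32
```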

An Optimized Cache Coherence Protocol in Multiprocessor System Connected by Slotted Ring (슬롯링으로 연결된 다중처리기 시스템에서 최적화된 캐쉬일관성 프로토콜)

  • Min, Jun-Sik;Chang, Tae-Mu
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.12
    • /
    • pp.3964-3975
    • /
    • 2000
  • There are two policies for maintaining consistency among the multiple processor caches in a multiprocessor system: write invalidate and write update. In the write invalidate policy, whenever a processor attempts to write to its cached block, it must invalidate all other copies of that block in the system. As a result of these frequent invalidations, the policy yields a high cache miss ratio. The write update policy, on the other hand, renews the remote copies instead of invalidating them, but it must transfer the updated contents through the interconnection network whether or not the updated block is private, so the network suffers heavy transaction traffic. In this paper we present an efficient cache coherence protocol for a shared memory multiprocessor system connected by a slotted ring. The protocol is based on the write update policy, but the updated contents are transferred only when a shared block is updated; if the updated block is private, nothing is transferred (see the sketch below). We analyze the proposed protocol and run simulations to compare it with previous versions.
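
A minimal sketch of the update rule just described, assuming a per-block shared flag: a write to a shared block broadcasts the new contents on the ring, while a write to a private block generates no ring traffic. The Ring and CacheBlock classes are illustrative stand-ins, not the paper's design.

```python
class Ring:
    """Stand-in for the slotted ring interconnect."""
    def __init__(self):
        self.traffic = []

    def broadcast_update(self, addr, data):
        self.traffic.append((addr, data))  # one update message per write

class CacheBlock:
    def __init__(self, addr, data, shared=False):
        self.addr = addr
        self.data = data
        self.shared = shared  # true when another cache holds a copy

def write(block, new_data, ring):
    block.data = new_data
    if block.shared:
        # write-update: refresh remote copies instead of invalidating them
        ring.broadcast_update(block.addr, new_data)
    # a private block generates no ring traffic at all

ring = Ring()
write(CacheBlock(0x40, b"old"), b"new", ring)               # private: no traffic
write(CacheBlock(0x80, b"old", shared=True), b"new", ring)  # shared: one message
print(len(ring.traffic))  # -> 1
```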

An Efficient Address Mapping Table Management Scheme for NAND Flash Memory File System Exploiting Page Address Cache (페이지 주소 캐시를 활용한 NAND 플래시 메모리 파일시스템에서의 효율적 주소 변환 테이블 관리 정책)

  • Kim, Cheong-Ghil
    • Journal of Digital Contents Society
    • /
    • v.11 no.1
    • /
    • pp.91-97
    • /
    • 2010
  • Flash memory is used by many digital devices for data storage, exploiting its non-volatility, low power, and stability, along with high integration, large capacity, and low price. With the fast-growing popularity of flash memory, its density has increased so significantly that the entire address mapping table becomes too large to store in SRAM. This paper proposes a page address cache with an efficient table management scheme for hybrid flash translation layer (FTL) mapping; for this purpose, all tables are integrated into a map block containing the entire set of physical page tables (see the sketch below). Simulation results show that the proposed scheme can save extra memory area and decrease search time, with a miss ratio below 2.5% on a PC workload, and can reduce write overhead, with actual write operations amounting to 33% of the total writes requested.
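
A minimal sketch of the lookup path implied above, assuming a small LRU page address cache in SRAM placed in front of a map block that holds the full logical-to-physical table in flash. The sizes, names, and LRU policy are illustrative assumptions, not the paper's parameters.

```python
from collections import OrderedDict

MAP_BLOCK = {lpn: lpn + 1000 for lpn in range(4096)}  # full table (in flash)
CACHE_ENTRIES = 128                                   # small SRAM cache (assumed)
cache = OrderedDict()                                 # LRU page address cache

def translate(lpn):
    if lpn in cache:               # hit: no flash read for the mapping
        cache.move_to_end(lpn)
        return cache[lpn]
    ppn = MAP_BLOCK[lpn]           # miss: fetch the entry from the map block
    cache[lpn] = ppn
    if len(cache) > CACHE_ENTRIES:
        cache.popitem(last=False)  # evict the least recently used entry
    return ppn

print(translate(7))  # miss: reads the map block
print(translate(7))  # hit: served from the page address cache
```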

Design of a Parallel Rendering Processor Architecture with Effective Memory System (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 설계)

  • Park Woo-Chan;Yoon Duk-Ki;Kim Kyoung-Su
    • The KIPS Transactions:PartA
    • /
    • v.13A no.4 s.101
    • /
    • pp.305-316
    • /
    • 2006
  • Current rendering processors are organized mainly to process a single triangle as fast as possible, and parallel 3D rendering processors, which process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high performance in processing triangles, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively (one generic remedy is sketched below). Moreover, the proposed architecture significantly reduces the latency due to a pixel cache miss. For these two goals, effective memory organizations, including a new pixel cache architecture, are presented. The experimental results show that the proposed architecture achieves almost linear speedup in the best case, even with sixteen rasterizers.
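
The abstract does not spell out how the consistency problem is resolved, so the following shows one common generic remedy rather than the paper's actual design: statically interleave pixel ownership across rasterizers so that no two rasterizers ever cache the same pixel address. The interleaving function and constants are illustrative assumptions.

```python
NUM_RASTERIZERS = 16

def owner(x, y):
    """Map a pixel to the single rasterizer allowed to cache it."""
    return (x + y * 4) % NUM_RASTERIZERS  # simple 2D interleaving pattern

# Writes to the same pixel always land in the same local pixel cache,
# so no cross-cache coherence traffic is needed.
assert owner(10, 3) == owner(10, 3)
print(owner(10, 3))  # -> 6
```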

A Study on Improvement of Buffer Cache Performance for File I/O in Deep Learning (딥러닝의 파일 입출력을 위한 버퍼캐시 성능 개선 연구)

  • Jeongha Lee;Hyokyung Bahn
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.2
    • /
    • pp.93-98
    • /
    • 2024
  • With the rapid advances in AI (artificial intelligence) and high-performance computing technologies, deep learning is being used in various fields. Deep learning trains by randomly reading a large amount of data and repeating this process; a large number of files are referenced randomly and repeatedly, an access pattern unlike traditional workloads with temporal locality. To cope with the caching difficulty caused by deep learning, we propose a new sampling method that reduces the randomness of dataset reading and operates adaptively on top of existing buffer cache algorithms (see the sketch below). We show that the proposed policy reduces the buffer cache miss rate by 16% on average and up to 33% compared to the existing method, and improves execution time by up to 24%.
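
The abstract does not define the sampling method, so this is a minimal sketch under one plausible reading of "reducing the randomness of dataset reading": shuffle indices only within fixed-size windows so that training still sees a randomized order, but successive reads stay within file ranges the buffer cache can retain. The window size and function names are assumptions, not the paper's parameters.

```python
import random

def windowed_shuffle(num_files, window=1024):
    """Yield file indices shuffled only inside each window."""
    for start in range(0, num_files, window):
        block = list(range(start, min(start + window, num_files)))
        random.shuffle(block)  # keep randomness, preserve coarse locality
        yield from block

order = list(windowed_shuffle(10_000))
print(order[:5])  # indices drawn only from the first window
```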

A Cache buffer and Read Request-aware Request Scheduling Method for NAND flash-based Solid-state Disks (캐시 버퍼와 읽기 요청을 고려한 낸드 플래시 기반 솔리드 스테이트 디스크의 요청 스케줄링 기법)

  • Bang, Kwanhu;Park, Sang-Hoon;Lee, Hyuk-Jun;Chung, Eui-Young
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.8
    • /
    • pp.143-150
    • /
    • 2013
  • Solid-state disks (SSDs) are widely used in high-performance personal computers and servers due to their favorable characteristics and performance. NAND flash-based SSDs, which take a large portion of the whole NAND flash market, are the major type of SSD. They usually integrate a cache buffer built from DRAM that uses the write-back policy for better performance. Unfortunately, this policy makes existing scheduling methods less effective at the interface (I/F) level of SSDs. Therefore, in this paper, we propose a scheduling method for the I/F that takes the cache buffer into account. The proposed method considers the hit/miss status of the cache buffer and gives higher priority to read requests. As a result, requests whose data hits in the cache buffer can be handled in advance, and read requests, which have a larger effect on whole-system performance than write requests, experience shorter latency (see the sketch below). The experimental results show that the proposed scheduling method improves read latency by 26%.
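
A minimal sketch of the stated priority rule, assuming each request carries a hit/miss flag from the DRAM cache buffer: buffer hits outrank misses, and reads outrank writes at equal hit status. The request fields and priority encoding are illustrative assumptions.

```python
import heapq

def priority(req):
    """Lower value = served earlier (assumed encoding)."""
    p = 0 if req["cache_hit"] else 2      # buffer hits never touch NAND
    p += 0 if req["op"] == "read" else 1  # reads stall the host more
    return p

requests = [
    {"op": "write", "cache_hit": False},
    {"op": "read",  "cache_hit": False},
    {"op": "read",  "cache_hit": True},
]
queue = []
for i, req in enumerate(requests):
    heapq.heappush(queue, (priority(req), i, req))

while queue:
    _, _, req = heapq.heappop(queue)
    print(req["op"], req["cache_hit"])  # hit-read, miss-read, miss-write
```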

Dynamic Prefetch Filtering Schemes to enhance Utilization of Data Cache (데이타 캐시의 활용도를 높이는 동적 선인출 필터링 기법)

  • Chon, Young-Suk;Kim, Suk-Il;Jeon, Joong-Nam
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.1
    • /
    • pp.30-43
    • /
    • 2008
  • Memory reference instructions such as loads and stores are critical factors limiting the processing power of a processor. Prefetching is an effective way to reduce the latency caused by memory accesses, but excessively aggressive prefetching leads to cache pollution that cancels out its advantage. In this study, four filtering schemes, which dynamically decide whether to begin a prefetch by consulting a filtering table, are compared and evaluated with respect to reducing cache pollution. First, a bi-state scheme is presented to analyze the lock problem of the conventional scheme; like the conventional scheme it uses N:1 mapping, but each entry holds a 1-bit, two-state value. A complete-state scheme is introduced as a reference for the comparative study. Finally, a block address lookup (BAL) scheme is proposed as the main idea of this paper and exhibits the most exact filtering performance: its table has the same length as the bi-state scheme, each entry has the same fields as the complete-state scheme, and a recently referenced data block address is mapped 1:1 to an entry of the filter table (a hedged sketch follows below). Experimental results from commonly used general benchmarks and multimedia programs show that the average cache miss ratio decreases by 10.5% for the BAL scheme compared with the conventional dynamic filter scheme (2-bitSC).
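
A minimal sketch of a BAL-style filter as described above, assuming each entry stores a full block address (1:1 mapping, so no aliasing between blocks) and records whether the last prefetch of that block proved useful. The table size and eviction rule are illustrative assumptions.

```python
TABLE_SIZE = 256
filter_table = {}  # full block address -> was its last prefetch useful?

def should_prefetch(block_addr):
    """Prefetch unless this exact block was prefetched before and unused."""
    return filter_table.get(block_addr, True)

def record_outcome(block_addr, was_used):
    """Update the 1:1 entry for this block; evict FIFO when full (assumed)."""
    if block_addr not in filter_table and len(filter_table) >= TABLE_SIZE:
        filter_table.pop(next(iter(filter_table)))
    filter_table[block_addr] = was_used

record_outcome(0x1f40, was_used=False)
print(should_prefetch(0x1f40))  # False: filtered, avoiding cache pollution
print(should_prefetch(0x2a00))  # True: unknown blocks default to prefetch
```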

Instructions and Data Prefetch Mechanism using Displacement History Buffer (변위 히스토리 버퍼를 이용한 명령어 및 데이터 프리페치 기법)

  • Jeong, Yong Su;Kim, JinHyuk;Cho, Tae Hwan;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.10
    • /
    • pp.82-94
    • /
    • 2015
  • In this paper, we propose a hardware prefetch mechanism with an efficient cache replacement policy that gives priority to the trigger block of a spatial region and generates the spatial region using a displacement field. Because the history is recorded against the trigger block, the access sequence of the program can be taken into account, and because the history is stored as displacement values, instruction and data addresses can be prefetched quickly by adding a stored displacement to the trigger address (a hedged sketch follows below). We also propose a replacement policy that gives priority to the trigger block and, when the cache is full, evicts randomly from among the low-priority blocks. We evaluated the hardware prefetcher using the gem5 memory simulator and the PARSEC benchmarks. Compared with an existing hardware prefetcher that generates the spatial region using a bit vector, the L1 data cache miss rate was reduced by about 44.5% on average and the L1 instruction cache miss rate by an average of 26.1%. In addition, IPC (instructions per cycle) improved by about 23.7% on average.
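
A minimal sketch of displacement-based prefetching as described above: a history entry keyed by a trigger block stores observed displacements, and prefetch addresses are formed by adding each stored displacement to the new trigger address. The data structure and names are illustrative assumptions, not the paper's hardware organization.

```python
history = {}  # trigger block address -> observed displacements

def record(trigger, accessed_addr):
    """Store the displacement of an access relative to its trigger."""
    history.setdefault(trigger, []).append(accessed_addr - trigger)

def prefetch_addresses(trigger_key, new_trigger_addr):
    """Replay stored displacements against the new trigger address."""
    return [new_trigger_addr + d for d in history.get(trigger_key, [])]

record(0x1000, 0x1040)  # first visit: learn displacements +0x40, +0x80
record(0x1000, 0x1080)
print([hex(a) for a in prefetch_addresses(0x1000, 0x2000)])
# -> ['0x2040', '0x2080']: addresses generated by simple addition
```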

Caching Scheme Considering Access Patterns in Graph Environments (그래프 환경에서 접근 패턴을 고려한 캐싱 기법)

  • Yoo, Seunghun;Kim, Minsoo;Bok, Kyoungsoo;Yoo, Jaesoo
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2017.05a
    • /
    • pp.19-20
    • /
    • 2017
  • Recently, advances in social media and sensor technologies have rapidly increased the volume of graph data. Processing graph data incurs I/O costs, and as the data grows, bottlenecks limit the performance of data processing and management. To address this problem, caching techniques that keep data in memory have been studied. In this paper, we propose a caching scheme that considers the access patterns of subgraph data. In a graph environment, patterns are discovered from the graph query history, and data selected through a query management table and an FP (frequent pattern)-tree is loaded into memory. In addition, when a cache miss occurs, neighboring vertices are loaded into memory together (see the sketch below). We also propose a replacement strategy that evicts cached data when memory becomes full.
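
A minimal sketch of the miss handling described above, assuming a vertex cache that co-loads the requested vertex's neighbors on a miss because graph queries tend to touch adjacent vertices next. The Graph stand-in and the capacity limit are illustrative assumptions.

```python
class Graph:
    """Stand-in for on-disk graph storage."""
    def __init__(self, adj):
        self.adj = adj

    def load(self, v):
        return f"data({v})"  # pretend disk read

    def neighbors(self, v):
        return self.adj.get(v, [])

CAPACITY = 10_000
cache = {}  # vertex id -> vertex data

def get_vertex(v, graph):
    if v in cache:
        return cache[v]                # hit
    data = graph.load(v)               # miss: fetch from storage
    cache[v] = data
    for n in graph.neighbors(v):       # co-load the neighborhood
        if len(cache) >= CAPACITY:
            break
        cache.setdefault(n, graph.load(n))
    return data

g = Graph({1: [2, 3]})
get_vertex(1, g)
print(sorted(cache))  # [1, 2, 3]: neighbors were cached on the miss
```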
