• Title/Abstract/Keyword: Cache Performance

Search results: 656 items, processing time 0.03 seconds

Dual Cache Architecture for Low Cost and High Performance

  • Lee, Jung-Hoon;Park, Gi-Ho;Kim, Shin-Dug
    • ETRI Journal / Vol. 25, No. 5 / pp. 275-287 / 2003
  • We present a high performance cache structure with a hardware prefetching mechanism that enhances exploitation of spatial and temporal locality. Temporal locality is exploited by selectively moving small blocks into the direct-mapped cache after monitoring their activity in the spatial buffer. Spatial locality is enhanced by intelligently prefetching a neighboring block when a spatial buffer hit occurs. We show that the prefetch operation is highly accurate: over 90% of all prefetches generated are for blocks that are subsequently accessed. Our results show that the system enables the cache size to be reduced by a factor of four to eight relative to a conventional direct-mapped cache while maintaining similar performance.
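
The promotion-plus-prefetch behavior described in the abstract can be sketched as follows; this is a simplified illustration, not the paper's design, and the cache size, promotion threshold, and names (DualCache, activity) are assumptions made for the example.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Illustrative model only: a direct-mapped cache holds blocks that showed
// temporal reuse, a small spatial buffer absorbs first-touch misses, and a
// spatial-buffer hit also prefetches the neighboring block.
struct DualCache {
    static constexpr int kSets = 64;             // direct-mapped sets (assumed size)
    static constexpr int kPromoteThreshold = 2;  // hypothetical promotion threshold
    std::vector<int64_t> dm_tag = std::vector<int64_t>(kSets, -1);
    std::unordered_set<int64_t> spatial;         // spatial buffer contents (block ids)
    std::unordered_map<int64_t, int> activity;   // hits observed in the spatial buffer

    bool access(int64_t block) {
        int set = static_cast<int>(block % kSets);
        if (dm_tag[set] == block) return true;          // direct-mapped cache hit
        if (spatial.count(block)) {                      // spatial buffer hit
            if (++activity[block] >= kPromoteThreshold)
                dm_tag[set] = block;                     // promote: temporal locality detected
            spatial.insert(block + 1);                   // prefetch the neighboring block
            return true;
        }
        spatial.insert(block);                           // miss: fetch into the spatial buffer
        return false;
    }
};

int main() {
    DualCache cache;
    int hits = 0, total = 0;
    for (int rep = 0; rep < 4; ++rep)                    // repeated sequential sweep
        for (int64_t b = 0; b < 16; ++b) { hits += cache.access(b); ++total; }
    std::cout << "hit ratio: " << static_cast<double>(hits) / total << "\n";
}
```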


An Implementation of a Memory Operation System Architecture for Memory Latency Penalty Reduction in a SIMT-Based Stream Processor

  • 이광엽
    • 전기전자학회논문지 / Vol. 18, No. 3 / pp. 392-397 / 2014
  • This paper proposes a memory operation system architecture for a SIMT-architecture-based stream processor that reduces the memory latency penalty. The proposed design applies a non-blocking cache architecture to remove the cache miss penalty that arises in a conventional blocking cache architecture, and the performance gain of the stream processor with the proposed memory operation system architecture is verified by comparing the processing speed of various algorithms. The experiments measure the improvement according to the proportion of memory instructions in each algorithm and confirm that the performance of the stream processor improves by at least 8.2% and up to 46.5%.
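
The non-blocking behavior referred to above is typically realized with MSHR-style bookkeeping: a miss records its requester so that later independent requests keep flowing instead of stalling. The sketch below illustrates that general idea only; the structure and all names (NonBlockingCache, mshr, fill) are hypothetical, not the paper's implementation.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>
#include <unordered_map>
#include <unordered_set>

// Illustrative sketch: misses are parked in MSHR-like entries so later
// independent requests proceed, instead of stalling as a blocking cache would.
struct NonBlockingCache {
    std::unordered_set<uint64_t> lines;                    // blocks currently cached
    std::unordered_map<uint64_t, std::deque<int>> mshr;    // outstanding miss -> waiting thread ids

    // Returns true if the request completes immediately (hit).
    bool access(int thread_id, uint64_t block) {
        if (lines.count(block)) return true;               // hit: no stall
        mshr[block].push_back(thread_id);                  // miss: remember requester, others continue
        return false;
    }

    // Called when memory returns the block: wake up all merged requesters.
    void fill(uint64_t block) {
        lines.insert(block);
        for (int t : mshr[block]) std::cout << "resume thread " << t << "\n";
        mshr.erase(block);
    }
};

int main() {
    NonBlockingCache c;
    c.access(0, 100);   // miss: allocated in the MSHR, later requests still served
    c.access(1, 100);   // second miss to the same block merges into the same entry
    c.access(2, 200);   // independent miss proceeds without waiting for block 100
    c.fill(100);        // memory response resumes threads 0 and 1
    c.fill(200);
}
```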

Bitmap-based Prefix Caching for Fast IP Lookup

  • Kim, Jinsoo;Ko, Myeong-Cheol;Nam, Junghyun;Kim, Junghwan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 8, No. 3 / pp. 873-889 / 2014
  • IP address lookup is crucial to the performance of routers, and several prefix-caching schemes have been proposed to speed it up. Since a prefix represents a range of IP addresses, a prefix cache shows better performance than an IP address cache. However, not every prefix is cacheable in itself: caching a non-leaf prefix causes false hits because a longer matching prefix may exist in the routing table. Prefix expansion techniques such as complete prefix tree expansion (CPTE) make it possible to cache non-leaf prefixes in expanded form, but the expanded prefixes are hard to manage and can incur a great deal of update overhead in the routing table. We propose a bitmap-based prefix cache (BMCache) that provides low update overhead as well as a low cache miss ratio. The proposed scheme keeps no expanded prefixes in the routing table; instead, it expands a non-leaf prefix using a bitmap at caching time. Trace-driven simulation shows that BMCache has a very low miss ratio despite its low update overhead compared to other schemes.
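
A rough illustration of the expand-at-caching-time idea, under the simplifying assumption that any longer prefix is at most EXPAND_BITS longer than the cached prefix; the encoding, the constant EXPAND_BITS, and all names are hypothetical rather than BMCache's actual design.

```cpp
#include <bitset>
#include <cstdint>
#include <iostream>
#include <vector>

constexpr int EXPAND_BITS = 3;                       // illustrative expansion width

struct Prefix { uint32_t addr; int len; };           // prefix value and length

bool matches(const Prefix& p, uint32_t ip) {
    return p.len == 0 || (ip >> (32 - p.len)) == (p.addr >> (32 - p.len));
}

struct CachedEntry {
    Prefix base;                                     // the non-leaf prefix being cached
    std::bitset<(1u << EXPAND_BITS)> valid;          // sub-ranges safe to answer from the cache
};

// Build the bitmap: a sub-range is valid only if no longer prefix covers it.
// (Simplified: assumes longer prefixes are no longer than base.len + EXPAND_BITS.)
CachedEntry expand(const Prefix& base, const std::vector<Prefix>& longer) {
    CachedEntry e{base, {}};
    for (uint32_t i = 0; i < (1u << EXPAND_BITS); ++i) {
        uint32_t sub = base.addr | (i << (32 - base.len - EXPAND_BITS));
        bool shadowed = false;
        for (const Prefix& lp : longer) shadowed |= matches(lp, sub);
        e.valid[i] = !shadowed;
    }
    return e;
}

bool cache_hit(const CachedEntry& e, uint32_t ip) {
    if (!matches(e.base, ip)) return false;
    uint32_t idx = (ip >> (32 - e.base.len - EXPAND_BITS)) & ((1u << EXPAND_BITS) - 1);
    return e.valid[idx];                             // hit only where no longer prefix exists
}

int main() {
    Prefix base{0x0A000000, 8};                      // 10.0.0.0/8 (non-leaf)
    std::vector<Prefix> longer = {{0x0A400000, 10}}; // 10.64.0.0/10 shadows part of it
    CachedEntry e = expand(base, longer);
    std::cout << cache_hit(e, 0x0A800001) << " " << cache_hit(e, 0x0A400001) << "\n"; // 1 0
}
```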

An Efficient Cache Management Scheme of Flash Translation Layer for Large Size Flash Memory Drives

  • Choi, Hwan-Pil;Kim, Yong-Seok
    • 한국컴퓨터정보학회논문지 / Vol. 20, No. 11 / pp. 31-38 / 2015
  • Nowadays, flash memory drives with capacities of several hundred gigabytes are common. This paper presents an efficient cache management scheme for the flash translation layer, called TPC-FTL, for large flash memory drives. Since large flash drives usually contain a large RAM, the performance of the page-mapping cache can be enhanced by devoting more RAM to it. But once the cache size exceeds a threshold, existing schemes become impractical for real devices because the time for cache manipulation grows too long. TPC-FTL manages the cache in units of translation pages rather than the logical-page-number units used in existing schemes. Since a translation page covers a large number of logical page numbers (for example, 512 for a 2 KB page), the number of cache elements can be reduced to a practical level. A performance evaluation shows that average response time, an important performance measure, is better than that of existing schemes because spatial locality is exploited in addition to temporal locality.
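
The translation-page-unit caching can be sketched as below, assuming 512 mapping entries per 2 KB translation page and a plain LRU policy over translation pages; the class and function names are illustrative and the real TPC-FTL policy is more refined.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <list>
#include <unordered_map>

constexpr uint32_t ENTRIES_PER_TPAGE = 512;          // 2 KB page / 4 B mapping entry

// One cache element holds a whole translation page, so far fewer elements
// are needed than with per-LPN caching.
struct TranslationPageCache {
    using TPage = std::array<uint32_t, ENTRIES_PER_TPAGE>;
    std::size_t capacity;                            // max cached translation pages
    std::list<uint32_t> lru;                         // most recent at front (tpage numbers)
    std::unordered_map<uint32_t, std::pair<TPage, std::list<uint32_t>::iterator>> map;

    explicit TranslationPageCache(std::size_t cap) : capacity(cap) {}

    // Translate a logical page number; returns true on cache hit.
    bool lookup(uint32_t lpn, uint32_t& ppn) {
        uint32_t tpn = lpn / ENTRIES_PER_TPAGE;      // translation page number
        auto it = map.find(tpn);
        bool hit = (it != map.end());
        if (!hit) { fill(tpn); it = map.find(tpn); }
        else lru.splice(lru.begin(), lru, it->second.second);   // touch for LRU
        ppn = it->second.first[lpn % ENTRIES_PER_TPAGE];
        return hit;
    }

    void fill(uint32_t tpn) {                        // load a translation page (stubbed)
        if (map.size() >= capacity) { map.erase(lru.back()); lru.pop_back(); }  // evict LRU tpage
        TPage page{};
        for (uint32_t i = 0; i < ENTRIES_PER_TPAGE; ++i) page[i] = tpn * ENTRIES_PER_TPAGE + i;
        lru.push_front(tpn);
        map[tpn] = {page, lru.begin()};
    }
};

int main() {
    TranslationPageCache cache(2);
    uint32_t ppn;
    bool h1 = cache.lookup(1000, ppn);   // miss: loads translation page 1
    bool h2 = cache.lookup(1001, ppn);   // hit: the same translation page covers 512 LPNs
    std::cout << h1 << " " << h2 << " ppn=" << ppn << "\n";  // prints: 0 1 ppn=1001
}
```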

Efficient Cache Architecture for Transactional Memory

  • 최동민;김승훈;노원우
    • 전자공학회논문지CI / Vol. 48, No. 4 / pp. 1-8 / 2011
  • In a transactional memory system, writing back the data needed to handle an overflow is a major source of overall performance degradation because of its complexity. In particular, detecting conflicts caused by overflowed data requires additional states in the cache coherence protocol, which delays communication between transactions. To address this problem, we study an efficient cache architecture that reduces the overhead caused by overflows in a transactional memory system. The supportive cache proposed in this paper uses the same replacement policy as the L1 cache and is looked up in parallel with it. We evaluated the supportive cache on LogTM-SE, a hardware transactional memory system, and simulation results show an average performance improvement of 37%.
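
A minimal sketch of the supportive-cache idea, assuming that transactionally written lines evicted from the L1 overflow into a side structure that is probed in parallel with the L1; the structure and names are illustrative, not the design evaluated on LogTM-SE.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <unordered_map>

struct Line { uint64_t addr; bool tx_written; };     // cached line with a transactional-write flag

struct TxCache {
    std::unordered_map<uint64_t, Line> l1;           // primary cache (set structure omitted)
    std::unordered_map<uint64_t, Line> supportive;   // holds overflowed transactional lines

    // Parallel lookup: in hardware both arrays would be probed in the same cycle.
    std::optional<Line> lookup(uint64_t addr) const {
        if (auto it = l1.find(addr); it != l1.end()) return it->second;
        if (auto it = supportive.find(addr); it != supportive.end()) return it->second;
        return std::nullopt;
    }

    // Eviction of a transactionally written line overflows into the supportive
    // cache instead of taking a slow logging/coherence path.
    void evict_from_l1(uint64_t addr) {
        auto it = l1.find(addr);
        if (it == l1.end()) return;
        if (it->second.tx_written) supportive[addr] = it->second;
        l1.erase(it);
    }

    // Conflict detection for a remote access only needs the parallel lookup.
    bool conflicts(uint64_t addr) const {
        auto line = lookup(addr);
        return line && line->tx_written;
    }
};

int main() {
    TxCache c;
    c.l1[0x40] = {0x40, true};              // line written inside a transaction
    c.evict_from_l1(0x40);                  // overflow: moved to the supportive cache
    std::cout << c.conflicts(0x40) << "\n"; // still detected as a conflict (prints 1)
}
```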

Cache Sensitive T-tree Index Structure

  • 이익훈;김현철;허재녕;이상구;심준호;장준호
    • 한국정보과학회논문지:데이타베이스 / Vol. 32, No. 1 / pp. 12-23 / 2005
  • Over the past decade, CPU speed has improved far faster than memory speed. As a result, memory access has become a bottleneck in database systems and other computer applications. Cache memory was introduced to reduce memory access time, but it helps only when the requested data has already been brought into the cache. Consequently, the order in which an application accesses data determines how effectively the cache is used and, in turn, the application's performance. In this context, it has been shown that on current machines the B+-tree is faster than the T-tree because the B+-tree uses the cache more efficiently, and the CSB+-tree (Cache Sensitive B+-tree), which improves the B+-tree to use the cache even more effectively, has also been proposed. The goal of this paper is to develop a new T-tree structure that uses the cache efficiently. We design and implement the CST-tree (Cache Sensitive T-tree), which, like the CSB+-tree, makes maximal use of the system's L2 cache while retaining the advantages of the original T-tree, and we show experimentally that the CST-tree outperforms other index structures.
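
The cache-sensitivity argument can be made concrete with a node layout sized to one cache line, as in the sketch below; the 64-byte line size, key type, and search code are assumptions for illustration and do not reproduce the actual CST-tree layout.

```cpp
#include <cstdint>
#include <iostream>

constexpr int CACHE_LINE = 64;                               // assumed L2 cache-line size in bytes
constexpr int KEYS_PER_NODE =
    (CACHE_LINE - sizeof(void*) * 2 - sizeof(int)) / sizeof(int32_t);

// Each node occupies exactly one cache line, so one memory access (one
// cache-line fill) lets the search compare many keys before chasing a pointer.
struct alignas(CACHE_LINE) Node {
    int32_t keys[KEYS_PER_NODE];    // many keys share one cache line
    int nkeys = 0;
    Node* left = nullptr;           // subtree with smaller keys
    Node* right = nullptr;          // subtree with larger keys
};

static_assert(sizeof(Node) == CACHE_LINE, "node should occupy exactly one cache line");

// Search: each visited node costs one cache-line fill, then in-line comparisons.
bool contains(const Node* n, int32_t key) {
    while (n) {
        if (key < n->keys[0]) { n = n->left; continue; }
        if (key > n->keys[n->nkeys - 1]) { n = n->right; continue; }
        for (int i = 0; i < n->nkeys; ++i)
            if (n->keys[i] == key) return true;
        return false;
    }
    return false;
}

int main() {
    Node root;
    for (int32_t k = 10; k < 10 + KEYS_PER_NODE; ++k) root.keys[root.nkeys++] = k;
    std::cout << KEYS_PER_NODE << " keys per 64-byte node, contains(12)="
              << contains(&root, 12) << "\n";
}
```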

A Study on an Efficient Solution to the Synonym Problem using Page Alignment

  • 김제성;민상렬;전상훈;안병철;정덕균;김종상
    • 전자공학회논문지B / Vol. 33B, No. 2 / pp. 37-46 / 1996
  • This paper proposes a cost-effective solution to the synonym problem of virtual caches. In the proposed solution, a minimal hardware addition guarantees correctness, while its software counterpart helps improve performance. The key to the solution is a small physically-indexed cache called the U-cache. The U-cache maintains reverse-translation information only for cache blocks that belong to unaligned virtual pages, where aligned means that the lower bits of the virtual page number match those of the corresponding physical page number. Page alignment is a simple software optimization that improves the effectiveness of the U-cache hardware. By combining hardware and software, the proposed solution reduces hardware cost and minimizes software modification and performance degradation. A performance evaluation based on ATUM traces shows that a U-cache with only a few entries performs almost as well as a fully-configured hardware-based solution when more than 95% of the pages are aligned.
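
The alignment condition quoted above (the low virtual-page-number bits that fall inside the cache index equal the corresponding physical-page-number bits) can be written down directly; the page size, cache-index width, and sample mappings below are assumed values for illustration, and the OS-side allocation policy that tries to hand out aligned frames is only implied, not shown.

```cpp
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

constexpr unsigned PAGE_BITS = 12;          // 4 KB pages (assumed)
constexpr unsigned CACHE_INDEX_BITS = 15;   // e.g. a 32 KB direct-mapped virtual cache (assumed)
constexpr unsigned COLOR_BITS = CACHE_INDEX_BITS - PAGE_BITS;   // page-number bits inside the index

// A mapping is "aligned" when VPN and PPN agree on the color bits, so the
// virtually-indexed cache cannot create synonyms for that page.
bool aligned(uint64_t vpn, uint64_t ppn) {
    uint64_t mask = (1u << COLOR_BITS) - 1;
    return (vpn & mask) == (ppn & mask);
}

int main() {
    // Hypothetical VPN->PPN mappings; only the unaligned ones would occupy the U-cache.
    std::vector<std::pair<uint64_t, uint64_t>> mappings = {
        {0x10, 0x98}, {0x11, 0x41}, {0x12, 0x12}, {0x13, 0x7f}};
    int ucache_entries = 0;
    for (auto [vpn, ppn] : mappings)
        if (!aligned(vpn, ppn)) ++ucache_entries;
    std::cout << "unaligned pages needing U-cache entries: " << ucache_entries << "\n";
}
```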


Study of Cache Performance on GPGPU

  • Choi, Kyu Hyun;Kim, Seon Wook
    • IEIE Transactions on Smart Processing and Computing / Vol. 4, No. 2 / pp. 78-82 / 2015
  • General-purpose graphics processing units (GPGPUs) provide tremendous computational and processing power. Despite its latency-hiding mechanism, a GPU architecture requires high memory bandwidth and low latency between the computational units and the memory system. For this reason, current GPU architectures have a private L1 cache in each core and a shared L2 cache to increase performance by reducing memory latency. But in some cases, this CPU-like cache design is not suitable for GPGPUs. In this paper, we analyze cache performance in detail in relation to GPGPU application characteristics, and we suggest technical alternatives for the GPGPU architecture as future work.

Workload Characteristics-based L1 Data Cache Switching-off Mechanism for GPUs

  • Do, Thuan Cong;Kim, Gwang Bok;Kim, Cheol Hong
    • 한국컴퓨터정보학회논문지 / Vol. 23, No. 10 / pp. 1-9 / 2018
  • Modern graphics processing units (GPUs) have become one of the most attractive platforms for exploiting high thread-level parallelism with the support of programming tools such as CUDA and OpenCL. Recent GPUs have adopted a cache hierarchy to support irregular memory access patterns; however, the L1 data cache (L1D) exhibits poor efficiency in the GPU. This paper shows that the L1D does not always affect applications positively in terms of GPU performance and energy efficiency; for many applications, using the L1D even hurts performance. Our proposed technique exploits the characteristics of the currently executing application to predict the performance impact of the L1D on the GPU and then decides whether to keep using the cache for that application. Our experimental results show that the proposed technique improves GPU performance by 9.4% and saves up to 52.1% of the power consumed in the L1D.
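
A minimal sketch of such a switch-off decision, using a sampled L1D hit ratio as a stand-in for the workload characteristics the paper exploits; the sampling interval, threshold, and names are assumptions, not the mechanism's actual parameters.

```cpp
#include <cstdint>
#include <iostream>

// Illustrative controller: sample the L1D for a while, then decide whether to
// keep it on or bypass it for the rest of the kernel.
struct L1DController {
    static constexpr uint64_t kSampleAccesses = 1000;   // sampling interval (assumed)
    static constexpr double kMinUsefulHitRatio = 0.20;  // below this, bypass (assumed)

    uint64_t accesses = 0, hits = 0;
    bool decided = false;
    bool use_l1d = true;

    // Called on every L1D access during the sampling phase.
    void record(bool hit) {
        if (decided) return;
        ++accesses;
        hits += hit;
        if (accesses == kSampleAccesses) {
            double ratio = static_cast<double>(hits) / accesses;
            use_l1d = ratio >= kMinUsefulHitRatio;       // keep the L1D only if it helps
            decided = true;
        }
    }

    bool bypass() const { return decided && !use_l1d; }  // route requests straight to L2
};

int main() {
    L1DController ctrl;
    for (int i = 0; i < 1000; ++i) ctrl.record(i % 10 == 0);   // ~10% hit ratio: streaming-like
    std::cout << (ctrl.bypass() ? "L1D switched off for this kernel\n"
                                : "L1D kept on\n");
}
```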

Dynamic Cache Partitioning Strategy for Efficient Buffer Cache Management

  • 진재선;허의남;추현승
    • 한국시뮬레이션학회논문지 / Vol. 12, No. 2 / pp. 35-44 / 2003
  • The effectiveness of buffer cache replacement algorithms is critical to the performance of I/O systems. In this paper, we propose a degree of inter-reference gap (DIG) based block replacement scheme that retains the merits of least recently used (LRU) replacement, such as simple implementation and a good cache hit ratio (CHR) for general reference patterns, and improves the CHR further. In the proposed scheme, cache blocks with low DIGs are distinguished from blocks with high DIGs, and the replacement victim is selected among the high-DIG blocks, as in the low inter-reference recency set (LIRS) scheme. Thus, by in effect partitioning the cache memory dynamically based on DIGs, the CHR is improved. Trace-driven simulation is employed to verify the superiority of the DIG-based scheme and shows that performance improves by up to about 175% compared to the LRU scheme and 3% compared to the LIRS scheme on the same traces.
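
A rough sketch of DIG-based victim selection, taking the gap between a block's last two references as its DIG and the median gap as the high/low boundary; the bookkeeping and threshold are illustrative, not the paper's exact rule.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

struct Block { uint64_t id; uint64_t last_ref; uint64_t gap; };

struct DigCache {
    std::size_t capacity;
    uint64_t clock = 0;
    std::unordered_map<uint64_t, Block> blocks;

    explicit DigCache(std::size_t cap) : capacity(cap) {}

    bool access(uint64_t id) {
        ++clock;
        auto it = blocks.find(id);
        if (it != blocks.end()) {                       // hit: update the inter-reference gap
            it->second.gap = clock - it->second.last_ref;
            it->second.last_ref = clock;
            return true;
        }
        if (blocks.size() >= capacity) evict();
        blocks[id] = {id, clock, UINT64_MAX};           // unknown gap treated as high
        return false;
    }

    void evict() {
        std::vector<const Block*> all;
        for (auto& kv : blocks) all.push_back(&kv.second);
        std::vector<uint64_t> gaps;
        for (auto* b : all) gaps.push_back(b->gap);
        std::nth_element(gaps.begin(), gaps.begin() + gaps.size() / 2, gaps.end());
        uint64_t threshold = gaps[gaps.size() / 2];     // "high DIG" = at or above the median gap
        const Block* victim = nullptr;
        for (auto* b : all)
            if (b->gap >= threshold && (!victim || b->last_ref < victim->last_ref))
                victim = b;                             // LRU among the high-DIG blocks
        blocks.erase(victim->id);
    }
};

int main() {
    DigCache c(3);
    uint64_t trace[] = {1, 2, 1, 2, 3, 1, 2, 4, 1, 2};  // 1 and 2 have small gaps, 3 and 4 large
    int hits = 0;
    for (uint64_t b : trace) hits += c.access(b);
    std::cout << "hits: " << hits << "/10\n";
}
```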
