High Performance Data Cache Memory Architecture

  • Published: 2008.08.31

Abstract

In this paper, a new high-performance data cache scheme that improves the exploitation of both spatial and temporal locality is proposed. The proposed data cache consists of a hardware prefetch unit and two sub-caches: a direct-mapped (DM) cache with a large block size and a fully associative (FA) buffer with a small block size. Spatial locality is exploited by fetching large blocks and storing them in the DM cache, and it is further enhanced by prefetching a neighboring block whenever a DM cache hit occurs. Temporal locality is exploited when blocks are replaced in the DM cache: based on how actively each small block was accessed while it resided in the DM cache, the important small blocks are saved in the FA buffer. Experimental results on SPEC2000 programs show that, compared with previous schemes of similar size (a direct-mapped cache, a 4-way set-associative cache, and the SMI (selective-mode intelligent) cache [8]), the proposed scheme reduces the miss ratio by 12.53% to 23.62% and the AMAT (average memory access time) by 14.67% to 18.60% on average.
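A minimal trace-driven sketch of the organization described above may help make the two mechanisms concrete. Everything specific in it is a hypothetical choice for illustration, not a parameter or algorithm taken from the paper: the class name `DualDataCache`, the 32-byte large and 8-byte small blocks, the cache sizes, treating the "neighboring block" as the next sequential large block, using a per-small-block touched bit as the activity criterion, LRU replacement in the FA buffer, and the 1-cycle hit time and 20-cycle miss penalty in `amat()`, which applies the textbook relation AMAT = hit time + miss ratio × miss penalty.

```python
from collections import OrderedDict

LARGE = 32  # hypothetical large-block size in bytes (DM cache)
SMALL = 8   # hypothetical small-block size in bytes (FA buffer)
SUBS = LARGE // SMALL  # small blocks per large block


class DualDataCache:
    """Toy model: DM cache of large blocks plus an FA buffer of small blocks."""

    def __init__(self, dm_lines=64, fa_entries=16, prefetch=True):
        self.dm_lines = dm_lines
        self.fa_entries = fa_entries
        self.prefetch = prefetch
        # Each DM line is None or (large_block_number, set of touched small-block offsets).
        self.dm = [None] * dm_lines
        # FA buffer keyed by small-block number, kept in LRU order (oldest first).
        self.fa = OrderedDict()
        self.accesses = 0
        self.misses = 0

    def _dm_lookup(self, block):
        line = self.dm[block % self.dm_lines]
        return line if line is not None and line[0] == block else None

    def _insert_fa(self, small):
        if small in self.fa:
            self.fa.move_to_end(small)
            return
        if len(self.fa) >= self.fa_entries:
            self.fa.popitem(last=False)  # evict the least recently used small block
        self.fa[small] = None

    def _fill_dm(self, block):
        """Install a large block; save the victim's touched small blocks in the FA buffer."""
        idx = block % self.dm_lines
        victim = self.dm[idx]
        if victim is not None:
            vblock, touched = victim
            for off in sorted(touched):
                self._insert_fa(vblock * SUBS + off)
        self.dm[idx] = (block, set())

    def access(self, addr):
        """Simulate one data reference to byte address addr; return True on a hit."""
        self.accesses += 1
        block, off = addr // LARGE, (addr % LARGE) // SMALL
        line = self._dm_lookup(block)
        if line is not None:
            line[1].add(off)  # record activity of this small block
            if self.prefetch and self._dm_lookup(block + 1) is None:
                self._fill_dm(block + 1)  # prefetch the next sequential large block
            return True
        small = addr // SMALL
        if small in self.fa:
            self.fa.move_to_end(small)
            return True
        self.misses += 1
        self._fill_dm(block)  # fetch the whole large block on a miss
        self.dm[block % self.dm_lines][1].add(off)
        return False

    def miss_ratio(self):
        return self.misses / self.accesses if self.accesses else 0.0

    def amat(self, hit_time=1.0, miss_penalty=20.0):
        # AMAT = hit time + miss ratio * miss penalty (hypothetical cycle counts)
        return hit_time + self.miss_ratio() * miss_penalty


if __name__ == "__main__":
    for pf in (False, True):
        cache = DualDataCache(prefetch=pf)
        for addr in range(0, 4096, 4):  # sequential word accesses
            cache.access(addr)
        print(f"prefetch={pf}: miss ratio {cache.miss_ratio():.4f}, AMAT {cache.amat():.2f} cycles")
```

Feeding the sequential word trace `range(0, 4096, 4)` through `DualDataCache(prefetch=False)` misses once per large block, while the default `prefetch=True` leaves only the very first access as a miss, because each DM hit brings in the next block ahead of use. The access sequence `0, 128, 0, 8` on a 4-line DM cache shows the temporal side: addresses 0 and 128 conflict in the DM cache, yet the second access to 0 hits in the FA buffer because its small block was marked active before its large block was evicted.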

References

  1. B. Juurlink, “Unified Dual Data Caches,” Proceedings of the Euromicro Symposium on Digital System Design, 2003, pp. 33-40.
  2. N. P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully Associative Cache and Prefetch Buffers,” Proceedings of the 17th International Symposium on Computer Architecture (ISCA), May 1990, pp. 364-373.
  3. D. Stiliadis and A. Varma, “Selective Victim Caching: A Method to Improve the Performance of Direct Mapped Cache,” IEEE Transactions on Computers, Vol. 46, No. 5, May 1997, pp. 603-610. https://doi.org/10.1109/12.589235
  4. A. Gonzalez, C. Aliagas, and M. Valero, “A Data Cache with Multiple Caching Strategies Tuned to Different Types of Locality,” Proceedings of the International Conference on Supercomputing, 1995, pp. 338-347.
  5. B. Juurlink, “Unified Dual Data Caches,” Proceedings of the Euromicro Symposium on Digital System Design, 2003, pp. 33-40.
  6. V. Milutinovic, M. Tomasevic, B. Markovic, and M. Tremblay, “A New Cache Architecture Concept: The Split Temporal/Spatial Cache,” Proceedings of the 8th Mediterranean Electrotechnical Conference, Vol. 2, 1996, pp. 1108-1111.
  7. J. H. Lee, J. S. Lee, and S. D. Kim, “A New Cache Architecture Based on Temporal and Spatial Locality,” Journal of Systems Architecture, Vol. 46, Sep. 2000, pp. 1451-1467. https://doi.org/10.1016/S1383-7621(00)00035-7
  8. J. H. Lee, S.-W. Jeong, S. D. Kim, and C. C. Weems, “An Intelligent Cache System with Hardware Prefetching for High Performance,” IEEE Transactions on Computers, Vol. 52, No. 5, 2003, pp. 607-616. https://doi.org/10.1109/TC.2003.1197127
  9. T. Mowry, M. S. Lam, and A. Gupta, “Design and Evaluation of a Compiler Algorithm for Prefetching,” Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, 1992, pp. 62-73. https://doi.org/10.1145/143365.143488
  10. A. K. Porterfield, “Software Methods for Improvement of Cache Performance on Supercomputer Application,” PhD dissertation, Rice University, 1989.
  11. W. Y. Chen, S. A. Mahlke, P. P. Chang, and W. M. Hwu, “Data Access Microarchitectures for Superscalar Processors with Compiler-Assisted Data Prefetching,” Proceedings of the 24th Annual Workshop on Microprogramming and Microarchitectures, 1991.
  12. A. J. Smith, “Cache Memories,” ACM Computing Surveys, Vol. 14, No. 3, 1982, pp. 473-530. https://doi.org/10.1145/356887.356892
  13. J. D. Gindele, “Buffer Block Prefetching Method,” IBM Technical Disclosure Bulletin, Vol. 20, No. 2, 1977, pp. 696-697.
  14. D. Zucker, M. J. Flynn, and R. Lee, “A Comparison of Hardware Prefetching Techniques for Multimedia Benchmarks,” Proceedings of IEEE Multimedia, 1996, pp. 236-244. https://doi.org/10.1109/MMCS.1996.534981
  15. D. Burger and T. M. Austin, “The SimpleScalar Tool Set, Version 2.0,” Technical Report TR-97-1342, University of Wisconsin-Madison, 1997.