An L1 Cache Prefetching Scheme using Excessively Aggressive Prefetching and a Small Direct-mapped Filtering Cache

Korean title, translated: An L1 Cache Prefetching Scheme Using Aggressive Prefetching and Direct-Mapped Filtering

  • 전영숙 (Department of Computer Science, Chungbuk National University)
  • Published: 2006.10.15

Abstract

This paper proposes an L1 cache prefetching scheme that combines an excessively aggressive hardware prefetcher with a hardware prefetch filter built around a small direct-mapped filtering cache. A quantitative analysis method is introduced and applied to analyze the non-ideal effects of aggressive cache prefetching. From the analysis results, the structure and algorithm of a prefetch filter are derived and simulated, and overall system performance is measured using a cycle-by-cycle cache simulator. Experimental results show that the proposed scheme improves overall system performance by 18% on average across several benchmarks.

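This paper proposes an L1 cache prefetching scheme that uses aggressive prefetching and direct-mapped filtering. To this end, a quantitative method for analyzing the adverse effects of cache prefetching is proposed and used to analyze the effect of aggressive prefetching on a variety of benchmarks. Based on the analysis results, an optimal prefetch filter structure and algorithm are derived, and overall system performance is obtained with an independently developed timing-based cache simulator. Experimental results show that the proposed L1 prefetching scheme improves system performance by 18% on average across a variety of benchmarks.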
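
The abstract does not spell out the filter's algorithm, so the following is only a minimal sketch of the general idea of pairing an aggressive prefetcher with a small direct-mapped filter. Every name and parameter here (`DirectMappedPrefetchFilter`, `aggressive_candidates`, the entry count, the 32-byte block size, the next-N-line policy, and the rule of remembering blocks whose prefetch went unused) is an assumption made for illustration and is not taken from the paper.

```python
class DirectMappedPrefetchFilter:
    """Hypothetical filter: one tag per entry, indexed by low-order block bits."""

    def __init__(self, num_entries=64, block_bytes=32):
        # Assumed sizes; the paper's actual configuration is not given here.
        self.num_entries = num_entries
        self.block_bytes = block_bytes
        self.tags = [None] * num_entries

    def _index_and_tag(self, addr):
        block = addr // self.block_bytes
        return block % self.num_entries, block // self.num_entries

    def record_useless(self, addr):
        # A prefetched block was evicted without being referenced:
        # remember it, overwriting whatever shared this entry (direct-mapped).
        index, tag = self._index_and_tag(addr)
        self.tags[index] = tag

    def record_useful(self, addr):
        # A prefetched block was referenced: stop filtering it.
        index, tag = self._index_and_tag(addr)
        if self.tags[index] == tag:
            self.tags[index] = None

    def allows(self, addr):
        # Suppress a prefetch only if this exact block is recorded as useless.
        index, tag = self._index_and_tag(addr)
        return self.tags[index] != tag


def aggressive_candidates(miss_addr, degree=4, block_bytes=32):
    """Assumed aggressive policy: prefetch the next `degree` sequential blocks."""
    base = (miss_addr // block_bytes) * block_bytes
    return [base + i * block_bytes for i in range(1, degree + 1)]


# Usage: a miss at 0x20 proposes 0x40, 0x60, 0x80; 0x40 was useless before.
pf = DirectMappedPrefetchFilter(num_entries=4, block_bytes=32)
pf.record_useless(0x40)
issued = [a for a in aggressive_candidates(0x20, degree=3) if pf.allows(a)]
```

In this sketch the direct-mapped organization keeps the lookup to a single index computation and tag compare, at the cost of losing filter history when two blocks map to the same entry.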
