Energy-Performance Efficient 2-Level Data Cache Architecture for Embedded System

  • Jongmin Lee (Department of Computer Science, KAIST)
  • Soontae Kim (Department of Computer Science, KAIST)
  • Received: 2010.01.20
  • Reviewed: 2010.07.02
  • Published: 2010.10.15

Abstract

On-chip caches play an important role in both the performance and the energy consumption of resource-constrained embedded systems, because they are accessed frequently and filter out many off-chip memory accesses. We propose a 2-level data cache architecture with a low energy-delay product, tailored for embedded systems. The L1 data cache is small and direct-mapped, and employs a write-through policy. In contrast, the L2 data cache is of moderate size, set-associative, and adopts a write-back policy. Consequently, the L1 data cache can be accessed within one cycle and provides high cache bandwidth, while the L2 data cache is effective in reducing the global miss rate. To reduce the penalty of the higher miss rate caused by the small L1 cache, as well as the power consumed by address generation, we propose an ECP (Early Cache hit Predictor) scheme. The ECP predicts whether the requested data is in the L1 cache by combining L1 cache hit prediction with fast effective-address generation that does not require the ALU. To reduce the high energy cost of the frequent L2 accesses caused by write-through traffic from the write buffer placed between the two cache levels, we propose a one-way write scheme. With this scheme, a write from the L1 cache to the L2 cache accesses a single designated way directly, without going through tag comparison. In simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed 2-level data cache architecture improves overall system performance by 3.6% and reduces data cache energy consumption by 50% on average.
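The one-way write idea can be illustrated with a small behavioral model. The sketch below is not the authors' implementation: the abstract only states that a write-through access goes to one designated L2 way without tag comparison. How that way is known is an assumption here (each L1 line remembers the L2 way it was filled from), and so are the cache sizes, the write-allocate policy, round-robin replacement, the invalidation of L1 lines on L2 eviction, and all names such as `TwoLevelCache` and `run`. The model counts L2 tag comparisons only; it does not model timing or energy.

```python
LINE_BYTES = 32  # assumed line size, not taken from the paper


class TwoLevelCache:
    """Direct-mapped write-through L1 over a set-associative write-back L2."""

    def __init__(self, l1_lines=16, l2_sets=64, l2_ways=4):
        self.l1_lines, self.l2_sets, self.l2_ways = l1_lines, l2_sets, l2_ways
        # Each L1 entry is None or (line_address, l2_way_hint).
        self.l1 = [None] * l1_lines
        self.l2_tags = [[None] * l2_ways for _ in range(l2_sets)]
        self.l2_dirty = [[False] * l2_ways for _ in range(l2_sets)]
        self.victim = [0] * l2_sets  # round-robin pointer per set (assumption)
        self.tag_compares = 0        # proxy for L2 tag-array activity

    def _l2_lookup(self, line):
        """Conventional L2 access: compare the tags of every way in the set."""
        s = line % self.l2_sets
        self.tag_compares += self.l2_ways
        if line in self.l2_tags[s]:
            return self.l2_tags[s].index(line)
        way = self.victim[s]
        self.victim[s] = (way + 1) % self.l2_ways
        evicted = self.l2_tags[s][way]
        if evicted is not None:
            # Invalidate the L1 copy so no stale way hint survives (assumption).
            i = evicted % self.l1_lines
            if self.l1[i] is not None and self.l1[i][0] == evicted:
                self.l1[i] = None
        self.l2_tags[s][way] = line   # write-back of a dirty victim is omitted
        self.l2_dirty[s][way] = False
        return way

    def _l1_fill(self, line):
        """Bring a line into L1 and remember which L2 way holds it."""
        way = self._l2_lookup(line)
        self.l1[line % self.l1_lines] = (line, way)
        return way

    def load(self, addr):
        """Return True on an L1 hit, False on a miss (the line is then filled)."""
        line = addr // LINE_BYTES
        entry = self.l1[line % self.l1_lines]
        if entry is not None and entry[0] == line:
            return True
        self._l1_fill(line)
        return False

    def store(self, addr, one_way_write=True):
        """Write-through store: L2 is updated on every store."""
        line = addr // LINE_BYTES
        entry = self.l1[line % self.l1_lines]
        if one_way_write and entry is not None and entry[0] == line:
            way = entry[1]             # designated way, no tag comparison
        else:
            way = self._l1_fill(line)  # full tag search (write-allocate assumed)
        self.l2_dirty[line % self.l2_sets][way] = True


def run(one_way_write):
    """Store repeatedly to a 256-byte region and report L2 tag comparisons."""
    cache = TwoLevelCache()
    for _ in range(10):
        for addr in range(0, 256, 4):
            cache.store(addr, one_way_write)
    return cache.tag_compares


if __name__ == "__main__":
    print("one-way write:", run(True), "conventional:", run(False))
```

With these illustrative defaults, the 640 stores cost 32 L2 tag comparisons when the way hint is used (only the eight initial line fills search the tags) and 2560 when every write-through store searches all four ways. The ratio depends entirely on the assumed workload and sizes and says nothing about the energy figures reported in the paper.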
