
Energy-Performance Efficient 2-Level Data Cache Architecture for Embedded System  

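Lee, Jong-Min (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
Kim, Soon-Tae (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))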
Abstract
On-chip cache memories are important to both the performance and the energy consumption of resource-constrained embedded systems because they filter out many off-chip memory accesses. We propose a 2-level data cache architecture with a low energy-delay product, tailored for embedded systems. The L1 data cache is small and direct-mapped and employs a write-through policy. In contrast, the L2 data cache is set-associative and adopts a write-back policy. Consequently, the L1 data cache is accessed in one cycle and provides high cache bandwidth, while the L2 data cache is effective in reducing the global miss rate. To reduce the penalty of the high miss rate caused by the small L1 cache, as well as the power consumed by address generation, we propose an Early Cache hit Predictor (ECP) scheme. The ECP predicts whether the L1 cache holds the requested data, using both fast address generation and L1 cache hit prediction. To reduce the high energy cost of accessing the L2 data cache, which results from heavy write-through traffic from the write buffer placed between the two cache levels, we propose a one-way write scheme. In simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed 2-level data cache architecture improves overall system performance by 3.6% and reduces data cache energy consumption by 50% on average.
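The basic hierarchy described above (a small direct-mapped, write-through L1 in front of a set-associative, write-back L2) can be illustrated with a minimal functional sketch. This is not the authors' simulator: the class names, line size, cache sizes, LRU replacement in L2, and the no-write-allocate L1 are all assumptions chosen for illustration, and the write buffer, the ECP, and the one-way write scheme are not modeled because the abstract does not specify their mechanisms.

```python
from collections import OrderedDict

LINE_SIZE = 32  # bytes per cache line (assumed value)


class L2WriteBackCache:
    """Set-associative, write-back cache. LRU replacement is an assumption."""

    def __init__(self, num_sets=64, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set: tag -> dirty flag, ordered oldest to newest.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = self.misses = self.writebacks = 0

    def access(self, addr, is_write):
        line = addr // LINE_SIZE
        index, tag = line % self.num_sets, line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            self.hits += 1
            lines.move_to_end(tag)  # mark as most recently used
        else:
            self.misses += 1
            if len(lines) == self.ways:
                _, dirty = lines.popitem(last=False)  # evict the LRU line
                if dirty:
                    self.writebacks += 1  # dirty victim goes to memory
            lines[tag] = False  # newly filled line starts clean
        if is_write:
            lines[tag] = True  # write-back: only mark the line dirty


class L1WriteThroughCache:
    """Direct-mapped, write-through cache. No-write-allocate is an assumption."""

    def __init__(self, l2, num_lines=16):
        self.l2 = l2
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # one tag per line; no dirty bits needed
        self.hits = self.misses = 0

    def _locate(self, addr):
        line = addr // LINE_SIZE
        return line % self.num_lines, line // self.num_lines

    def load(self, addr):
        index, tag = self._locate(addr)
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.l2.access(addr, is_write=False)  # fetch the line from L2
            self.tags[index] = tag  # victim is never dirty, so just overwrite

    def store(self, addr):
        index, tag = self._locate(addr)
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
        # Write-through: every store is forwarded to L2, hit or miss.
        self.l2.access(addr, is_write=True)
```

For example, after `l1 = L1WriteThroughCache(L2WriteBackCache())`, a `load(0)` followed by `load(512)` and another `load(0)` conflicts in the 16-line L1 but hits in L2 on the third access, which is the sense in which the larger set-associative L2 lowers the global miss rate. Every `store` reaches L2 regardless of the L1 outcome, which is the write-through traffic the abstract identifies as an energy cost.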
Keywords
2-level data cache; Early cache hit predictor; One-way write;