Instruction Flow based Early Way Determination Technique for Low-power L1 Instruction Cache

  • Kim, Gwang Bok (School of Electronics and Computer Engineering, Chonnam National University) ;
  • Kim, Jong Myon (School of Electrical Engineering, University of Ulsan) ;
  • Kim, Cheol Hong (School of Electronics and Computer Engineering, Chonnam National University)
  • Received : 2016.07.12
  • Accepted : 2016.08.28
  • Published : 2016.09.30

Abstract

Recent embedded processors employ set-associative L1 instruction caches to improve performance. The set-associative L1 instruction cache accounts for a considerable portion of the energy consumed by an embedded processor, since all of its ways are accessed in parallel whenever the processor requests an instruction. In this paper, we propose a technique that effectively reduces the energy consumption of the set-associative L1 instruction cache by accessing only one way when that way can be determined early. A gshare branch predictor is employed to predict the instruction flow and to determine the way from which the instruction will be fetched. When a branch is predicted not taken, the next sequential instruction can be fetched from the instruction cache by accessing only one way. According to our simulations with the SPEC CPU2006 benchmarks, the proposed technique requires negligible hardware overhead and reduces the energy consumption of a 4-way L1 instruction cache by 20% on average.
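The abstract describes early way determination only at a high level. As a rough illustration, the following Python sketch counts how many cache ways a fetch stream reads when a gshare predictor (2-bit saturating counters indexed by the PC XORed with a global history register) decides whether the next fetch is sequential. All names and parameters (`Gshare`, `way_accesses`, `same_line`, 32-byte lines, 4-byte instructions, a 10-bit history) are hypothetical choices made for the sketch, not details taken from the paper. It also assumes the simplest case in which a single way can be read: a predicted-sequential fetch that stays in the same cache line as the previous fetch, so its way is already known. Wrong-path fetches after a misprediction are not counted.

```python
# Minimal sketch under stated assumptions (not the paper's implementation):
#  - 4-way cache, 32-byte lines, fixed 4-byte instructions
#  - gshare with 2-bit counters and a 10-bit global history
#  - one way is read only when the fetch is predicted sequential and stays
#    in the same cache line as the previous fetch (way already known)
#  - wrong-path fetches after a misprediction are ignored

LINE_BYTES = 32   # hypothetical line size
INSN_BYTES = 4    # hypothetical fixed instruction width
NUM_WAYS = 4


class Gshare:
    """Pattern history table of 2-bit counters indexed by (PC XOR history)."""

    def __init__(self, history_bits=10):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                              # global history register
        self.pht = [1] * (1 << history_bits)      # 1 = weakly not-taken

    def _index(self, pc):
        return ((pc // INSN_BYTES) ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2     # True means "taken"

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


def same_line(a, b):
    """True if byte addresses a and b fall in the same cache line."""
    return a // LINE_BYTES == b // LINE_BYTES


def way_accesses(trace, predictor):
    """trace: list of (pc, is_branch, taken) along the correct path.

    Returns (ways read with early way determination,
             ways read by a conventional parallel-access cache)."""
    if not trace:
        return 0, 0
    used = NUM_WAYS                   # first fetch: way unknown, read all ways
    for (pc, is_branch, taken), (next_pc, _, _) in zip(trace, trace[1:]):
        # Non-branches always continue sequentially; branches ask gshare.
        predicted_seq = not (is_branch and predictor.predict(pc))
        if is_branch:
            predictor.update(pc, taken)
        if predicted_seq and next_pc == pc + INSN_BYTES and same_line(pc, next_pc):
            used += 1                 # way known early: read a single way
        else:
            used += NUM_WAYS          # way unknown: read all ways in parallel
    return used, NUM_WAYS * len(trace)


if __name__ == "__main__":
    straight_line = [(4 * i, False, False) for i in range(16)]
    print(way_accesses(straight_line, Gshare()))  # (22, 64)
```

For a straight-line run of 16 instructions spanning two lines, the sketch reads 22 ways instead of 64, because only the first fetch and the line crossing need a full parallel lookup. Counting way reads is only a proxy for dynamic energy: it ignores tag comparisons, the predictor's own energy, and any hardware the actual design may use to handle fetches across line boundaries, so it is not meant to reproduce the paper's 20% figure.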
