http://dx.doi.org/10.5626/KTCP.2017.23.9.564

Improving Instruction Cache Performance by Dynamic Management of Cache-Image  

Suh, Hyo-Joong (Catholic Univ. of Korea)
Publication Information
KIISE Transactions on Computing Practices / v.23, no.9, 2017, pp. 564-571
Abstract
Burst loading of a pre-created cache-image is an effective way to reduce instruction cache misses in the early stage of program execution. It alleviates the performance degradation and energy inefficiency caused by the cold misses concentrated in the instruction cache. However, the method has drawbacks, including software overhead imposed on the compiler and installer. Furthermore, for some applications the pre-created cache-image can mismatch actual execution because of the application's dynamic properties. This paper addresses these issues and proposes a cache-image maintenance/recreation policy that manages the image dynamically using a hardware-assisted method. Simulation results show that the proposed method can maintain a cache-image of appropriate size and validity.
Keywords
cache-image loading; dynamic image generation; instruction cache; latency reducing
References
1 D. Grunwald, C. B. Morrey, III, P. Levis, M. Neufeld, and K. I. Farkas, "Policies for dynamic clock scheduling," Proc. of the 4th Conf. Symp. Operating System Design & Implementation, Vol. 4, No. 6, 2000.
2 J. Pouwelse, K. Langendoen, H. Sips, "Dynamic voltage scaling on a low-power microprocessor," Proc. of the Intl. Conf. Mobile computing and Networking, pp. 251-259, 2001.
3 H.J. Suh, T. Kim, "Burst Loading Method of Instruction Cache Image for Program Latency Reduction and Energy Saving," Journal of KIISE : Computing Practices and Letters, Vol. 19, No. 4, pp. 163-170, 2013. (in Korean)
4 S.Y. Hwang, H.J. Suh, "Program Latency Reduction and Energy Saving by Way-Selective Cache Image Pre-Loading of Instruction Cache," Journal of KIISE : Computing Practices and Letters, Vol. 20, No. 3, pp. 121-130, 2014. (in Korean)
5 G. Semeraro, G. Magklis, R. Balasubramonian, D.H. Albonesi, S. Dwarkadas, M. L. Scott, "Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling," Proc. of the 8th Intl. Symp. High-Performance Computer Architecture, pp. 29-40, 2002.
6 X. Jin, S. Goto, "Hilbert Transform-Based Workload Prediction and Dynamic Frequency Scaling for Power-Efficient Video Encoding," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 31, No. 5, pp. 649-661, 2012.
7 Y. Xie and G. H. Loh, "Scalable Shared-Cache Management by Containing Thrashing Workloads," Proc. of the 5th Intl. Conf. High Performance Embedded Architectures and Compilers, pp. 262-276, 2010.
8 X. Ding, K. Wang, X. Zhang, "ULCC: A User-Level Facility for Optimizing Shared Cache Performance on Multicores," ACM SIGPLAN Notices, Vol. 46, No. 8, pp. 103-112, 2011.
9 D. Burger and T. M. Austin, "The SimpleScalar tool set, version 2.0," ACM SIGARCH Computer Architecture News, Vol. 25, No. 3, pp. 13-25, 1997.
10 M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, R. B. Brown, "MiBench: A free, commercially representative embedded benchmark suite," Proc. of the IEEE Intl. Work. Workload Characterization, pp. 3-14, 2001.
11 Hynix Semiconductor, 240pin DDR2 SDRAM Unbuffered DIMMs based on 2Gb A version Rev.0.1, 2009.