Improving Instruction Cache Performance by Dynamic Management of Cache-Image

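Improving Instruction Cache Performance Using a Dynamic Cache-Image Management Method (translated from the Korean title: 캐시 이미지의 동적 관리 방법을 이용한 명령어 캐시 성능 개선)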

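  • H.J. Suh (서효중), School of Computer Science and Information Engineering, The Catholic University of Korea (가톨릭대학교 컴퓨터정보공학부)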
  • Received : 2017.04.25
  • Accepted : 2017.07.17
  • Published : 2017.09.15

Abstract

Burst loading of a pre-created cache-image is an effective way to reduce instruction cache misses in the early stage of program execution. It alleviates both the performance degradation and the energy inefficiency caused by cold misses concentrated at the instruction cache. However, the approach has drawbacks: creating the image imposes software overhead on the compiler and installer, and for applications with dynamic execution behavior the pre-created image can fail to match the instructions that are actually needed. This paper addresses these issues and proposes a cache-image maintenance and recreation policy that manages the image dynamically with hardware assistance. Simulation results show that the proposed method keeps the cache-image valid and at an appropriate size.

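Burst-loading a cache-image from memory each time a process is launched, so as to reduce early cache misses, is effective at shortening the delay of the initialization phase after program start and at reducing energy consumption. However, a suitable cache-image must first be generated through software approaches such as the compiler and installer, and this is inefficient for processes whose execution behavior is dynamic. Focusing on this loss, this paper proposes a method that adds hardware to create and manage the cache-image dynamically. According to the simulation results, the proposed method maintains an image size appropriate to the program's cache requirement, which makes the existing cache-image loading technique more efficient.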
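
The idea summarized above can be illustrated with a toy simulation. The sketch below is not the paper's hardware mechanism or its actual policy, which the abstract does not specify. It assumes a small direct-mapped instruction cache, a cache-image represented as a map from set index to tag, and a hypothetical recreation rule (`manage`) that replaces the image when fewer than half of the preloaded lines are hit before eviction. All names, sizes, and thresholds are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

LINE_SIZE = 32   # bytes per cache line (assumed value)
NUM_SETS = 64    # direct-mapped, one line per set (assumed value)

Image = Dict[int, int]  # set index -> tag


def split(addr: int) -> Tuple[int, int]:
    """Map a byte address to (set index, tag)."""
    line = addr // LINE_SIZE
    return line % NUM_SETS, line // NUM_SETS


def launch(trace: List[int], image: Optional[Image] = None,
           window: int = 1000) -> Tuple[int, Image, float]:
    """Simulate one process launch.

    Returns (miss count, candidate image built from the first `window`
    fetches, fraction of preloaded lines that were hit before eviction).
    """
    tags: List[Optional[int]] = [None] * NUM_SETS
    preloaded = dict(image or {})
    for index, tag in preloaded.items():   # burst load before execution
        tags[index] = tag
    live = set(preloaded)                  # preloaded lines not yet evicted
    used = set()
    candidate: Image = {}
    misses = 0
    for n, addr in enumerate(trace):
        index, tag = split(addr)
        if n < window:
            candidate.setdefault(index, tag)   # first line needed per set
        if tags[index] == tag:
            if index in live:
                used.add(index)
        else:
            misses += 1
            tags[index] = tag
            live.discard(index)
    usefulness = len(used) / len(preloaded) if preloaded else 0.0
    return misses, candidate, usefulness


def manage(image: Optional[Image], candidate: Image, usefulness: float,
           threshold: float = 0.5) -> Image:
    """Hypothetical policy: keep a still-valid image, else recreate it."""
    if not image or usefulness < threshold:
        return candidate
    return image


def loop_trace(base: int, lines: int, passes: int) -> List[int]:
    """Fetch `lines` consecutive cache lines, 4 bytes at a time, `passes` times."""
    one_pass = [base + off for off in range(0, lines * LINE_SIZE, 4)]
    return one_pass * passes


trace_a = loop_trace(0x0, 40, 3)
trace_b = loop_trace(0x10000, 20, 3)   # behaviour change: different code region

cold_misses, cand, _ = launch(trace_a)
image = manage(None, cand, 0.0)                 # no image yet: create one
warm_misses, cand, useful = launch(trace_a, image)
image = manage(image, cand, useful)             # image still valid: kept
stale_misses, cand, useful = launch(trace_b, image)
image = manage(image, cand, useful)             # image stale: recreated
print(cold_misses, warm_misses, stale_misses, len(image))
```

In this toy setup the first launch takes 40 cold misses, a relaunch with the captured image takes none, and a launch whose code region has changed misses on all 20 of its lines. The stale 40-line image is then replaced by a 20-line image, which mirrors the abstract's point about keeping the image both valid and appropriately sized.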

Acknowledgement

Supported by: National Research Foundation of Korea (한국연구재단)
