An Interference Matrix Based Approach to Bounding Worst-Case Inter-Thread Cache Interferences and WCET for Multi-Core Processors

  • Yan, Jun (MathWorks);
  • Zhang, Wei (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Submitted: 2011.02.18
  • Reviewed: 2011.03.21
  • Published: 2011.06.30

Abstract

In a multi-core processor, the cores typically share the last-level cache, so threads running on different cores may interfere with each other in that cache. A multi-core worst-case execution time (WCET) analyzer must therefore be able to estimate the worst-case inter-thread cache interference safely and accurately. Current WCET analysis techniques, which mainly focus on single-thread analysis, do not support this. This paper presents a novel approach to analyzing the worst-case cache interference and bounding the WCET for threads running on multi-core processors with shared L2 instruction caches. We propose using an interference matrix to model inter-thread interference, from which the worst-case inter-thread cache interference can be calculated. Our experiments indicate that the proposed approach can yield a WCET bound that overestimates by less than 1%, as in the fib-call benchmark, and overestimates by 16.4% on average for threads running on a dual-core processor with a shared L2 cache. Our approach improves the accuracy of WCET estimation by 20.0% on average compared to prior work.
