http://dx.doi.org/10.9708/jksci.2013.18.11.001

Core-aware Cache Replacement Policy for Reconfigurable Last Level Cache  

Son, Dong-Oh (School of Electronics and Computer Engineering, Chonnam National University)
Choi, Hong-Jun (School of Electronics and Computer Engineering, Chonnam National University)
Kim, Jong-Myon (School of Computer Engineering and Information Technology, University of Ulsan)
Kim, Cheol-Hong (School of Electronics and Computer Engineering, Chonnam National University)
Abstract
In multi-core processors, the Last Level Cache (LLC) reduces the speed gap between memory and the cores. For this reason, the LLC has a large impact on processor performance. The LLC is composed of shared cache and private cache. In the computer architecture community, most researchers have focused on management techniques for shared caches, while management techniques for private caches have not been widely studied. In a conventional private LLC, memory is statically assigned to each core, resulting in serious performance degradation when workloads are not evenly distributed across cores. To overcome this problem, this paper proposes a replacement policy that manages the private cache of the LLC efficiently. Because the proposed core-aware cache replacement policy can reconfigure the LLC dynamically, the LLC hit rate increases drastically. Moreover, the proposed policy uses 2-bit saturating counters to improve performance. According to our simulation results, the proposed method can improve hit rates by 9.23% and reduce access time by 12.85% compared to the conventional method.
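As a rough illustration of the 2-bit saturating counters mentioned above, the following Python sketch shows how such a counter behaves. The abstract does not say how the counters are updated or consulted, so the per-core usage shown here (the hypothetical helper `pick_donor_core`, and the idea that a low counter marks a core with little cache demand) is an assumption made for illustration only, not the paper's actual mechanism.

```python
class SaturatingCounter2Bit:
    """A 2-bit saturating counter: its value always stays within 0..3."""

    MAX_VALUE = 3  # largest value representable in 2 bits

    def __init__(self, value=0):
        if not 0 <= value <= self.MAX_VALUE:
            raise ValueError("value must be between 0 and 3")
        self.value = value

    def increment(self):
        # Saturate at 3 instead of wrapping around to 0.
        self.value = min(self.value + 1, self.MAX_VALUE)

    def decrement(self):
        # Saturate at 0 instead of wrapping around to 3.
        self.value = max(self.value - 1, 0)


def pick_donor_core(counters):
    """Hypothetical helper (not from the paper): return the index of the
    core whose counter is lowest, assuming a low counter means the core
    currently needs little cache space. Ties go to the lowest index."""
    return min(range(len(counters)), key=lambda i: counters[i].value)


# Example: one counter per core in an assumed 4-core system.
counters = [SaturatingCounter2Bit() for _ in range(4)]
for _ in range(5):
    counters[0].increment()   # core 0 is busy; its counter saturates at 3
counters[1].increment()
counters[2].increment()
counters[2].increment()
# core 3 stays at 0, so it is selected as the donor under this assumption
donor = pick_donor_core(counters)
```

The saturation is the point of the structure: a short burst of events cannot overflow the counter, and a few opposite events are enough to move it back, which keeps the hardware cost at two bits per entry.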
Keywords
Multi-core processor; Cache partitioning; Last Level Cache (LLC); Cache replacement