Dynamic Directory Table: On-Demand Allocation of Directory Entries for Active Shared Cache Blocks

Dynamic Directory Table: Dynamic Allocation of Directory Entries for Shared Cache Blocks

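  • 배한준 (Department of Electrical and Electronic Engineering, Korea University) ;
  • 최린 (Department of Electrical and Electronic Engineering, Korea University)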
  • Received : 2017.08.14
  • Accepted : 2017.10.16
  • Published : 2017.12.15

Abstract

In this study, we present a novel directory architecture that dynamically allocates a directory entry for a cache block on demand at runtime, only when the block is shared by more than one core. We therefore do not maintain coherence for private blocks, which substantially reduces the number of directory entries. Even for shared blocks, we allocate a directory entry only while the block is actively shared, further reducing the number of directory entries at runtime. To this end, we propose a new directory architecture called the dynamic directory table (DDT), which is implemented as a cache of active directory entries. Through detailed simulation of the PARSEC benchmarks, we show that DDT can outperform the expensive full-map directory by a slight margin with only 17.84% of the directory area across a variety of workloads. This is achieved through the faster access and high hit rates of the small directory. In addition, we demonstrate that even smaller DDTs can deliver comparable or higher performance than recent directory optimization schemes such as SPACE and DGD, with considerably less area.

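Multicore systems that use directory-based cache coherence protocols aim to integrate more cores for higher performance, but the growing overhead of maintaining cache coherence limits how far the core count can scale. Prior work has mainly focused on reducing the size of each directory entry. In this paper, we propose a directory architecture that dynamically allocates a directory entry when a cache block is shared by two or more cores. Directory information is therefore not maintained for blocks accessed by only one core, which reduces the number of directory entries. Through simulation of the PARSEC benchmarks, we confirm that DDT achieves a high hit rate with far fewer directory entries than a full-map directory and can therefore adequately manage the directory information of the shared cache. We also confirm that the directory size can be reduced to 17.84% of the full map while delivering performance similar to the full map.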
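The on-demand allocation policy described in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design. It assumes LRU replacement, a hypothetical `private_owner` map standing in for whatever mechanism detects the first accessor of a block, and that an evicted entry simply leaves the block untracked. The class and method names are invented for illustration.

```python
from collections import OrderedDict


class DynamicDirectoryTable:
    """Toy model: a small LRU cache of directory entries for shared blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # block address -> set of sharer core ids
        self.private_owner = {}       # assumed bookkeeping: block -> sole accessor
        self.hits = 0
        self.allocations = 0
        self.evictions = 0

    def access(self, core, block):
        """Record that `core` touches `block`; return "private" or "shared"."""
        if block in self.entries:
            # The block is already tracked as shared: a DDT hit.
            self.entries[block].add(core)
            self.entries.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return "shared"

        owner = self.private_owner.get(block)
        if owner is None or owner == core:
            # First touch, or a repeated touch by the same core:
            # the block stays private and no directory entry is allocated.
            self.private_owner[block] = core
            return "private"

        # A second core touched the block: allocate an entry on demand.
        if len(self.entries) >= self.capacity:
            # Assumption: evict the least recently used entry.
            self.entries.popitem(last=False)
            self.evictions += 1
        self.entries[block] = {owner, core}
        del self.private_owner[block]
        self.allocations += 1
        return "shared"
```

In this model, a block touched repeatedly by a single core never consumes a directory entry, and `hits`, `allocations`, and `evictions` give a rough feel for how a small table behaves on a given access trace.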

References

  1. Alisafaee, M., "Spatiotemporal Coherence Tracking," Proc. of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 341-350, Dec. 2012.
  2. Agarwal, A., Simoni, R., Hennessy, J., and Horowitz, M., “An Evaluation of Directory Schemes for Cache Coherence,” ACM SIGARCH Computer Architecture News, Vol. 16, No. 2, pp. 280-298, Jun. 1988. https://doi.org/10.1145/633625.52432
  3. Censier, L. M., and Feautrier, P., “A New Solution to Coherence Problems in Multicache Systems,” IEEE Transactions on Computers, Vol. 100, No. 12, pp. 1112-1118, 1978.
  4. Zhao, H., Shriraman, A., and Dwarkadas, S., "SPACE: Sharing Pattern-Based Directory Coherence for Multicore Scalability," Proc. of the 19th International Conference on Parallel Architectures and Compilation Techniques, pp. 135-146, Sep.
  5. Zebchuk, J., Falsafi, B., and Moshovos, A., "Multigrain Coherence Directories," Proc. of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 359-370, Dec. 2013.
  6. Zhao, H., Shriraman, A., Kumar, S., and Dwarkadas, S., “Protozoa: Adaptive Granularity Cache Coherence,” ACM SIGARCH Computer Architecture News, Vol. 41, No. 3, pp. 547-558, Jun. 2013. https://doi.org/10.1145/2508148.2485969
  7. Cuesta, B. A., Ros, A., Gómez, M. E., Robles, A., and Duato, J. F., “Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks,” ACM SIGARCH Computer Architecture News, Vol. 39, No. 3, pp. 93-104, Jun. 2011. https://doi.org/10.1145/2024723.2000076
  8. Papamarcos, M. S., and Patel, J. H., “A Low-overhead Coherence Solution for Multiprocessors with Private Cache Memories,” ACM SIGARCH Computer Architecture News, Vol. 12, No. 3, pp. 348-354, Jan. 1984. https://doi.org/10.1145/773453.808204
  9. Rudolph, L., and Segall, Z., “Dynamic Decentralized Cache Schemes for MIMD Parallel Processors,” ACM SIGARCH Computer Architecture News, Vol. 12, No. 3, pp. 340-347, 1984. https://doi.org/10.1145/773453.808203
  10. Binkert, N., Beckmann, B., Black, G., Reinhardt, S. K., Saidi, A., Basu, A., Hestness, J., Hower, D. R., Krishna, T., Sardashti, S., Sen, R., Sewell, K., Shoaib, M., Vaish, N., Hill, M. D. and Wood, D. A., “The Gem5 Simulator,” ACM SIGARCH Computer Architecture News, Vol. 39, No. 2, pp. 1-7, 2011.
  11. Bienia, C., Kumar, S., Singh, J. P., and Li, K., "The PARSEC Benchmark Suite: Characterization and Architectural Implications," Proc. of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 72-81, Oct. 2008.
  12. Thoziyoor, S., Muralimanohar, N., Ahn, J. H., and Jouppi, N., "CACTI 5.3," HP Laboratories, Palo Alto, CA, 2008.