Reduction of Read Access Latency by Invalid Hint in Directory-Based Cache Coherence Scheme

Korean title: Reducing Read Access Time Using Invalidation Hints in a Directory-Based Cache Coherence Scheme

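  • Oh Seung-Taek (오승택), Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST);
  • Lee Yun-Seok (이윤석), Division of Electronic and Control Engineering, Hankuk University of Foreign Studies;
  • Maeng Seung-Ryoul (맹승렬), Department of Electrical Engineering and Computer Science, KAIST;
  • Lee Joon-Won (이준원), Department of Electrical Engineering and Computer Science, KAIST
  • Published: 2000.04.15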

Abstract

Large-scale shared-memory multiprocessors suffer from high access latency to shared memory. Part of this latency stems from a feature of directory-based cache coherence schemes: a shared memory access must be serviced at the home node of the memory block. This home visit results in a traversal of three or more hops for a memory read access, and the traversal becomes much longer as the system scales up. In this paper, we propose a new cache coherence scheme that reduces read access latency. The proposed scheme exploits the idea of an invalid hint. An invalid hint for a cache block records which node previously invalidated that cache block. A read access request can therefore be sent directly to, and serviced by, that node (called the owner) without the help of the home node. Execution-driven simulation is employed to evaluate the performance of the proposed scheme. The simulation results show that read access latency and execution time are reduced.

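Large-scale distributed shared-memory multiprocessors have the weakness of high shared-memory access latency. In such multiprocessors, the use of a directory-based cache coherence scheme, in which every memory request is handled through the home node, increases memory access latency further. Moreover, because memory access latency becomes a more important factor in overall performance as the system grows, much research has aimed to reduce it in large-scale systems. This paper proposes a new cache coherence scheme that reduces memory read latency. The proposed scheme is implemented using invalidation hints. An invalidation hint is information about which node previously invalidated a cache block, and a node that needs the memory block can use this information to issue its memory request directly, without the help of the home node. Simulation was carried out to measure the performance of the proposed protocol. The simulation results show that read latency decreases under the proposed protocol.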
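The hint mechanism described in the abstract can be illustrated with a toy hop-count model. This is only a sketch under stated assumptions, not the protocol evaluated in the paper: the names `Block`, `Machine`, and `hops_for_trace` are hypothetical, the requester, home node, and owner are assumed to be distinct nodes, each network message counts as one hop, and a stale hint is assumed to cost one extra forwarding hop to the home node. The abstract does not specify how stale hints are handled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Block:
    home: int                      # node that holds the directory entry
    owner: Optional[int] = None    # node holding the dirty copy, if any
    sharers: set = field(default_factory=set)


class Machine:
    """Toy model that counts hops on the critical path of a read miss."""

    def __init__(self, use_hints: bool):
        self.use_hints = use_hints
        self.blocks = {}
        self.hints = {}  # (node, addr) -> node that invalidated node's copy

    def write(self, node: int, addr: int) -> None:
        blk = self.blocks[addr]
        # Every node that loses its copy remembers who invalidated it.
        for s in blk.sharers - {node}:
            self.hints[(s, addr)] = node
        if blk.owner is not None and blk.owner != node:
            self.hints[(blk.owner, addr)] = node
        blk.sharers = set()
        blk.owner = node

    def read(self, node: int, addr: int) -> int:
        blk = self.blocks[addr]
        if node == blk.owner or node in blk.sharers:
            return 0  # cache hit, no network traffic
        hint = self.hints.pop((node, addr), None) if self.use_hints else None
        if hint is not None and hint == blk.owner:
            hops = 2  # requester -> owner -> requester
        else:
            # Assumption: a stale hint costs one extra forwarding hop.
            hops = 1 if hint is not None else 0
            # Via home: 2 hops if memory is clean, 3 if a remote owner is dirty.
            hops += 2 if blk.owner is None else 3
        # The previous owner (if any) is downgraded to a sharer.
        if blk.owner is not None:
            blk.sharers.add(blk.owner)
            blk.owner = None
        blk.sharers.add(node)
        return hops


def hops_for_trace(use_hints: bool) -> list:
    """Node 1 reads, node 2 writes (invalidating node 1), node 1 reads again."""
    m = Machine(use_hints)
    m.blocks[0x40] = Block(home=0)
    first = m.read(1, 0x40)
    m.write(2, 0x40)
    second = m.read(1, 0x40)
    return [first, second]
```

In this model, `hops_for_trace(False)` gives `[2, 3]` and `hops_for_trace(True)` gives `[2, 2]`: the second read skips the home node because node 1 remembers that node 2 invalidated its copy. The benefit depends on the hinted node still owning the block. In this model a stale hint makes the read one hop slower than the baseline.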
