Application Behavior-oriented Adaptive Remote Access Cache in Ring based NUMA System  

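Jong Wook Kwak (School of Electrical Engineering and Computer Science, Seoul National University)
Seong Tae Jhang (Department of Computer Science, College of Information Engineering, The University of Suwon)
Chu Shik Jhon (School of Electrical Engineering and Computer Science, College of Engineering, Seoul National University)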
Abstract
Because it is easy to implement and alleviates the memory bottleneck, the NUMA architecture has dominated multiprocessor systems for the past several years. However, because a NUMA system distributes memory across its nodes, frequent remote memory access is a key cause of performance degradation. An efficient design of the RAC (Remote Access Cache) in a NUMA system is therefore critical for performance improvement. In this paper, we propose the Multi-Grain RAC, which adaptively controls the RAC line size according to the behavior of each application. We then simulate a NUMA system with the Multi-Grain RAC using MINT, an event-driven memory hierarchy simulator, and analyze the performance results. First, using a profile-based determination method, we identify the optimal RAC line size for each application. We then compare and analyze the performance differences among NUMA systems with a normal RAC, with an optimal-line-size RAC, and with the Multi-Grain RAC. The simulation shows that the worst case can always be avoided and that the results are very close to the optimal case for any combination of application and RAC format.
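As an illustration only, the profile-based step described above could be sketched as follows. This is not the authors' method or the MINT simulator: the direct-mapped cache model, the cost formula, and the names `count_misses`, `best_line_size`, `fixed_cost`, and `per_byte_cost` are all hypothetical assumptions chosen to show how a line size might be picked from a recorded trace of remote addresses.

```python
def count_misses(trace, capacity_bytes, line_size):
    """Count misses of a direct-mapped cache for a trace of byte addresses.

    Hypothetical model: fixed total capacity, so a larger line size
    means fewer lines in the cache.
    """
    num_lines = capacity_bytes // line_size
    tags = [None] * num_lines          # block currently held by each line
    misses = 0
    for addr in trace:
        block = addr // line_size      # which memory block the address is in
        index = block % num_lines      # direct-mapped placement
        if tags[index] != block:
            tags[index] = block        # fetch the block on a miss
            misses += 1
    return misses


def best_line_size(trace, capacity_bytes, candidates,
                   fixed_cost=100, per_byte_cost=1):
    """Pick the candidate line size with the lowest estimated miss cost.

    Assumed cost model: every miss pays a fixed remote-access latency
    plus a transfer cost proportional to the line size.
    """
    def cost(line_size):
        misses = count_misses(trace, capacity_bytes, line_size)
        return misses * (fixed_cost + per_byte_cost * line_size)

    return min(candidates, key=cost)


# A sequential trace benefits from long lines (spatial locality) ...
sequential = list(range(0, 4096, 4))
# ... while a widely strided trace only pays more per miss for them.
strided = [i * 256 for i in range(8)] * 10

print(best_line_size(sequential, 1024, [32, 64, 128]))
print(best_line_size(strided, 1024, [32, 64, 128]))
```

Under these assumptions the sequential trace favors the largest candidate and the strided trace the smallest, which is the kind of application-dependent difference that motivates an adjustable line size.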
Keywords
NUMA System; Remote Access Cache; Multi-Grain RAC; Application Behavior