• Title/Summary/Keyword: NUMA System

Search Result 35, Processing Time 0.026 seconds

Design and Implementation of an SCI-Based Network Cache Coherent NUMA System for High-Performance PC Clustering (고성능 PC 클러스터 링을 위한 SCI 기반 Network Cache Coherent NUMA 시스템의 설계 및 구현)

  • Oh Soo-Cheol;Chung Sang-Hwa
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.12
    • /
    • pp.716-725
    • /
    • 2004
  • It is extremely important to minimize network access time in constructing a high-performance PC cluster system. For PC cluster systems, it is possible to reduce network access time by maintaining network cache in each cluster node. This paper presents a Network Cache Coherent NUMA (NCC-NUMA) system to utilize network cache by locating shared memory on the PCI bus, and the NCC-NUMA card which is core module of the NCC-NUMA system is developed. The NCC-NUMA card is directly plugged into the PCI slot of each node, and contains shared memory, network cache, shared memory control module and network control module. The network cache is maintained for the shared memory on the PCI bus of cluster nodes. The coherency mechanism between the network cache and the shared memory is based on the IEEE SCI standard. According to the SPLASH-2 benchmark experiments, the NCC-NUMA system showed improvements of 56% compared with an SCI-based cluster without network cache.

Concurrent Hash Table Optimized for NUMA System (NUMA 시스템에 최적화된 병렬 해시 테이블)

  • Choi, JaeYong;Jung, NaiHoon
    • Journal of Korea Game Society
    • /
    • v.20 no.5
    • /
    • pp.89-98
    • /
    • 2020
  • In MMO game servers, NUMA (Non-Uniform Memory Access) architecture is generally used to achieve high performance. Furthermore, such servers normally use hash tables as internal data structure which have constant time complexity for insert, delete, and search operations. In this study, we proposed a concurrent hash table optimized for NUMA system to make MMO game servers improve their performance. We tested our hash table on 4 socket NUMA system, and the hash table shows at most 100% speedup over another high-performance hash table.

Performance Analysis of PC Cluster-based CC-NUMA System using Execution-driven Simulation (실행주도 시뮬레이션에 의한 PC 클러스터 기반 CC-NUMA 시스템 성능분석)

  • Ha, Chi-Jeong;Jeong, Sang-Hwa;O, Su-Cheol
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.4
    • /
    • pp.188-195
    • /
    • 2001
  • 본 논문에서는 PC 클러스터 기반 CC-NUMA 시스템을 제안하고, 시뮬레이션을 통하여 성능을 분석하였다. PC 클러스터 기반 CC-NUMA 시스템은 PC의 PCI slot에 CC-NUMA 카드를 장착함으로써 구현되며 공유메모리, 네트워크 캐쉬, 네트워크 제어 모듈을 포함한다. CC-NUMA 시스템은 PCI 버스상에 존재하는 메모리를 공유대상으로 하며, 공유메모리와 네트워크 캐쉬사이의 일관성은 IEEE SCI 표준에 의해 유지된다. CC-NUMA 시스템을 시뮬레이션 하기 위해 실행주도 시뮬레이터인 Limes를 수정하여 사용하였으며, 캐쉬 일관성 유지 알고리즘으로 SCI의 typical set을 구현하였다. 또한 기존 시스템과의 비교를 위해서 네트워크 캐쉬를 활용하지 않는 Dolphin사의 PCI-SCI 카드에 기반한 NUMA 시스템을 시뮬레이션 하였다. CC-NUMA 시스템의 성능을 측정하기 위하여 다양한 실험을 수행하였으며, 실험결과 CC-NUMA 시스템이 NUMA 시스템에 비해서 성능향상이 우수함을 알 수 있었다. 또한, CC-NUMA 시스템이 최적의 성능을 발휘하는 파라미터의 값을 도출하였으며, 이를 CC-NUMA 시스템의 실제 구현에 반영하였다.

  • PDF

Application Behavior-oriented Adaptive Remote Access Cache in Ring based NUMA System (링 구조 NUMA 시스템에서 적응형 다중 그레인 원격 캐쉬 설계)

  • 곽종욱;장성태;전주식
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.9
    • /
    • pp.461-476
    • /
    • 2003
  • Due to the implementation ease and alleviation of memory bottleneck effect, NUMA architecture has dominated in the multiprocessor systems for the past several years. However, because the NUMA system distributes memory in each node, frequent remote memory access is a key factor of performance degradation. Therefore, efficient design of RAC(Remote Access Cache) in NUMA system is critical for performance improvement. In this paper, we suggest Multi-Grain RAC which can adaptively control the RAC line size, with respect to each application behavior Then we simulate NUMA system with multi-grain RAC using MINT, event-driven memory hierarchy simulator. and analyze the performance results. At first, with profile-based determination method, we verify the optimal RAC line size for each application and, then, we compare and analyze the performance differences among NUMA systems with normal RAC, with optimal line size RAC, and with multi-grain RAC. The simulation shows that the worst case can be always avoided and results are very close to optimal case with any combination of application and RAC format.

CC-NUMA 시스템을 위한 진단 소프트웨어 개발

  • Jeong, Tae-Il;Jeong, Nak-Ju;Kim, Ju-Man;Kim, Hae-Jin
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.6 no.1
    • /
    • pp.82-92
    • /
    • 2000
  • This paper introduces an implementation of the diagnosis software for CC-NUMA systems. The CC-NUMA architecture is composed of two or more SMP nodes installed with the specialized hardware to provide cache-coherent operation and the high-speed interconnection network to connect each node, it enables both the high performance and the high scalability. While the CC-NUMA system provides the single system image in the operating system aspect, it should be considered the multiple systems by the diagnostic software. Thus it is difficult to diagnose and manage CC-NUMA system using commercial administration software due to characteristics of the complicated architecture. The remote diagnosis and management are also required with a view to reduce Total Cost of Ownership. In this paper, we design diagnostic software to manage CC-NUMA server system, and propose its mechanism in client-server manner to support remote administration. Additionally, we use the Java-based user interface to enlarge an administrator's accessibility.

  • PDF

Design and Performance of a CC-NUMA Prototype Card for SCI-Based PC Clustering (SCI 기반 PC 클러스터링을 위한 CC-NUMA 프로토타입 카드의 설계와 성능)

  • Oh, Soo-Cheol;Chung, Sang-Hwa
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.1
    • /
    • pp.35-41
    • /
    • 2002
  • It is extremely important to minimize network access time in constructing a high-performance PC cluster system For an SCI based PC cluster it is possilbe to reduce the network access time by maintaining network cache in each cluster node, This paper presents a CC-NUMA card that utilizes network cache for SCI based PC clustering The CC-NUMA card is directly plugged into the PCI solot of each node, and contains shared memory network cache, and interconnection modules. The network cache is maintained for the shared memory on the PCI bus of cluster nodes. The coherency mechanism between the network cache and the shared memory is based on the IEEE SCI standard. A CC-NUMA prototype card is developed to evaluate the performance of the system. According to the experiments. the cluster system with the CC-NUMA card showed considerable improvements compared with an SCI based clustser without network cache.

A Remote Cache Coherence Protocol for Single Shared Memory in Multiprocessor System (단일 공유 메모리를 가지는 다중 프로세서 시스템의 원격 캐시 일관성 유지 프로토콜)

  • Kim, Seong-Woon;Kim, Bo-Gwan
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.42 no.6
    • /
    • pp.19-28
    • /
    • 2005
  • The multiprocessor architecture is a good method to improve the computer system performance. The CC-NUMA provides a single shared space with the physically distributed memories is used widely in the multiprocessor computer system. A CC-NUMA has the full-mapped directory for the shared memory md uses a remote cache memory for tile fast memory access. In this paper, we propose a processing node architecture for a CC-NUMA system and a cache coherency protocol on the physically distributed but logically shared system. We show an implementation result of the system which is adopted the cache coherency protocol.

A dual-link CC-NUMA System Tolerant to the Multiprogramming Environment (다중 프로그램 환경에 적합한 이중 연결 CC-NUMA 시스템)

  • Suh, Hyo-Joong
    • The KIPS Transactions:PartA
    • /
    • v.11A no.3
    • /
    • pp.199-206
    • /
    • 2004
  • Under the multiprogrammed situation, the performance of multiprocessor system is affected by the process allocation policy of the operating systems. The lowest communication cost can be achieved when the related processes positioned to the adjacent processors. While the effective allocation is quite difficult to the real situation, and the processing of the allocation policy consumes some computation time. The dual-ring CC-NUMA systems exhibit a quite performance difference according to the process a1location policy due to a lot of unbalanced memory transactions on the interconnection networks. In this paper, I propose a load balanced dual-link CC-NUMA system that does not requires the processes allocation policy. By the program-driven simulation results. the proposed system shows no remarkable difference according to the allocation policy while the dual-ring systems shows 10% performance improvement by the process allocation. In addition, the proposed system outperforms the dual~ring systems about 1.5 times.

The Node Scheduling of Multi-Threaded Process for CC-NUMA System (CC-NUMA 시스템을 위한 다중 스레드 프로세스의 노드 스케줄링 설계 및 구현)

  • Kim, Jeong-Nyeo;Kim, Hae-Jin;Lee, Cheol-Hoon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.2
    • /
    • pp.488-496
    • /
    • 2000
  • this paper describes the design and implementation of node scheduling for MX Server that is CC-NUMA System COMSIX, the operating system of MX Server, is designed to suit for CC-NUMA Architecture. MX Server consists of up to 8 nodes, and each node is connected by SCI ring. This node scheduling scheme considers data locality for performance improvement of Oracle8i DBMS on the CC-NUMA architecture. For DBMS such as Oracle8i, a multi-threaded process may be run to tie on particular disk. We have developed a CG binding function that the multi-threaded process bound the node. Currently, We don't have an available CC-NUMA Platform. Instead of MX Server, we developed the Node scheduling scheme for multi-threaded process to suit server platform on the PC test-bed and tested completely.

  • PDF

Scalable CC-NUMA System using Repeater Node (리피터 노드를 이용한 Scalable CC-NUMA 시스템)

  • Kyoung, Jin-Mi;Jhang, Seong-Tae
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.9
    • /
    • pp.503-513
    • /
    • 2002
  • Since CC-NUMA architecture has to access remote memory, the interconnection network determines the performance of the CC-NUMA system. Bus which has been used as a popular interconnection network has many limits in a large-scale system because of the limited physical scalability and bandwidth. The dual ring interconnection network, composed of high-speed point-to-point links, is made to resolve the defects of the bus for the large-scale system. However, it also has a problem, in that the response latency is rapidly increased when many nodes are attached to the snooping based CC-NUMA system with the dual ring. In this paper, we propose a ring architecture with repeater nodes in order to overcome the problem of the dual ring on a snooping based CC-NUMA system, and design a repeater node adapted to this architecture. We will also analyze the effects of proposed architecture on the system performance and the response latency by using a probability-driven simulator.