• Title/Summary/Keyword: non uniform memory access


MBS-LVM: A High-Performance Logical Volume Manager for Memory Bus-Connected Storages over NUMA Servers

  • Lee, Yongseob;Park, Sungyong
    • Journal of Information Processing Systems / v.15 no.1 / pp.151-158 / 2019
  • With recent advances in memory technologies, high-performance non-volatile memories such as the non-volatile dual in-line memory module (NVDIMM) have begun to be used as an addition or an alternative to server-side storage. When these memory bus-connected storages (MBSs) are installed in non-uniform memory access (NUMA) servers, the distance between NUMA nodes and MBSs is one of the crucial factors that influence file processing performance, because access latency in a NUMA system varies with the distance between the accessing node and the memory being accessed. This paper presents the design and implementation of MBS-LVM, a high-performance logical volume manager for the case where multiple MBSs are scattered over a NUMA server. MBS-LVM consolidates the address space of each MBS into a single global address space and dynamically allocates storage space so that each thread can access an MBS with the lowest latency possible. We implemented MBS-LVM in the Linux kernel and evaluated its performance by porting it to tmpfs, a memory-based file system widely used in Linux. The benchmark results show that the write performance of tmpfs with MBS-LVM is up to twenty times higher than that of the original tmpfs on a NUMA server with four nodes.
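
The placement idea in this abstract (serve each thread from the MBS with the lowest access latency) can be illustrated with a minimal sketch. It is not the MBS-LVM kernel implementation: the function name `pick_mbs`, the `free_bytes` map, and the distance values are hypothetical, and the matrix only imitates the node-distance table that `numactl --hardware` prints (10 means local).

```python
# Hypothetical sketch: names and distance values are illustrative assumptions,
# not the MBS-LVM kernel implementation.

# NUMA distance matrix in the style of `numactl --hardware`
# (10 = local, larger = farther). Values are made up for a 4-node server.
DISTANCE = [
    [10, 16, 16, 22],
    [16, 10, 22, 16],
    [16, 22, 10, 16],
    [22, 16, 16, 10],
]


def pick_mbs(thread_node, free_bytes, need, distance=DISTANCE):
    """Return the node whose MBS is closest to thread_node and has room.

    free_bytes maps node id -> free bytes on the MBS attached to that node.
    Returns None when no MBS can hold `need` bytes.
    """
    candidates = [node for node, free in free_bytes.items() if free >= need]
    if not candidates:
        return None
    # Prefer the smallest distance; break ties by the lower node id.
    return min(candidates, key=lambda node: (distance[thread_node][node], node))
```

With free space on every node, a thread on node 0 is served locally; once the local MBS is full, the request falls back to the nearest node that still has room.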

Efficient Processing of Grouped Aggregation on Non-Uniformed Memory Access Architecture (비균등 메모리 접근 구조에서의 효율적인 그룹화 집단 연산의 처리)

  • Choe, Seongjun;Min, Jun-Ki
    • Database Research / v.34 no.3 / pp.14-27 / 2018
  • Non-Uniform Memory Access (NUMA) architecture was proposed to alleviate the memory bottleneck problem that occurs in Symmetric Multiprocessing (SMP) architecture. Since the aggregation operator is an important operator that provides properties and summaries of data, its efficiency is crucial to the overall performance of a system. In this paper, we therefore propose an efficient aggregation processing technique for NUMA architecture. The proposed technique consists of a partition phase and a merge phase. In the partition phase, the target relation is partitioned into several partial relations according to the grouping attribute. Because each thread can then process the aggregation operator on its partial relation independently, we prevent remote memory access during the merge phase. Furthermore, in the merge phase, we improve the performance of aggregation processing by letting each thread compute the aggregation with a local hash table and by avoiding lock contention when merging the aggregation results generated by all threads into one.
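
The two phases described here can be sketched as follows, using a SUM aggregate as the example. This is only an illustration of the partition-then-merge structure under stated assumptions: the function names are hypothetical, hash partitioning is one possible way to split by grouping attribute, and plain Python threads do not control NUMA memory placement the way the paper's technique does.

```python
# Hypothetical sketch of partitioned grouped aggregation (SUM per group).
# It shows the structure only; it does not pin threads or memory to NUMA nodes.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Partition phase: route each (key, value) row by its grouping key."""
    parts = [[] for _ in range(n_parts)]
    for key, value in rows:
        parts[hash(key) % n_parts].append((key, value))
    return parts


def aggregate_local(part):
    """Each worker aggregates its own partition in a private hash table."""
    table = defaultdict(int)
    for key, value in part:
        table[key] += value
    return dict(table)


def grouped_sum(rows, n_workers=4):
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(aggregate_local, parts))
    # Merge phase: a key lives in exactly one partition, so the partial
    # tables have disjoint keys and can be combined without locks.
    result = {}
    for partial in partials:
        result.update(partial)
    return result
```

Because every row with the same grouping key lands in the same partition, no two workers ever update the same group, which is what removes the need for locking during the merge.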

Concurrent Hash Table Optimized for NUMA System (NUMA 시스템에 최적화된 병렬 해시 테이블)

  • Choi, JaeYong;Jung, NaiHoon
    • Journal of Korea Game Society / v.20 no.5 / pp.89-98 / 2020
  • In MMO game servers, NUMA (Non-Uniform Memory Access) architecture is generally used to achieve high performance. Such servers normally use hash tables as internal data structures because they offer constant time complexity for insert, delete, and search operations. In this study, we propose a concurrent hash table optimized for NUMA systems to improve the performance of MMO game servers. We tested our hash table on a 4-socket NUMA system, where it showed up to 100% speedup over another high-performance hash table.

Performance Comparison of Synchronization Methods for CC-NUMA Systems (CC-NUMA 시스템에서의 동기화 기법에 대한 성능 비교)

  • Moon, Eui-Sun;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of KIISE:Computer Systems and Theory / v.27 no.4 / pp.394-400 / 2000
  • The main goal of synchronization is to guarantee exclusive access to shared data and critical sections so that parallel programs work correctly and reliably. Because exclusive access restricts the parallelism of parallel programs, efficient synchronization is essential to achieve high performance in shared-memory parallel programs. Many techniques that exploit features of systems and applications have been devised for efficient synchronization. This paper presents simulation results showing that existing synchronization methods are inefficient on CC-NUMA (Cache Coherent Non-Uniform Memory Access) systems, and compares them with Freeze&Melt synchronization, which can remove the inefficiency. The simulation results show that Test-and-Test&Set synchronization suffers from inefficiency caused by broadcast operations, and that Queue-On-Lock-Bit (QOLB) synchronization suffers from the pre-defined order in which critical sections are executed. Freeze&Melt synchronization, which removes these inefficiencies, gains performance by decreasing the waiting time to execute a critical section and the execution time of a critical section, and by reducing the traffic between clusters.


Cost-effective multistage interconnection network for NUMA model system (NUMA(non-uniform memory access) 모델 시스템을 위한 cost-effective한 다단계 상호연결망)

  • 최창훈;김성천
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.5 / pp.19-32 / 1997
  • So far, multiple-path MINs that provide redundant paths in traditional UPP MINs have been realized by adding hardware such as extra stages, duplicated data links, or multiple copies of the MIN. Traditional MINs also do not exploit locality: communication between all processor-memory pairs takes the same amount of time. There has also been little progress so far in exploiting locality of reference in MINs. In this paper, we present a new MIN topology, the hybrid MIN, which is constructed with 2N-3 SEs, far fewer than traditional MINs. Although it is constructed with only 2N-3 SEs, the hybrid MIN satisfies full access capability (FAC) and has redundant paths (but provides a single path for 2 memory modules of each processor). Moreover, the hybrid MIN provides a shortcut path between pairs that communicate data frequently (locality of reference). Its performance under varying degrees of localized communication is analyzed.


A Remote Cache Coherence Protocol for Single Shared Memory in Multiprocessor System (단일 공유 메모리를 가지는 다중 프로세서 시스템의 원격 캐시 일관성 유지 프로토콜)

  • Kim, Seong-Woon;Kim, Bo-Gwan
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.6 / pp.19-28 / 2005
  • The multiprocessor architecture is a good way to improve computer system performance. CC-NUMA, which provides a single shared space over physically distributed memories, is widely used in multiprocessor computer systems. A CC-NUMA system has a full-mapped directory for the shared memory and uses a remote cache memory for fast memory access. In this paper, we propose a processing node architecture for a CC-NUMA system and a cache coherency protocol for the physically distributed but logically shared system. We show an implementation result of a system that adopts the cache coherency protocol.

Page replication mechanism using adjustable DELAY counter in NUMA multiprocessors (NUMA 다중처리기에서 조정가능한 지연 카운터를 이용한 페이지 복사 기법)

  • 이종우;조유곤
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.6 / pp.23-33 / 1996
  • The exploitation of locality of reference in shared-memory NUMA multiprocessors is one of the important problems in parallel processing today. In this paper, we propose a revised hardware reference counter to help the operating system manage locality. In contrast to the previous counter, its value can be adjusted dynamically and periodically to adapt the page replication policy to the various memory reference patterns of processors. We use execution-driven simulation of real applications to evaluate the effectiveness of our adjustable DELAY counter. Our main conclusion is that, with the adjustable DELAY counter, the normalized average memory access costs and their variance become smaller for most applications than with the previous counter, and more robust memory management policies can be provided for operating systems.


A study of workload consolidation considering NUMA affinity (NUMA affinity를 고려한 Workload Consolidation 연구)

  • Seo, Dongyou;Kim, Shin-gye;Choi, Chanho;Eom, Hyeonsang;Yeom, Heon Y.
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.204-206 / 2012
  • Symmetric multiprocessing (SMP) has limited scalability because it uses a shared memory bus. Non-uniform memory access (NUMA) was proposed to overcome this scalability limit. In NUMA, each CPU has its own local memory bus, so accesses to its own memory region have lower latency than accesses to other regions. The existence of local memory regions improves scalability, but in a server virtualization environment, when VMs are scheduled dynamically and a VM's pages do not reside in the memory local to the core on which it runs, remote access makes performance lower than with local access. In this paper, we study, in a server virtualization environment on the recent AMD Bulldozer architecture, the performance degradation that occurs when NUMA affinity is violated and the situations in which violating NUMA affinity causes no performance degradation.

Metastable Vortex State of Perpendicular Magnetic Anisotropy Free Layer in Spin Transfer Torque Magnetic Tunneling Junctions

  • You, Chun-Yeol;Kim, Hyungsuk
    • Journal of Magnetics / v.18 no.4 / pp.380-385 / 2013
  • We find a metastable vortex state of the perpendicular magnetic anisotropy free layer in spin transfer torque magnetic tunneling junctions by using micromagnetic simulations. The metastable vortex state does not exist in a single layer, and it is only found in the trilayer structure with the perpendicular magnetic anisotropy polarizer layer. It is revealed that the physical origin is the non-uniform stray field from the polarizer layer.

Memory Affinity based Load Balancing Model for NUMA System (NUMA 환경에서 메모리 친화력을 고려한 부하 균등 모델)

  • Youn, Dae-Seok;Park, Hee-Kwon;Choi, Jong-Moo
    • Proceedings of the Korean Information Science Society Conference / 2008.06b / pp.346-350 / 2008
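  • Interest in NUMA (Non-Uniform Memory Access) environments has recently grown as multiprocessors based on the HyperTransport technology used by AMD have shown good performance. This paper proposes a load balancing model for NUMA systems. In a multiprocessor system, the operating system has load balancing mechanisms that move load from heavily loaded processors to lightly loaded ones. Many of these mechanisms depend on the number of tasks each processor holds. This study presents a load balancing model that reflects the fact that memory access cost in a NUMA system differs by location. To this end, we build a simulation environment and validate the model through experiments on specific scenarios.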
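
One way to read "load balancing that reflects location-dependent memory access cost" is a target-selection rule that adds a distance penalty to the usual task-count metric. The sketch below is an assumption for illustration, not the model from the paper: the cost formula, the `weight` parameter, and all names are hypothetical.

```python
# Hypothetical sketch: the cost formula and all names are assumptions,
# not the load balancing model proposed in the paper.

def pick_target_cpu(task_mem_node, cpu_load, cpu_node, distance, weight=1.0):
    """Choose the CPU with the lowest combined cost for a task.

    task_mem_node: NUMA node holding most of the task's memory.
    cpu_load:      list, number of runnable tasks per CPU.
    cpu_node:      list, NUMA node id of each CPU.
    distance:      matrix, distance[a][b] between nodes a and b (10 = local).
    weight:        how strongly memory distance counts against a CPU.
    """
    def cost(cpu):
        mem_distance = distance[cpu_node[cpu]][task_mem_node]
        return cpu_load[cpu] + weight * mem_distance

    # Break ties by the lower CPU id so the choice is deterministic.
    return min(range(len(cpu_load)), key=lambda cpu: (cost(cpu), cpu))
```

With `weight=0` the rule degenerates to a pure task-count balancer; a positive weight lets a slightly busier CPU win when it sits on the node that holds the task's memory.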
