Search | Korea Science

Object-Size and Call-Site Tracing based Shared Memory Allocator for False Sharing Reduction in DSM Systems (분산 공유 메모리 시스템에서 거짓 공유를 줄이는 객체-크기 및 호출지-추적 기반 공유 메모리 할당 기법)

Lee, Jong-Woo;Park, Young-Ho;Yoon, Yong-Ik
- Journal of Digital Contents Society / v.9 no.1 / pp.77-86 / 2008
False sharing results from the co-location of unrelated data in the same unit of memory coherency, and is one source of unnecessary overhead that does nothing to maintain memory coherency in multiprocessor systems. Moreover, the damage caused by false sharing grows in proportion to the granularity of memory coherency. To reduce false sharing in page-based DSM systems, unrelated data objects with different access patterns must be allocated to separate shared pages. In this paper we propose a size and call-site tracing-based shared memory allocator, SCSTallocator for short. SCSTallocator places data objects requested from different call-sites into separate shared pages, and at the same time places data objects of different sizes into different shared pages. Consequently, data objects that differ in call-site or object size are prevented from being allocated to the same shared page. Our observations show that SCSTallocator outperforms the existing dynamic shared memory allocators. By combining the two existing allocation techniques, we can eliminate a considerable number of false sharing misses.
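The placement rule described in this abstract can be illustrated with a minimal sketch. It is a toy model, not the authors' SCSTallocator: the class name `SCSTSketch`, the 4096-byte page size, and the string call-site labels are all assumptions made for illustration. Objects share a page only when both their call site and their size match.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; the abstract does not give one


class SCSTSketch:
    """Toy placement policy: one open page per (call_site, size) pair."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.next_page = 0
        # (call_site, size) -> [page_id, bytes_used] for the currently open page
        self.open_pages = {}

    def allocate(self, call_site, size):
        """Return (page_id, offset) for a new object of `size` bytes."""
        if size <= 0 or size > self.page_size:
            raise ValueError("size must be between 1 and the page size")
        key = (call_site, size)
        page = self.open_pages.get(key)
        # Open a fresh page if this key has none yet or the current one is full.
        if page is None or page[1] + size > self.page_size:
            page = [self.next_page, 0]
            self.next_page += 1
            self.open_pages[key] = page
        offset = page[1]
        page[1] += size
        return page[0], offset


alloc = SCSTSketch()
first = alloc.allocate("siteA", 64)       # same site, same size: share a page
second = alloc.allocate("siteA", 64)
other_site = alloc.allocate("siteB", 64)  # different call site: new page
other_size = alloc.allocate("siteA", 128) # different size: new page
```

In this sketch a real call-site identifier (for example a return address) would replace the string labels; how the paper traces call sites is not stated in the abstract.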

A Remote Cache Coherence Protocol for Single Shared Memory in Multiprocessor System (단일 공유 메모리를 가지는 다중 프로세서 시스템의 원격 캐시 일관성 유지 프로토콜)

Kim, Seong-Woon;Kim, Bo-Gwan
- Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.6 / pp.19-28 / 2005
The multiprocessor architecture is a good method to improve computer system performance. CC-NUMA, which provides a single shared space over physically distributed memories, is widely used in multiprocessor computer systems. A CC-NUMA has a full-mapped directory for the shared memory and uses a remote cache memory for fast memory access. In this paper, we propose a processing node architecture for a CC-NUMA system and a cache coherency protocol for the physically distributed but logically shared system. We show an implementation result of a system that adopts the cache coherency protocol.

Remote Cache Replacement Policy using Processor Locality in Multi-Processor System (다중 프로세서 시스템에서 프로세서 지역성을 이용한 원격 캐쉬 교체 정책)

Han Sang Yoon;Kwak Jong Wook;Jhang Seong Tae;Jhon Chu Shik
- Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.541-556 / 2005
Memory access latency has been a primary cause of performance degradation in both single-processor and multi-processor systems. Remote memory access incurs far more overhead than local memory access, especially in distributed shared-memory systems. To resolve this problem, a multi-level cache architecture that includes a remote cache has been proposed for multi-processor systems. In this paper, we propose a new cache replacement policy that improves the performance of multi-processor systems with a remote cache. If the multi-level cache maintains the multi-level inclusion (MLI) property and uses the LRU (Least Recently Used) cache replacement policy, the LRU information of the higher-level cache (a processor cache) can differ from that of the lower-level cache (a remote cache). In this situation, replacing a remote cache line can force out a processor cache line that the processor is still using. This is a major cause of performance degradation in the whole system. To alleviate this disadvantage of the LRU replacement policy, the new policy analyzes the remote memory access pattern of the processors in each node and uses this information to reduce the number of invalidations of useful cache lines in the higher-level cache. The new remote cache replacement policy improves performance by up to $3.5\%$ and by $2.5\%$ on average on the SPLASH-2 benchmarks, compared with the general LRU cache replacement policy.

A Dynamic Prefetching Scheme for Handling Small Files based on Hadoop Distributed File System (하둡 분산 파일 시스템 기반 소용량 파일 처리를 위한 동적 프리페칭 기법)

Yoo, Sang-Hyun;Youn, Hee-Yong
- Proceedings of the Korean Society of Computer Information Conference / 2014.07a / pp.329-332 / 2014
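As cloud computing has become widespread, demand has emerged for distributed file systems that, unlike conventional file systems, handle large files efficiently. Among them, the Hadoop Distributed File System (HDFS), unlike earlier distributed file systems, guarantees availability and fault tolerance and supports a streaming data access pattern, so it can store large files efficiently. Because of these advantages, most cloud computing platforms adopt it as their file system. In actual HDFS data sets, however, small files account for a larger share than large files, and these numerous small files not only incur high processing costs but also degrade memory performance. Prefetching small files can solve these problems. Because existing data prefetching techniques are difficult to apply to HDFS, we propose a data prefetching technique for HDFS.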

Serverless Network Virtual Memory on a Network of Workstations (워크스테이션 네트워크 기반 Serverless 네트워크 가상 메모리)

Kang, Hyun-Soo;Heu, Shin
- Proceedings of the Korean Information Science Society Conference / 1998.10a / pp.166-168 / 1998
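As distributed systems have come to be connected by high-performance networks, a new memory tier called network memory has emerged. Whereas conventional operating systems use the local hard disk for virtual memory, network memory uses the memory of idle nodes among the networked nodes as virtual memory. Most existing work on network memory employs one or more management server nodes, which manage the remote nodes that act as paging devices. The management server node monitors the memory utilization of each node and mediates between a local node and the remote nodes that can provide pages to it. However, if a problem occurs in the management server, its effects can spread to all nodes connected to it. In this paper, we present a serverless network virtual memory that, by establishing serverless relationships among nodes, minimizes the failures of other nodes caused by problems in a management server node.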

The Divisible Electronic Cash System using Secret Sharing Scheme (비밀 분산 기법을 이용한 분할 가능한 전자화폐 시스템)

장석철;이임영
- Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 2001.11a / pp.189-192 / 2001
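Recent advances in information and communication technology and the explosive growth in Internet users have invigorated electronic commerce. Research and development on electronic cash systems, the most important systems in electronic commerce, is also actively under way. In particular, most research on divisibility, one of the requirements of electronic cash systems, has relied on hierarchical structure tables. This approach, however, has the drawback of requiring a large amount of memory and a large amount of computation. In this paper, we therefore propose a new electronic cash system that addresses these problems by using a secret sharing scheme, an alternative approach that provides divisibility without a hierarchical structure table.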

Dynamic Load Balancing for Database Sharing Systems (데이타베이스 공유 시스템에서 동적 부하 분산)

Jeong, Chang-Uk;Cho, Haeng-Rae
- Annual Conference of KIPS / 2002.04a / pp.75-78 / 2002
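A database sharing system (DSS) couples multiple computers for high-performance online transaction processing, with each node sharing the database at the disk level. If the policy for assigning transactions to the nodes of a DSS is poor, a surge of particular transactions can overload a node, so load balancing is needed to optimize the performance of each node. In this paper, we propose a dynamic load balancing scheme that takes into account the hot set size of the database referenced by each transaction class, the memory size and CPU performance of each node, and the change in throughput as the number of concurrently executing transactions varies.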

A Scalable OWL Horst Lite Ontology Reasoning Approach based on Distributed Cluster Memories (분산 클러스터 메모리 기반 대용량 OWL Horst Lite 온톨로지 추론 기법)

Kim, Je-Min;Park, Young-Tack
- Journal of KIISE / v.42 no.3 / pp.307-319 / 2015
Current ontology studies use the Hadoop distributed storage framework to perform map-reduce algorithm-based reasoning for scalable ontologies. In this paper, however, we propose a novel approach for scalable Web Ontology Language (OWL) Horst Lite ontology reasoning, based on distributed cluster memories. Rule-based reasoning, which is frequently used for scalable ontologies, iteratively executes triple-format ontology rules until no new data is inferred. Therefore, when scalable ontology reasoning is performed on computer hard drives, the ontology reasoner suffers from performance limitations. To overcome this drawback, we propose an approach that loads the ontologies into distributed cluster memories using Spark (a memory-based distributed computing framework), which executes the ontology reasoning. To implement an appropriate OWL Horst Lite ontology reasoning system on Spark, our method divides the scalable ontologies into blocks, loads each block into the cluster nodes, and subsequently handles the data in the distributed memories. To experimentally evaluate the methods suggested in this paper, we used the Lehigh University Benchmark (LUBM), which is used to evaluate ontology inference and search speed, and applied the methods to LUBM8000 (1.1 billion triples, 155 gigabytes). Compared with WebPIE (19k/s), a representative map-reduce algorithm-based scalable ontology reasoner, the proposed approach showed a throughput improvement of 320% (62k/s).
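The iterative rule execution mentioned in this abstract (apply rules over triples until nothing new is inferred) can be sketched in a few lines. This is only an in-memory, single-machine illustration: it does not use Spark, and the two rules below (subclass transitivity and type inheritance) are assumed stand-ins rather than the OWL Horst Lite rule set. The function name `infer_closure` and the predicate names are hypothetical.

```python
def infer_closure(triples):
    """Apply two illustrative rules repeatedly until a fixpoint is reached."""
    facts = set(triples)
    while True:
        new = set()
        sub = [(s, o) for s, p, o in facts if p == "subClassOf"]
        # Rule 1 (assumed): A subClassOf B, B subClassOf C => A subClassOf C
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        # Rule 2 (assumed): x type A, A subClassOf B => x type B
        for s, p, o in facts:
            if p == "type":
                for a, b in sub:
                    if o == a:
                        new.add((s, "type", b))
        new -= facts
        if not new:  # no new triples inferred: stop iterating
            return facts
        facts |= new


closure = infer_closure({
    ("Student", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
    ("alice", "type", "Student"),
})
```

Each pass rescans the whole fact set, which is the step that becomes expensive when the triples live on disk rather than in memory.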
https://doi.org/10.5626/JOK.2015.42.3.307

The Design and Implementation of the ParaC Language (ParaC 언어의 설계 및 구현)

Lee, Kyoung-Seok;Woo, Young-Choon;Kim, Jin-Mee;Chi, Dong-Hae
- The Transactions of the Korea Information Processing Society / v.4 no.11 / pp.2903-2913 / 1997
This paper describes the design and implementation of the ParaC language, which supports parallel programming on shared memory and distributed memory parallel machines. The ParaC language is designed for the effective use of the system resources of scalable parallel systems. This goal is achieved by adding parallel and synchronization constructs for shared address spaces, and remote task constructs for distributed address spaces. This paper also presents the translation method and the implementation of the translator and the run-time library for parallel execution of the extended constructs.

An Adaptive Prefetching Technique for Software Distributed Shared Memory Systems (소프트웨어 분산공유메모리시스템을 위한 적응적 선인출 기법)

Lee, Sang-Kwon;Yun, Hee-Chul;Lee, Joon-Won;Maeng, Seung-Ryoul
- Journal of KIISE:Computer Systems and Theory / v.28 no.9 / pp.461-468 / 2001
Though shared virtual memory (SVM) systems promise low-cost solutions for high-performance computing, they suffer from long memory latencies. These latencies are usually caused by repetitive invalidations of shared data. Since shared data are accessed through synchronization and the patterns by which threads synchronize are repetitive, a prefetching scheme based on such repetitiveness would reduce memory latencies. Based on this observation, we propose a prefetching technique that predicts future access behavior by analyzing the access history per synchronization variable. Our technique was evaluated on an 8-node SVM system using the SPLASH-2 benchmark. The results show that our technique could achieve a 34%~45% reduction in memory access latencies.
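The idea of predicting accesses from per-synchronization-variable history can be sketched as follows. This is an assumed, simplified predictor, not the technique evaluated in the paper: the class `SyncHistoryPrefetcher` and its rule (prefetch the pages touched in both of the last two intervals guarded by the same synchronization variable) are hypothetical choices made for illustration.

```python
from collections import defaultdict


class SyncHistoryPrefetcher:
    """Toy predictor keyed by synchronization variable (e.g. a lock id)."""

    def __init__(self):
        # sync_var -> list of page sets, one per completed interval
        self.history = defaultdict(list)
        self.current = None  # (sync_var, pages touched so far) or None

    def acquire(self, sync_var):
        """Start an interval and return the set of pages predicted for it."""
        self._close_interval()
        self.current = (sync_var, set())
        past = self.history[sync_var]
        if len(past) >= 2:
            # Assumed rule: pages seen in both of the last two intervals.
            return past[-1] & past[-2]
        return set()

    def access(self, page):
        """Record a shared-page access inside the current interval."""
        if self.current is not None:
            self.current[1].add(page)

    def release(self):
        """End the current interval and store its access set."""
        self._close_interval()

    def _close_interval(self):
        if self.current is not None:
            var, pages = self.current
            self.history[var].append(pages)
            self.current = None


pf = SyncHistoryPrefetcher()
pred1 = pf.acquire("lock0"); pf.access(1); pf.access(2); pf.release()
pred2 = pf.acquire("lock0"); pf.access(2); pf.access(3); pf.release()
pred3 = pf.acquire("lock0")  # page 2 appeared in both earlier intervals
```

Requiring two matching intervals trades coverage for fewer useless prefetches; a real system would tune this against the cost of a remote page fetch.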

