Search | Korea Science

Remote Cache Replacement Policy using Processor Locality in Multi-Processor System (다중 프로세서 시스템에서 프로세서 지역성을 이용한 원격 캐쉬 교체 정책)

Han Sang Yoon;Kwak Jong Wook;Jhang Seong Tae;Jhon Chu Shik
- Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.541-556 / 2005
Memory access latency has been a primary cause of performance degradation in both single-processor and multi-processor systems. In distributed shared-memory systems in particular, remote memory accesses incur far higher latency than local memory accesses. To address this problem, multi-level cache architectures that include a remote cache have been proposed for multi-processor systems. In this paper, we propose a new cache replacement policy that improves the performance of a multi-processor system with a remote cache. If the multi-level cache maintains the multi-level inclusion (MLI) property and uses the LRU (Least Recently Used) replacement policy, the LRU information of the higher-level cache (a processor cache) can differ from that of the lower-level cache (a remote cache). In this situation, replacing a remote cache line can force the replacement of a processor cache line that the processor is still using, which is a main cause of performance degradation in the whole system. To alleviate this disadvantage of the LRU replacement policy, the new policy analyzes the remote memory access pattern of the processors in each node and uses this information to reduce the number of invalidations of useful cache lines in the higher-level cache. Compared with the general LRU replacement policy, the new remote cache replacement policy improves performance by up to $3.5\%$ and by $2.5\%$ on average on the SPLASH-2 benchmarks.
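The interaction between inclusion and LRU described in the abstract above can be illustrated with a small sketch. This is not the authors' policy, which the abstract does not specify in detail. It is a toy model that assumes a fully associative remote cache and a processor cache represented as a plain set of addresses, and the names `RemoteCache`, `access`, and `pick_victim` are hypothetical. The victim selection simply prefers the oldest remote line that is not resident in the processor cache, falling back to plain LRU otherwise.

```python
from collections import OrderedDict


class RemoteCache:
    """Toy fully associative remote cache kept in LRU order (oldest first)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> True, ordered oldest to newest

    def pick_victim(self, processor_cache):
        # Prefer the oldest line that the processor cache does not hold,
        # so that eviction does not invalidate a line still in use above.
        for addr in self.lines:
            if addr not in processor_cache:
                return addr
        # Every line is also in the processor cache: fall back to plain LRU.
        return next(iter(self.lines))

    def access(self, addr, processor_cache):
        """Touch addr and return the evicted address, or None."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return None
        victim = None
        if len(self.lines) >= self.capacity:
            victim = self.pick_victim(processor_cache)
            del self.lines[victim]
            # Multi-level inclusion: a line evicted from the remote cache
            # must also be invalidated in the processor cache.
            processor_cache.discard(victim)
        self.lines[addr] = True
        return victim
```

With a two-line remote cache holding addresses `0x10` and `0x20`, where only `0x10` is in the processor cache, plain LRU would evict `0x10` and invalidate it in the processor cache, whereas this sketch evicts `0x20` instead.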

A Study on Efficient Executions of MPI Parallel Programs in Memory-Centric Computer Architecture

Lee, Je-Man;Lee, Seung-Chul;Shin, Dongha
- Journal of the Korea Society of Computer and Information / v.25 no.1 / pp.1-11 / 2020
In this paper, we present a technique that executes MPI parallel programs developed for processor-centric computer architecture more efficiently on memory-centric computer architecture, without modifying the programs. The technique improves performance by replacing the low-speed network communication of MPI library functions with high-speed communication through the fast, large shared memory of memory-centric computer architecture. The technique is implemented in two programs. The first is a modified MPI library called MC-MPI-LIB, which runs MPI parallel programs more efficiently on memory-centric computer architecture while preserving the semantics of the MPI library functions. The second is a simulation program called MC-MPI-SIM, which simulates the performance of memory-centric computer architecture on processor-centric computer architecture. We developed and tested the programs in a distributed systems environment deployed on Docker-based virtualization. We analyzed the performance of several MPI parallel programs and showed that they achieve better performance on memory-centric computer architecture. In particular, we observed very large performance gains for MPI parallel programs with high communication overhead.
https://doi.org/10.9708/jksci.2020.25.01.001

Design and Implementation of Global Buffer Manager for SAN Shared File (SAN 환경에서의 공유파일 시스템을 위한 광역 버퍼관리기의 설계 및 구현)

이경록;김은경;정병수
- Proceedings of the Korean Information Science Society Conference / 2002.04a / pp.79-81 / 2002
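Recently, with the development of high-speed networks such as ATM, Fast Switched LAN, and Fiber Channel, accessing the memory of a remote client in a distributed network file system has become markedly faster than accessing a disk. Building on such high-speed networks, new network storage systems such as the SAN (Storage Area Network), which separate servers from storage devices to manage large volumes of data, are emerging. In this paper, we design and implement a global buffer manager, which is an essential consideration in this new distributed network file storage environment. The global buffer manager implemented in this paper consists of two main parts, data lookup and buffer list management. We present suitable data structures for these parts, a method for maintaining buffer block information across the hosts in the system, and a method for integrating the manager with the in-kernel buffer manager of an existing operating system.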

Distributed Workflow Framework based on Peer to Peer (P2P 기반 분산 워크플로우 프레임워크)

이이섭;박수현;백두권
- Proceedings of the Korean Information Science Society Conference / 2003.04a / pp.692-694 / 2003
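In this paper, we summarize the strengths and weaknesses of existing workflow standards and implementation methods, and propose a workflow system based on P2P technology, which has recently attracted attention. P2P technology has the advantage of reducing server load by making maximal use of each peer's resources, but so far it has offered only simple interaction functions such as file sharing. By presenting an architecture that can support workflows requiring complex forms of interaction, we make it possible to build a workflow system with a more completely distributed structure than C/S, CORBA, or HTTP. Such a workflow system is more scalable and robust and provides higher performance. We also compare it with existing systems in terms of scalability, performance, and memory requirements.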

Two-Phase Protocol : Write Performance Enhancement Scheme of the Cooperative Cache for PVFS (두 단계 프로토콜 : PVFS를 위한 상호 협력 캐쉬에서 쓰기 성능 향상 기법)

황인철;정한조;맹승렬;조정완
- Proceedings of the Korean Information Science Society Conference / 2003.10a / pp.409-411 / 2003
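As research on cluster computing, which seeks high performance by connecting inexpensive PCs with a fast network, has become active, distributed file systems have been developed that provide efficient file service by reading data from disks, which are slow relative to the CPU, memory, and network. Among existing distributed file systems, PVFS was developed to provide fast file service to users through parallel I/O on the Linux operating system, which is widely used in cluster computing. Because the existing PVFS does not provide a cache system, a cooperative cache for PVFS was designed and implemented to improve read performance. The cooperative cache for PVFS handles file requests by sharing the clients' file caches, and it greatly improved read performance. For writes, however, performance is poor because of the overhead of finding and releasing all the data held by other clients. In this paper, we therefore propose and implement the two-phase protocol, a write performance enhancement scheme for the cooperative cache for PVFS. We then compare and analyze the performance of the two-phase protocol against the existing PVFS and the cooperative cache system for PVFS.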

Performance Enhancement of A Massive Scientific Data Visualization System on Virtual Reality Environment by Using Data Locality (Data Locality를 활용한 VR환경에서의 대용량 데이터 가시화 시스템의 성능 개선)

Lee, Se-Hoon;Kim, Min-Ah;Lee, Joong-Yeon;Hur, Young-Ju
- Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.284-287 / 2012
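GLOVE (GLObal Virtual reality visualization Environment for scientific simulation) is a tool for efficiently visualizing and analyzing large-scale scientific data in applied science and computational simulation, fields in which data volumes have grown rapidly as computing resources have improved. GDM (GLOVE Data Manager), the data manager of GLOVE, processes terabyte-scale data in real time for distributed parallel visualization by using GA (Global Array), which provides distributed shared memory. However, while visualizing large-scale scientific data, we observed performance degradation caused by the existing data access method, which does not take data locality into account. This paper identifies the performance degradation found in the existing GLOVE and presents a solution.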
https://doi.org/10.3745/PKIPS.y2012m11a.284

Performance Improvement for PVM by Zero-copy Mechanism (Zero-copy 기술을 이용한 PVM의 성능 개선)

임성택;심재홍;최경희;정기현;김재훈;문성근
- The Journal of Korean Institute of Communications and Information Sciences / v.25 no.5B / pp.899-912 / 2000
PVM provides users with a single image of a high-performance parallel computing machine by collecting machines distributed over a network. Low communication overhead is essential for running applications effectively on PVM-based platforms. In the original PVM, three memory copies are required for a PVM task to send a message to a remote task, which degrades performance. We propose a zero-copy model using global shared memory that can be accessed by PVM tasks, the PVM daemon, and the network interface card (NIC). In this scheme, a task packs data into global shared memory and notifies the daemon that the data is ready to be sent. The daemon then routes the data to the destination remote task with virtually no data copy overhead. Experimental results show that the message round-trip time between two machines is reduced significantly in the proposed zero-copy scheme.

Reconfigurable Architecture Design for H.264 Motion Estimation and 3D Graphics Rendering of Mobile Applications (이동통신 단말기를 위한 재구성 가능한 구조의 H.264 인코더의 움직임 추정기와 3차원 그래픽 렌더링 가속기 설계)

Park, Jung-Ae;Yoon, Mi-Sun;Shin, Hyun-Chul
- Journal of KIISE:Computer Systems and Theory / v.34 no.1 / pp.10-18 / 2007
Mobile communication devices such as PDAs and cellular phones need to perform several kinds of computation-intensive functions, including H.264 encoding/decoding and 3D graphics processing. In this paper, a new reconfigurable architecture is described that can perform either motion estimation for H.264 or rendering for 3D graphics. The proposed motion estimation techniques use a new efficient SAD computation ordering together with the DAU and FDVS algorithms. The new approach reduces computation by 70% on average compared with JM 8.2, without affecting quality. In 3D rendering, a midline traversal algorithm is used for parallel processing to increase throughput. Memories are partitioned into 8 blocks so that 2.4Mbits (47%) of memory is shared and selective power shutdown is possible during motion estimation and 3D graphics rendering. Processing elements are also shared to further reduce the chip area by 7%.
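For readers unfamiliar with the SAD (sum of absolute differences) cost used in block-based motion estimation, the sketch below shows a baseline full search over a small window. It does not reproduce the paper's SAD computation ordering, DAU, or FDVS algorithms, which the abstract does not describe. The function names `sad` and `full_search`, and the frame layout as nested lists of pixel values, are assumptions made for illustration.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, size, search_range):
    """Return (cost, mx, my) for the best motion vector in the window.

    Candidates that would read outside the reference frame are skipped.
    Ties keep the first candidate found in row-major scan order.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = cx + mx, cy + my
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    return best
```

A full search evaluates every candidate in the window, which is why reordering or early termination of the SAD computation, as the abstract mentions, can cut the work substantially.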

Design and Implementation of KDSM(KAIST Distributed Shared Memory) System (KDSM(KAIST Distributed Shared Memory) 시스템의 설계 및 구현)

Lee, Sang-Kwon;Yun, Hee-Chul;Lee, Joon-Won;Maeng, Seung-Ryoul
- Journal of KIISE:Computer Systems and Theory / v.29 no.5 / pp.257-264 / 2002
In this paper, we give a detailed description of the KDSM (KAIST Distributed Shared Memory) system. KDSM is implemented as a user-level library running on Linux 2.2.13 and uses TCP/IP for communication. KDSM uses a page-based invalidation protocol and a multiple-writer protocol, and supports the HLRC (Home-based Lazy Release Consistency) memory consistency model. To evaluate the performance of KDSM, we executed 4 scientific applications and compared the results with JIAJIA. The results show that KDSM performs almost as well as JIAJIA for 2 applications and better than JIAJIA for the other 2.
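Multiple-writer protocols in page-based DSM systems are commonly built on twins and diffs: a writer saves a pristine copy (the twin) of a page before modifying it, then sends only the changed bytes to the page's home node. The sketch below illustrates that general idea. It is not KDSM's code, and the helper names `make_twin`, `compute_diff`, and `apply_diff` are hypothetical.

```python
def make_twin(page):
    """Save an immutable copy of the page before the first local write."""
    return bytes(page)


def compute_diff(twin, page):
    """List (offset, new_byte) pairs for every byte changed since the twin."""
    return [(i, page[i]) for i in range(len(page)) if page[i] != twin[i]]


def apply_diff(home_page, diff):
    """Merge one writer's changes into the home copy of the page."""
    for offset, value in diff:
        home_page[offset] = value


# Two writers modify disjoint bytes of the same page concurrently.
home = bytearray(8)
copy_a, copy_b = bytearray(home), bytearray(home)
twin_a, twin_b = make_twin(copy_a), make_twin(copy_b)
copy_a[1] = 0xAA
copy_b[6] = 0xBB

# At release time each writer sends only its diff to the home node.
apply_diff(home, compute_diff(twin_a, copy_a))
apply_diff(home, compute_diff(twin_b, copy_b))
```

Because each diff carries only the bytes its writer changed, both updates survive at the home node even though the two writers shared a page.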

Performance Analysis of a Multiprocessor System Using Simulator Based on Parsec (Parsec 기반 시뮬레이터를 이용한 다중처리시스템의 성능 분석)

Lee Won-Joo;Kim Sun-Wook;Kim Hyeong-Rae
- Journal of the Korea Society of Computer and Information / v.11 no.2 s.40 / pp.35-42 / 2006
In this paper, we implement a new simulator, based on Parsec, for performance analysis of a distributed shared-memory multiprocessor system for parallel digital signal processing. The key idea is that the simulator is suited to simulating systems that use the DMA function of the TMS320C6701 DSP chip and local memory with fast access time. In addition, because performance parameters are easy to modify and hardware components are easy to reconfigure, system performance can be analyzed in various execution environments. The simulations use FFT, 2D FFT, matrix multiplication, and FIR filter, which are widely used DSP algorithms. Using our simulator, results were recorded for different numbers of processors, data sizes, and hardware configurations. The performance of our simulator has been verified by comparing those recorded results.
