Search | Korea Science

Size Reduction and Performance Analysis of the Bit-map Table Used in the Bus-based Shared Memory System (버스기반의 공유메모리 시스템에서 사용된 비트맵 테이블의 크기 축소와 성능 분석)

Woo, Jong-Jung;Lee, Ka-Young
- The Transactions of the Korea Information Processing Society
- /
- v.5 no.1
- /
- pp.24-32
- /
- 1998
The bus contention among bus-based shared-memory multiprocessors limits their performance. In addition, under split bus transaction environment, multiprocessors may make some memory requests unnecessary stand by in the memory access buffer, which makes system performance worse. This unnecessary stand-by can be eliminated by maintaining the bitmap table which contains the status bit for each memory block. However, this mechanism requires a great size of SRAM for the status information, which is fully mapped from the whole memory blocks. To solve this problem, we propose a bitmap cache which exploits partial mapping and locality of references. The simulation results show that the proposed system can greatly reduce the capacity of SRAM for the status information with little deteriorating its performance.
PDF

An Extended Real-Time Synchronization Protocols for Shared Memory Multiprocessors (공유메모리 다중 프로세서 실시간 시스템에서의 동기화 프로토콜)

Kang, Seung-Yup;Ha, Rhan
- Proceedings of the Korean Information Science Society Conference
- /
- 1998.10a
- /
- pp.136-138
- /
- 1998
작업들이 자원을 공유하는 경우 예측하기 어려운 지연시간이 발생한다. 다중 프로세서 시스템에서의 자원공유로 인한 지연시간은 더욱 예측하기 어렵다. 실기간 시스템의 스케줄 가능성 검사를 위해서는 이러한 지연시간을 정확히 예측해야한다. 선점가능한 우선순위 구동 CPU 스케줄링 알고리즘에 의해서 다른 우선순위의 작업과의 동기화는 우선순위 역전 문제를 야기한다. 본 논문에서는 다중 프로세서에서의 동기화 프로토콜을 제안하고 작업의 지연시간을 분석한다. 다른 프로세서에 할당된 작업들이 수행중인 자원을 요구할 때, 자원을 수행하는 작업의 우선순위를 높여줌으로써 자원수행을 빠르게 종료하게 한다. 이로 인해 자원에 의한 지연을 최소화한다. 특히, 높은 우선순위 작업의 경우 더욱 작은 지연시간을 갖게한다. 시뮬레이션을 통한 Shared Memory Protocol [5]과의 비교, 분석 결과 성능의 향상을 보임을 알 수 있다. 다양한 작업집합에 대한 지연시간을 분석하였다.
PDF

Improved Parallel Loop Scheduling Algorithm on Shared Memory Systems (공유메모리 시스템에서 개선된 병렬 루프 스케쥴링 알고리즘)

이영규;박두순
- Proceedings of the Korea Multimedia Society Conference
- /
- 2000.04a
- /
- pp.453-457
- /
- 2000
병렬 시스템 환경에서 최적의 스케쥴링을 수행하기 위해서는 병렬성을 가진 iteration 들에 대해 최소의 동기화 오버헤드와 load balance 가 달성하도록 스케쥴링을 수행해야한다. 다중 프로세서들은 실행을 위하여 메모리로부터 iteration 들에 대한 chunk를 계산한 후 할당받게 된다. 이때, 각 프로세서들의 상호 배타적인 메모리 접근으로 많은 오버헤드 및 병목현상이 발생된다. 또한, 프로세서에게 할당된 chunk 내 iteration 들의 실행시간 분포가 서로 상이한 경우에는 load imbalance 의 원인이 되어 결과적으로 전체 스케쥴링에 나쁜 영향을 준다. 따라서, 최적의 스케쥴링을 수행하기 위해서 본 논문에서는 기존의 스케쥴링 방법들에서 문제점들을 도출하고 자료의 국부성과 프로세서 동족성을 고려한 개선된 병렬 루프 알고리즘을 제안하고, 성능평가를 통해 개선된 알고리즘이라는 것을 보였다.
PDF

Remote Cache Replacement Policy using Processor Locality in Multi-Processor System (다중 프로세서 시스템에서 프로세서 지역성을 이용한 원격 캐쉬 교체 정책)

Han Sang Yoon;Kwak Jong Wook;Jhang Seong Tae;Jhon Chu Shik
- Journal of KIISE:Computer Systems and Theory
- /
- v.32 no.11_12
- /
- pp.541-556
- /
- 2005
The memory access latency of the system has been a primary factor of performance degradation in single-processor system and multi-processor system. The remote memory access latency takes a lot of overhead over the local memory access latency especially in the distributed shared-memory system. To resolve this problem, the multi-level cache architecture that contains a remote cache in the multi-processor system has been proposed. In this paper, we propose a new cache replacement policy that improves the performance of the multi-processor system with the remote cache. If the multi-level cache keeps the multi-level inclusion(MLI) property and uses the LRU(Least Recently Used) cache replacement policy, the LRU information of the higher-level cache(a processor cache) would be different with that of the lower-level cache(a remote cache). In this situation, the replacement of a remote cache line can induce the exchange of a processor cache line that is used by the processor. It is a main factor of performance degradation in a whole system. To alleviate this disadvantage of the LRU replacement polity, the new policy analyses tht processor's remote memory access pattern of each node and uses this information to reduce the number of invalidations of the useful cache line in the higher-level cache. The new replacement policy of the remote cache can improve the performance by $3.5\%$ in maximum and $2.5\%$ in average on SPLASH-2 benchmarks, compared to the general LRU cache replacement policy.
PDF KSCI

Comparative and Combined Performance Studies of OpenMP and MPI Codes (OpenMP와 MPI 코드의 상대적, 혼합적 성능 고찰)

Lee Myung-Ho
- The KIPS Transactions:PartA
- /
- v.13A no.2 s.99
- /
- pp.157-162
- /
- 2006
Recent High Performance Computing (HPC) platforms can be classified as Shared-Memory Multiprocessors (SMP), Massively Parallel Processors (MPP), and Clusters of computing nodes. These platforms are deployed in many scientific and engineering applications which require very high demand on computing power. In order to realize an optimal performance for these applications, it is crucial to find and use the suitable computing platforms and programming paradigms. In this paper, we use SPEC HPC 2002 benchmark suite developed in various parallel programming models (MPI, OpenMP, and hybrid of MPI/OpenMP) to find an optimal computing environments and programming paradigms for them through their performance analyses.
https://doi.org/10.3745/KIPSTA.2006.13A.2.157 인용 PDF KSCI

Formal Design and Verification of Cache Coherency Protocol by ESTEREL (ESTEREL을 이용한 Cache Coherency Protocol의 정형 설계 및 검증)

김민숙;최진영
- Proceedings of the Korean Information Science Society Conference
- /
- 2002.04a
- /
- pp.40-42
- /
- 2002
캐쉬 일관성 유지 프로토콜은 공유 메모리 다중 프로세서 시스템의 정확하고 효율적인 작동에 중요하다. 시스템이 점점 복잡해짐에 따라 시뮬레이션 방법만으로는 프로토콜의 정확성을 확인하기는 어렵다. 본 논문에서는 CC-NUMA용 디렉토리 기반 캐쉬 일관성 프로토콜인 RACE 프로토콜을 정형기법 도구인 ESTEREL을 이용하여 프로토콜이 안정적으로 동작함을 검증하였다.
PDF

Bus Splitting Techniques for MPSoC to Reduce Bus Energy (MPSoC 플랫폼의 버스 에너지 절감을 위한 버스 분할 기법)

Chung Chun-Mok;Kim Jin-Hyo;Kim Ji-Hong
- Journal of KIISE:Computer Systems and Theory
- /
- v.33 no.9
- /
- pp.699-708
- /
- 2006
Bus splitting technique reduces bus energy by placing modules with frequent communications closely and using necessary bus segments in communications. But, previous bus splitting techniques can not be used in MPSoC platform, because it uses cache coherency protocol and all processors should be able to see the bus transactions. In this paper, we propose a bus splitting technique for MPSoC platform to reduce bus energy. The proposed technique divides a bus into several bus segments, some for private memory and others for shared memory. So, it minimizes the bus energy consumed in private memory accesses without producing cache coherency problem. We also propose a task allocation technique considering cache coherency protocol. It allocates tasks into processors according to the numbers of bus transactions and cache coherence protocol, and reduces the bus energy consumption during shared memory references. The experimental results from simulations say the bus splitting technique reduces maximal 83% of the bus energy consumption by private memory accesses. Also they show the task allocation technique reduces maximal 30% of bus energy consumed in shared memory references. We can expect the bus splitting technique and the task allocation technique can be used in multiprocessor platforms to reduce bus energy without interference with cache coherency protocol.
PDF KSCI

Application Behavior-oriented Adaptive Remote Access Cache in Ring based NUMA System (링 구조 NUMA 시스템에서 적응형 다중 그레인 원격 캐쉬 설계)

곽종욱;장성태;전주식
- Journal of KIISE:Computer Systems and Theory
- /
- v.30 no.9
- /
- pp.461-476
- /
- 2003
Due to the implementation ease and alleviation of memory bottleneck effect, NUMA architecture has dominated in the multiprocessor systems for the past several years. However, because the NUMA system distributes memory in each node, frequent remote memory access is a key factor of performance degradation. Therefore, efficient design of RAC(Remote Access Cache) in NUMA system is critical for performance improvement. In this paper, we suggest Multi-Grain RAC which can adaptively control the RAC line size, with respect to each application behavior Then we simulate NUMA system with multi-grain RAC using MINT, event-driven memory hierarchy simulator. and analyze the performance results. At first, with profile-based determination method, we verify the optimal RAC line size for each application and, then, we compare and analyze the performance differences among NUMA systems with normal RAC, with optimal line size RAC, and with multi-grain RAC. The simulation shows that the worst case can be always avoided and results are very close to optimal case with any combination of application and RAC format.
PDF KSCI

Formal Verification of RACE Protocol Using VIS (VIS를 이용한 RACE 포로토콜의 정형검증)

Um, Hyun-Sun;Choi, JIn-Young;Han, Woo-Jong;Ki, An-Do;Shim, Kyu-Hyun
- The Transactions of the Korea Information Processing Society
- /
- v.7 no.7
- /
- pp.2219-2228
- /
- 2000
Caches in a multiprocessing environment introduce the cache coherence problem. When multiple processors maintain locally cached copies of a unique shared-memory location, any local modification of the location can result in a globally inconsistent view of memory. Cache coherence protocols are important to operate a shared-memory multiprocessor system with efficiency and correctness. Since random testing and simulations are not enough to validate correctness of protocols, it is necessary to develop efficient and reliable verification methods. In this appear we present our experience in using VIS (Verification Interacting with Synthesis), a tool of formal method, to analyze a number of property of a cache coherence protocol, RACE (Remote Access Cache coherent Enforcement).
PDF

Sharing Pattern Analysis of an OLTP Application

Lee, Kangwoo;Kim, Hiecheol
- Journal of Korea Society of Industrial Information Systems
- /
- v.7 no.5
- /
- pp.121-128
- /
- 2002
Although multiprocessor systems are widely used in recent years to run commercial workloads, data sharing patterns are rarely explored due to several difficulties. In this paper, we made in-depth sharing pattern analysis for a representative OLTP application, the TPC-B benchmark, running on a cache-coherent shared-memory multiprocessor system. In addition, to illustrate their effects on the performance, the number of cache misses were measured for various numbers of processors, cache sizes and cache block sizes. From these measurements, we found out the shared data in TPC-B largely bear quite different sharing characteristics from those in scientific applications.
PDF

Search Result 39, Processing Time 0.028 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)