• Title/Summary/Keyword: multiprocessor systems

Search Result 162, Processing Time 0.024 seconds

Survey on Cache Coherency Schemes for Large Scale Multiprocessor Systems (대규모 다중프로세서 시스템의 캐시 동일성 유지 기법 조사)

  • Ki, A.D.;Hahn, W.J.;Yoon, S.H.
    • Electronics and Telecommunications Trends
    • /
    • v.9 no.3
    • /
    • pp.69-96
    • /
    • 1994
  • 본고에서는 캐시 동일성 유지 기법들을 분류하여 그 특성들을 개략적으로 살펴본 후 대규모 다중프로세서를 위해 제안된 것 중 몇몇 특색있는 것들을 살펴본다.

Design and Implementation of Asynchronous Memory for Pipelined Bus (파이프라인 방식의 버스를 위한 비 동기식 주 기억장치의 설계 및 구현)

  • Hahn, Woo-Jong;Kim, Soo-Won
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.11
    • /
    • pp.45-52
    • /
    • 1994
  • In recent days low cost, high performance microprocessors have led to construction of medium scale shared memory multiprocessor systems with shared bus. Such multiprocessor systems are heavily influenced by the structures of memory systems and memory systems become more important factor in design space as microprocessors are getting faster. Even though local cache memories are very common for such systems, the latency on access to the shared memory limits throughput and scalability. There have been many researches on the memory structure for multiprocessor systems. In this paper, an asynchronous memory architecture is proposed to utilize the bandwith of system bus effectively as well as to provide flexibility of implementation. The effect of the proposed architecture if shown by simulation. We choose, as our model of the shared bus is HiPi+Bus which is designed by ETRI to meet the requirements of the High-Speed Midrange Computer System. The simulation is done by using Verilog hardware decription language. With this simulation, it is explored that the proposed asynchronous memory architecture keeps the utilization of system bus low enough to provide better throughput and scalibility. The implementation trade-offs are also described in this paper. The asynchronous memory is implemented and tested under the prototype testing environment by using test program. This intensive test has validated the operation of the proposed architecture.

  • PDF

컴퓨터 시스템의 시뮬레이션 모델링에 대한 정보 구조의 구축에 관한 연구

  • 손달호
    • The Journal of Information Systems
    • /
    • v.1
    • /
    • pp.111-122
    • /
    • 1992
  • 본 논문은 IBIS(Information-Based Integrated Simulation)라 불리우는 정보 중심 의 시뮬레이션 방법을 다중 처리(Multiprocessor) 컴퓨터 시스템의 시뮬레이션 모델링에 이 용하였다. IBIS는 지금까지 시뮬레이션의 중요한 접근 방법이었던 언어 정의 (Language-Defined) 혹은 목적 지향적인(Object-oriented)방법의 단점을 보완한 방법으로 각각의 다른 단계에 있는 여러 개의 모델들을 조합하여 정보 시스템을 구축하루 수 있다. 본 연구에서 IBIS적인 접근 방법을 컴퓨터 시스템의 시뮬레이션 모델링에 국한하였으나 이 를 확장하면 서비스 시스템을 포함하여 시뮬레이션이 적용될 수 있는 모든 시스템에 IBIS적 인 접근 방법을 이용할 수 있을 것이다.

  • PDF

Proposition and Evaluation of Parallelism-Independent Scheduling Algorithms for DAGs of Tasks with Non-Uniform Execution Time

  • Kirilka Nikolova;Atusi Maeda;Sowa, Masa-Hiro
    • Proceedings of the IEEK Conference
    • /
    • 2000.07a
    • /
    • pp.289-293
    • /
    • 2000
  • We propose two new algorithms for parallelism-independent scheduling. The machine code generated from the compiler using these algorithms in its scheduling phase is parallelism-independent code, executable in minimum time regardless of the number of the processors in the parallel computer. Our new algorithms have the following phases: finding the minimum number of processors on which the program can be executed in minimal time, scheduling by an heuristic algorithm for this predefined number of processors, and serialization of the parallel schedule according to the earliest start time of the tasks. At run time tasks are taken from the serialized schedule and assigned to the processor which allows the earliest start time of the task. The order of the tasks decided at compile time is not changed at run time regardless of the number of the available processors which means there is no out-of-order issue and execution. The scheduling is done predominantly at compile time and dynamic scheduling is minimized and diminished to allocation of the tasks to the processors. We evaluate the proposed algorithms by comparing them in terms of schedule length to the CP/MISF algorithm. For performance evaluation we use both randomly generated DAGs (directed acyclic graphs) and DACs representing real applications. From practical point of view, the algorithms we propose can be successfully used for scheduling programs for in-order superscalar processors and shared memory multiprocessor systems. Superscalar processors with any number of functional units can execute the parallelism-independent code in minimum time without necessity for dynamic scheduling and out-of-order issue hardware. This means that the use of our algorithms will lead to reducing the complexity of the hardware of the processors and the run-time overhead related to the dynamic scheduling.

  • PDF

A Parallel Emulation Scheme for Data-Flow Architecture on Loosely Coupled Multiprocessor Systems (이완 결합형 다중 프로세서 시스템을 사용한 데이터 플로우 컴퓨터 구조의 병렬 에뮬레이션에 관 한 연구)

  • 이용두;채수환
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.12
    • /
    • pp.1902-1918
    • /
    • 1993
  • Parallel architecture based on the von Neumann computation model has a limitation as a massively parallel architecture due to its inherent drawback of architectural features. The data-flow model of computation has a high programmability in software perspective and high scalability in hardware perspective. However, the practical programming and experimentaion of date-flow architectures are hardly available due to the absence of practical data-flow, we present a programming environment for performing the data-flow computation on conventional parallel machines in general, loosely compled multiprocessor system in particular. We build an emulator for tagged token data-flow architecture on the iPSC/2 hypercube, a loosely coupled multiprocessor system. The emulator is a shallow layer of software executing on an iPSC/2 system, and thus makes the iPSC/2 system work as a data-flow architecture from the programmer`s viewpoint. We implement various numerical and non-numerical algorithm in a data-flow assembler language, and then compare the performance of the program with those of the versions of conventional C language, Consequently, We verify the effectiveness of this programming environment based on the emulator in experimenting the data-flow computation on a conventional parallel machine.

  • PDF

An Implementation of the Linear Scheduling Algorithm in Multiprocessor Systems using Genetic Algorithms (유전 알고리즘을 이용한 다중프로세서 시스템에서의 선형 스케쥴링 알고리즘 구현)

  • Bae, Sung-Hwan;Choi, Sang-Bang
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.2
    • /
    • pp.135-148
    • /
    • 2000
  • In this paper, we present a linear scheduling method for homogeneous multiprocessor systems using genetic algorithms. In general, genetic algorithms randomly generate initial strings, which leads to long operation time and slow convergence due to an inappropriate initialization. The proposed algorithm considers communication costs among processors and generates initial strings such that successive nodes are grouped into the same cluster. In the crossover and mutation operations, the algorithm maintains linearity in scheduling by associating a node with its immediate successor or predecessor. Linear scheduling can fully utilize the inherent parallelism of a given program and has been proven to be superior to nonlinear scheduling on a coarse grain DAG (directed acyclic graph). This paper emphasizes the usability of the genetic algorithm for real-time applications. Simulation results show that the proposed algorithm rapidly converges within 50 generations in most DAGs.

  • PDF

Memory Allocation Scheme for Reducing False Sharing on Multiprocessor Systems (다중처리기 시스템에서 거짓 공유 완화를 위한 메모리 할당 기법)

  • Han, Boo-Hyung;Cho, Seong-Je
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.4
    • /
    • pp.383-393
    • /
    • 2000
  • In shared memory multiprocessor systems, false sharing occurs when several independent data objects, not shared but accessed by different processors, are allocated to the same coherency unit of memory. False sharing is one of the major factors that may degrade the performance of memory coherency protocols. This paper presents a new shared memory allocation scheme to reduce false sharing of parallel applications where master processor controls allocation of all the shared objects. Our scheme allocates the objects to temporary address space for the moment, and actually places each object in the address space of processor that first accesses the object later. Its goal is to allocate independent objects that may have different access patterns to different pages. We use execution-driven simulation of real parallel applications to evaluate the effectiveness of our scheme. Experimental results show that by using our scheme a considerable amount of false sharing faults can be reduced with low overhead.

  • PDF

Keeping-ownership Cache Replacement Policies for Remote Access Caches of NUMA System (NUMA 시스템에서 소유권에 근거한 원격 캐시 교체 정책)

  • 신숭현;곽종욱;장성태;전주식
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.8
    • /
    • pp.473-486
    • /
    • 2004
  • NUMA systems have remote access caches(RAC) in each local node to reduce the overhead for repeated remote memory accesses. By this RAC, memory latency and network traffic can be reduced and the performance of the multiprocessor system can be improved. Until now, several cache replacement policies have been proposed in recent years, and there also is cache replacement policy for multiprocessor systems. In this paper, we propose a cache replacement policy which is based on cache line coherence information. In this policy, the cache line that does not have an ownership is replaced first with respect to cache line that has an ownership. Like this way, the overhead to transfer ownership is avoided and the memory latency can be decreased. We also propose “Keeping-Ownership replacement policy with MRU (KOM)” and “Keeping-Ownership replacement policy with Reference Bit(KORB)” to reduce the frequent replacement penalty of the ownership-lacking cache line. We compare and analyze these with LRU and Pseudo LRU(PLRU). The simulation shows that KOM outperforms the PLRU by 25%, and KORB outperforms the PLRU by 13%. Although the hardware cost of KOM is very small, the performance of KOM is nearly equal to that of the LRU.

Performance Analysis of Futurebus+ based Multiprocessor Systems with MESI Cache Coherence Protocol (MESI 캐쉬 코히어런스 프로토콜을 사용하는 Futurebus+ 기반 멀티프로세서 시스템의 성능 평가)

  • 고석범;강인곤;박성우;김영천
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.12
    • /
    • pp.1815-1827
    • /
    • 1993
  • In this paper, we evaluate the performance of a Futurebus based multiprocessor system with MESI cache coherence protocol for four bus transaction types. Graphical symbols and compiler of SLAM II are used in modeling and simulation. A steady-state probability of each state for MESI protocol is computed by a Markov chain. The probability of each state is used as an input value for a correct simulation. Processor utilization, memory utilization, bus utilization, and the waiting time for bus arbitration are measured in terms of the number of processors, the hit ratio of cache memory, the probability of internal operation, and bus bandwidth.

  • PDF