• Title/Summary/Keyword: 다중 프로세서 시스템

Search Result 281, Processing Time 0.03 seconds

Task Duplication Based Clustering and Scheduling on Symmetric Multiprocessor Systems (대칭형 다중프로세서 시스템에서 태스크 중복기반의 클러스터링과 스케줄링)

  • 강오한;조경미;김기남;김시관
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04a
    • /
    • pp.97-99
    • /
    • 2003
  • 대칭형 다중프로세서 (SMP: Symmetric Multiprocessors) 시스템은 고성능의 병렬 연산을 위한 중요하고 효과적인 기반환경을 제공하고 있다. SMP에서 태스크 클러스터링과 스케줄링 기법은 시스템의 성능에 큰 영향을 미친다. 본 논문에서는 버스 기반의 SMP에서 사용할 수 있는 태스크 중복 기반의 클러스터링과 스케줄링 기법을 소개한다. 본 논문에서 제안한 클러스터링 기법에서는 휴리스틱을 사용하여 중복할 태스크를 선택한 후 프로세서에 할당하고, 스케줄링 기법에서는 잠재하는 통신 충돌을 방지하기 위하여 네트워크 통신 자원을 사전에 할당한다. 새로운 클러스터링과 스케줄링 기법의 성능을 확인하기 위하여 시뮬레이션에서는 통신비용의 변화에 대한 병렬연산시간을 비교하였다.

  • PDF

Fuzzy-based Processor Allocation Strategy for Multiprogrammed Shared-Memory Multiprocessors (다중프로그래밍 공유메모리 다중프로세서 시스템을 위한 퍼지 기반 프로세서 할당 기법)

  • 김진일;이상구
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.10 no.5
    • /
    • pp.409-416
    • /
    • 2000
  • In the shared-memory mutiprocessor systems, shared processing techniques such as time-sharing, space¬sharing, and gang-scheduling are used to improve the overall system utilization for the parallel operations. Recently, LLPC(Loop-Level Process Control) allocation technique was proposed. It dynamically adjusts the needed number of processors for the execution of the parallel code portions based on the current system load in the given job. This method allocates as many available processors as possible, and does not save any processors for the parallel sections of other later-arriving applications. To solve this problem, in this paper, we propose a new processor allocation technique called FPA(Fuzzy Processor Allocation) that dynamically adjusts the number of processors by fuzzifYing the amounts ofueeded number of processors, loads, and estimated execution times of job. The proposed method provides the maximum possibility of the parallism of each job without system overload. We compare the performances of our approaches with the conventional results. The experiments show that the proposed method provides a better performance.

  • PDF

An Extended Real-Time Synchronization Protocols for Shared Memory Multiprocessors (공유메모리 다중 프로세서 실시간 시스템에서의 동기화 프로토콜)

  • Kang, Seung-Yup;Ha, Rhan
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1998.10a
    • /
    • pp.136-138
    • /
    • 1998
  • 작업들이 자원을 공유하는 경우 예측하기 어려운 지연시간이 발생한다. 다중 프로세서 시스템에서의 자원공유로 인한 지연시간은 더욱 예측하기 어렵다. 실기간 시스템의 스케줄 가능성 검사를 위해서는 이러한 지연시간을 정확히 예측해야한다. 선점가능한 우선순위 구동 CPU 스케줄링 알고리즘에 의해서 다른 우선순위의 작업과의 동기화는 우선순위 역전 문제를 야기한다. 본 논문에서는 다중 프로세서에서의 동기화 프로토콜을 제안하고 작업의 지연시간을 분석한다. 다른 프로세서에 할당된 작업들이 수행중인 자원을 요구할 때, 자원을 수행하는 작업의 우선순위를 높여줌으로써 자원수행을 빠르게 종료하게 한다. 이로 인해 자원에 의한 지연을 최소화한다. 특히, 높은 우선순위 작업의 경우 더욱 작은 지연시간을 갖게한다. 시뮬레이션을 통한 Shared Memory Protocol [5]과의 비교, 분석 결과 성능의 향상을 보임을 알 수 있다. 다양한 작업집합에 대한 지연시간을 분석하였다.

  • PDF

Implementation and Performance Analysis of Efficient Packet Processing Method For DPI (Deep Packet Inspection) System using Dual-Processors (듀얼 프로세서 기반 DPI (Deep Packet Inspection) 엔진을 위한 효율적 패킷 프로세싱 방안 구현 및 성능 분석)

  • Yang, Joon-Ho;Han, Seung-Jae
    • The KIPS Transactions:PartC
    • /
    • v.16C no.4
    • /
    • pp.417-422
    • /
    • 2009
  • Implementation of DPI(Deep Packet Inspection) system on a general purpose multiprocessor platform is an attractive option from the implementation cost point of view, since it does not require high-cost customized hardware. Load balancing has been considered as a primary means to achieve high performance in multi processor systems. We claim, however, that in case of DPI system design simply balancing the load of each processor does not necessarily yield the highest system performance. Instead, we propose a method in which tasks are allocated to processors based on their functions. We implemented the proposed method in dual processor Linux system and compare its performance with the existing load balancing methods. Under the proposed method, one processor is dedicated to deal with interrupt handling and generic packet processing, while another processor is dedicated to DPI processing. According to experimental results, the proposed scheme outperforms the existing schemes by 60%, mainly because of the reduction of cache miss and spin lock occurrences.

Mixed Tasks Scheduling Using Improved Synthetic Utilization on Multiprocessor Systems (다중프로세서 시스템에서 개선된 합성 이용율을 이용한 혼합 태스크 스케줄링)

  • Moon, Seok-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.2
    • /
    • pp.351-356
    • /
    • 2015
  • Synthetic utilization on multiprocessor system is not considered periodic tasks, except scheduling methods for aperiodic tasks where one of the real-time aperiodic tasks is a scheduling method. But really aperiodic tasks scheduling method is composed of mixed task types. Aperiodic task scheduling method guarantee an analysis of the schedualibility of aperiodic task. The set of mixed tasks periodic and aperiodic tasks scheduling method uses improved synthetic utilization that is presented in this paper. The new method shows that schedulability increases aperiodic server method.

Design of Pipeline Bus and the Performance Evaluation in Multiprocessor System (다중프로세서 시스템에서 파이프라인 전송 버스의 설계 및 성능 평가)

  • 윤용호;임인칠
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.2
    • /
    • pp.288-299
    • /
    • 1993
  • This paper proposes the new bus protocol in the tightly coupled multiprocessor system. The bus protocol uses the pipelined data transfer and block transfer scheme to increase the bus bandwidth, The bus also has the independent transfer lines for the address and data respectively, and it can transfer the data up to maximum 264 Mbytes /sec. This paper also models the multiprocessor system where each processor boards have the private cache. Simulation evaluates the bus and system performance according to hit ratio of the reference data in cache memory, In the case of using this bus, the bus is evaluated not to be saturated when up to 10 processor boards are connected to the bus. As for up to 4 memory interleavng, the performance increases linearly.

  • PDF

Remote Cache Replacement Policy using Processor Locality in Multi-Processor System (다중 프로세서 시스템에서 프로세서 지역성을 이용한 원격 캐쉬 교체 정책)

  • Han Sang Yoon;Kwak Jong Wook;Jhang Seong Tae;Jhon Chu Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.541-556
    • /
    • 2005
  • The memory access latency of the system has been a primary factor of performance degradation in single-processor system and multi-processor system. The remote memory access latency takes a lot of overhead over the local memory access latency especially in the distributed shared-memory system. To resolve this problem, the multi-level cache architecture that contains a remote cache in the multi-processor system has been proposed. In this paper, we propose a new cache replacement policy that improves the performance of the multi-processor system with the remote cache. If the multi-level cache keeps the multi-level inclusion(MLI) property and uses the LRU(Least Recently Used) cache replacement policy, the LRU information of the higher-level cache(a processor cache) would be different with that of the lower-level cache(a remote cache). In this situation, the replacement of a remote cache line can induce the exchange of a processor cache line that is used by the processor. It is a main factor of performance degradation in a whole system. To alleviate this disadvantage of the LRU replacement polity, the new policy analyses tht processor's remote memory access pattern of each node and uses this information to reduce the number of invalidations of the useful cache line in the higher-level cache. The new replacement policy of the remote cache can improve the performance by $3.5\%$ in maximum and $2.5\%$ in average on SPLASH-2 benchmarks, compared to the general LRU cache replacement policy.

Performance Analysis of A Distributed Shared Memory System Including Minor Performance Factors (군소 성능요인을 고려한 분산공유메모리 시스템 성능의 정밀분석)

  • 박준석;전창호
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.10c
    • /
    • pp.671-673
    • /
    • 2000
  • 본 논문에서는 분산공유메모리 다중프로세서 시스템에서 하드웨어 구성요소와 실행환경이 시스템의 전체 성능에 미치는 영향을 시뮬레이션을 통하여 분석한다. PARSEC[1,2]을 이용하여 분산공유메모리 다중프로세서 시스템을 실제 실행환경에 근접하게 모델링하고 그 모델링된 시스템상에 2D FFT를 가상 실행하는 방식의 시뮬레이션 결과, 일반적으로 성능분석을 할 때 성능요소로 고려하지 않는 군소 하드웨어 요소들이 시스템 구성에 따라 시스템의 전체 성능에 상당한 영향을 미침을 밝힌다. 또한 반복순환 구문의 오버헤드, 코드최적화 등 실행조건에 따른 성능의 변화도 정량적으로 분석한다.

  • PDF

A Modified Least-Laxity First Scheduling Algorithm for Reducing Context Switches on Multiprocessor Systems (다중 프로세서 시스템에서 문맥교환을 줄이기 위한 변형된 LLF 스케줄링 알고리즘)

  • 오성흔;길아라;양승민
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.2
    • /
    • pp.68-77
    • /
    • 2003
  • The Least-Laxity First(or LLF) scheduling algorithm assigns the highest priority to a task with the least laxity, and has been proved to be optimal for a uni-processor and sub-optimal for a multi-processor. However, this algorithm Is Impractical to implement because laxity tie results in the frequent context switches among tasks. In this paper, a Modified Least-Laxity First on Multiprocessor(or MLLF/MP) scheduling algorithm is proposed to solve this problem, i.e., laxity tie results in the excessive scheduling overheads. The MLLF/MP is based on the LLF, but allows the laxity inversion. MLLF/MP continues executing the current running task as far as other tasks do not miss their deadlines. Consequently, it avoids the frequent context switches. We prove that the MLLF/MP is also sub-optimal in multiprocessor systems. By simulation results, we show that the MLLF/MP has less scheduling overheads than LLF.

Efficient pipelined FFT processor for the MIMO-OFDM systems (MIMO-OFDM 시스템을 위한 효율적인 파이프라인 FFT 프로세서의 설계)

  • Lee, Sang-Min;Jung, Yun-Ho;Kim, Jae-Seok
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.10C
    • /
    • pp.1025-1031
    • /
    • 2007
  • This paper proposes an area-efficient pipeline FFT processor for MIMO-OFDM systems with four transmitting and four receiving antennas. Since the MIMO-OFDM system transmits multiple data streams, the complexity for the MIMO-OFDM system with a single-channel FFT processor increases linearly with the increase of the number of transmit channels. The proposed FFT processor is based on multi-channel structure, and therefore it can efficiently support multiple data streams. With the mixed radix algorithm, the number of non-trivial multiplications of the proposed FFT processor is decreased. The proposed FFT processor is synthesized with CMOS $0.18{\mu}m$ process and reduces the logic gates by 25% over a 4-channel Radix-4 multi-path delay commutator (R4MDC) FFT processor. Since the MIMO-OFDM FFT processor is one of the largest modules in the systems, the proposed FFT processor will be a vast contribution improvement to the low complexity design of MIMO-OFDM systems.