• Title/Summary/Keyword: Multiprocessor System

Search Result 200, Processing Time 0.022 seconds

Separated Address/Data Network Design for Bus Protocol compatible Network-on-Chip (버스 프로토콜 호환 가능한 네트워크-온-칩에서의 분리된 주소/데이터 네트워크 설계)

  • Chung, Seungh Ah;Lee, Jae Hoon;Kim, Sang Heon;Lee, Jae Sung;Han, Tae Hee
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.4
    • /
    • pp.68-75
    • /
    • 2016
  • As the number of cores and IPs increase in multiprocessor system-on-chip (MPSoC), network-on-chip (NoC) has emerged as a promising novel interconnection architecture for its parallelism and scalability. However, minimization of the latency in NoC with legacy bus IPs must be addressed. In this paper, we focus on the latency minimization problem in NoC which accommodates legacy bus protocol based IPs considering the trade-offs between hop counts and path collisions. To resolve this problem, we propose separated address/data network for independent address and data phases of bus protocol. Compared to Mesh and irregular topologies generated by TopGen, experimental results show that average latency and execution time are reduced by 19.46% and 10.55%, respectively.

A VLSI Architecture for the Real-Time 2-D Digital Signal Processing (실시간 2차원 디지털 신호처리를 위한 VLSI 구조)

  • 권희훈
    • Information and Communications Magazine
    • /
    • v.9 no.9
    • /
    • pp.72-85
    • /
    • 1992
  • The throughput requirement for many digital signal processing is such that multiple processing units are essential for real-time implementation. Advances in VLSI technology make it feasible to design and implement computer systems consisting of a large number of function units. The research on a very high throughput VLSI architecture for digital signal processing applications requires the development of an algorithm, decomposition scheme which can minimize data communication requirements as well as minimize computational complexity. The objectives of the research are to investigate computationally efficient algorithms for solution of the class of problems which can be modeled as DLSI systems or adaptive system, and develop VLSI architectures and associated multiprocessor systems which can be used to implement these algorithms in real-time. A new VLSI architecture for real-time 2-D digital signal processing applications is proposed in this research. This VLSI architecture extends the concept of having a single processing units in a chip. Because this VLSI architecture has the advantage that the complexity and the number of computations per input does not increase as the size of the input data in increased, it can process very large 2-D date in near real-time.

  • PDF

An Improved Dynamic Quantum-Size Pfair Scheduling for the Mode Change Environments (Mode Change 환경을 위한 개선된 동적 퀀텀 크기 Pfair 스케줄링)

  • Cha, Seong-Duk;Kim, In-Guk
    • Journal of Digital Contents Society
    • /
    • v.8 no.3
    • /
    • pp.279-288
    • /
    • 2007
  • Recently, Baruah et. al. proposed an optimal Pfair scheduling algorithm in the real-time multiprocessor system environments, and several variants of it were presented. All these algorithms assume the fixed unit quantum size. However, under Pfair based scheduling algorithms that are global scheduling technique, quantum size has direct influence on the scheduling overheads such as task switching and cache reload. We proposed a method for deciding the optimal quantum size[2] and an improved method for the task set whose utilization e is less than or equal to $e\;{\leq}\;p/3+1$[3]. However, these methods use repetitive computation of the task's utilization to determine the optimal quantum size. In this paper, we propose a more efficient method that can determine the optimal quantum size in constant time.

  • PDF

Performance Improvement of Base Station Controller using Separation Control Method of Input Messages for Mobile Communication Systems (이동통신 시스템에서 입력 메시지 분리제어 방식을 통한 제어국의 성능 개선)

  • Won, Jong-Gwon;Park, U-Gu;Lee, Sang-Ho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.4
    • /
    • pp.1058-1070
    • /
    • 1999
  • In this paper, we propose a control model which can control the burst input messages of the BSC(Base Station controller) in mobile communication systems more efficiently and reliably, by dividing the input messages characteristically and using multiprocessor system. Using M/M/c/K queueing model, we briefly analyze proposed model to get characteristic parameters which are required to performance improvement. On the base of the results, we compare our proposed model with the conventional one by using SLAM II with regard to the following factors : the call blocking rate of the input message, the distribution of average queue length, the utilization of process controller(server), and the distribution of average waiting time in queue. In addition, we modified our model which has overload control function for burst input messages, and analyzed its performance.

  • PDF

다중프로세서 컴퓨터시스템을 위한 버스중재 프로토콜의 성능 분석 및 비교

  • 김병량
    • Proceedings of the Korea Society for Simulation Conference
    • /
    • 1992.10a
    • /
    • pp.2-2
    • /
    • 1992
  • 최근 여러 분야에서 컴퓨터의 용도가 확산되고 더 높은 computing power에 대한 요구가 증가함에 따라, 컴퓨터의 성능을 향상시키기 위하여 프로세서의 고속화와 함께 시스템 구조의 개선을 위한 많은 연구가 진행되고 있다. 한 시스템내에 여러 개의 CPU들이 존재하는 다중프로세서 시스템(multiprocessor system) 구조를 가진 슈퍼미니급 중형 컴퓨터들은 상호연결망으로서 버스(bus) 방식을 많이 채택하고 있다. 버스 구조는 하드웨어가 간단하여 구현이 용이하지만, 여러 개의 시스템 지원들(프로세서들, 기억장치 모듈들 및 입출력 모듈들)이 버스를 공유하기 때문에 경합으로 인한 지연 시간이 발생하게 된다. 이러한 지연 시간으로 인한 성능 저하를 개선하는 방법으로는 버스 수의 증가와 최적 통제 프로토콜의 설계가 있다. 본 연구에서는 여러 개의 버스를 가진 다중프로세서 시스템에서 4가지 대표적인 버스 중재 프로토콜들에 대해 성능을 분석, 비교하여 최적 프로토콜을 제시하고자 한다. 이러한 대규모 하드웨어에 의하여 구현되는 시스템에서 주요 설계 요소들에 따른 시스템 성능 분석과 비교는 설계 단계에서 필수적인 과정이다. 그러나 하드웨어를 만들어서 분석하는 방법은 시간과 비용이 많이 소요되기 때문에 소프트웨어 시뮬레이션 방법이 널리 사용되고 있다. 본 연구팀에서는 시뮬레이션 전용언어인 SLAM II를 이용하여 다중프로세서 시스템의 시뮬레이터를 개발하고, 버스중재 프로토콜(bus arbitration protocol)을 용이하게 변경할 수 있도록 하여 각각의 성능을 비교하였다. 이 연구에서 비교된 프로토콜들은 고정-우선순위 방식(fixed-priority scheme), FIFO(first-in first-out) 방식, 라운드-로빈 방식(round-robin scheme), 및 회전-우선순위 방식(rotating-priority scheme) 등이다. 실험은 시스템의 주요 요소들인 프로세서와 기억장치 모듈 및 버스의 수들을 변경시킴으로써 다양한 시스템 환경에 대한 분석을 시도하였다. 작업 부하가 되는 기하장치 액세스 요구간 시간가격(inter-memory access request time interval)은 필요에 따라서 고정값 또는 확률 분포함수를 사용하였다. 특히, 실행될 프로그램의 특성에 따라 각 프로토콜의 성능이 다르게 나타날 수 있음을 검증하였으며, 기억장치의 지역성(memory locality)에 대한 프로토콜들의 성능도 비교하였다.

  • PDF

Load Balancing of Unidirectional Dual-link CC-NUMA System Using Dynamic Routing Method (단방향 이중연결 CC-NUMA 시스템의 동적 부하 대응 경로 설정 기법)

  • Suh Hyo-Joon
    • The KIPS Transactions:PartA
    • /
    • v.12A no.6 s.96
    • /
    • pp.557-562
    • /
    • 2005
  • Throughput and latency of interconnection network are important factors of the performance of multiprocessor systems. The dual-link CC-NUMA architecture using point-to-point unidirectional link is one of the popular structures in high-end commercial systems. In terms of optimal path between nodes, several paths exist with the optimal hop count by its native multi-path structure. Furthermore, transaction latency between nodes is affected by congestion of links on the transaction path. Hence the transaction latency may get worse if the transactions make a hot spot on some links. In this paper, I propose a dynamic transaction routing algorithm that maintains the balanced link utilization with the optimal path length, and I compare the performance with the fixed path method on the dual-link CC-NUMA systems. By the proposed method, the link competition is alleviated by the real-time path selection, and consequently, dynamic transaction algorithm shows a better performance. The program-driven simulation results show $1{\~}10\%$ improved fluctuation of link utilization, $1{\~}3\%$ enhanced acquirement of link, and $1{\~}6\%$ improved system performance.

Empirical Modeling for Cache Miss Rates in Multiprocessors (다중 프로세서에서의 캐시접근 실패율을 위한 경험적 모델링)

  • Lee, Kang-Woo;Yang, Gi-Joo;Park, Choon-Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.1_2
    • /
    • pp.15-34
    • /
    • 2006
  • This paper introduces an empirical modeling technique. This technique uses a set of sample results which are collected from a few small scale simulations. Empirical models are developed by applying a couple of statistical estimation techniques to these samples. We built two types of models for cache miss rates in Symmetric Multiprocessor systems. One is for the changes of input data set size while the specification of target system is fixed. The other is for the changes of the number of processors in target system while the input data set size is fixed. To develop accurate models, we built individual model for every kind of cache misses for each shared data structure in a program. The final model is then obtained by integrating them. Besides, combined use of Least Mean Squares and Robust Estimations enhances the quality of models by minimizing the distortion due to outliers. Empirical modeling technique produces extremely accurate models without analysis on sample data. In addition, since only snail scale simulations are necessary, once a set of samples can be collected, empirical method can be adopted in any research areas. In 17 cases among 24 trials, empirical models present extremely low prediction errors below $1\%$. In the remaining cases, the accuracy is excellent, as well. The models sustain high quality even when the behavioral characteristics of programs are irregular and the number of samples are barely enough.

A Study on the Parallel Routing in Hybrid Optical Networks-on-Chip (하이브리드 광학 네트워크-온-칩에서 병렬 라우팅에 관한 연구)

  • Seo, Jung-Tack;Hwang, Yong-Joong;Han, Tae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.8
    • /
    • pp.25-32
    • /
    • 2011
  • Networks-on-chip (NoC) is emerging as a key technology to overcome severe bus traffics in ever-increasing complexity of the Multiprocessor systems-on-chip (MPSoC); however traditional electrical interconnection based NoC architecture would be faced with technical limits of bandwidth and power consumptions in the near future. In order to cope with these problems, a hybrid optical NoC architecture which use both electrical interconnects and optical interconnects together, has been widely investigated. In the hybrid optical NoCs, wormhole switching and simple deterministic X-Y routing are used for the electrical interconnections which is responsible for the setup of routing path and optical router to transmit optical data through optical interconnects. Optical NoC uses circuit switching method to send payload data by preset paths and routers. However, conventional hybrid optical NoC has a drawback that concurrent transmissions are not allowed. Therefore, performance improvement is limited. In this paper, we propose a new routing algorithm that uses circuit switching and adaptive algorithm for the electrical interconnections to transmit data using multiple paths simultaneously. We also propose an efficient method to prevent livelock problems. Experimental results show up to 60% throughput improvement compared to a hybrid optical NoC and 65% power reduction compared to an electrical NoC.

Design and Performance Analysis of a Parallel Optimal Branch-and-Bound Algorithm for MIN-based Multiprocessors (MIN-based 다중 처리 시스템을 위한 효율적인 병렬 Branch-and-Bound 알고리즘 설계 및 성능 분석)

  • Yang, Myung-Kook
    • Journal of IKEEE
    • /
    • v.1 no.1 s.1
    • /
    • pp.31-46
    • /
    • 1997
  • In this paper, a parallel Optimal Best-First search Branch-and-Bound(B&B) algorithm(pobs) is designed and evaluated for MIN-based multiprocessor systems. The proposed algorithm decomposes a problem into G subproblems, where each subproblem is processed on a group of P processors. Each processor group uses tile sub-Global Best-First search technique to find a local solution. The local solutions are broadcasted through the network to compute the global solution. This broadcast provides not only the comparison of G local solutions but also the load balancing among the processor groups. A performance analysis is then conducted to estimate the speed-up of the proposed parallel B&B algorithm. The analytical model is developed based on the probabilistic properties of the B&B algorithm. It considers both the computation time and communication overheads to evaluate the realistic performance of the algorithm under the parallel processing environment. In order to validate the proposed evaluation model, the simulation of the parallel B&B algorithm on a MIN-based system is carried out at the same time. The results from both analysis and simulation match closely. It is also shown that the proposed Optimal Best-First search B&B algorithm performs better than other reported schemes with its various advantageous features such as: less subproblem evaluations, prefer load balancing, and limited scope of remote communication.

  • PDF

Call-Site Tracing-based Shared Memory Allocator for False Sharing Reduction in DSM Systems (분산 공유 메모리 시스템에서 거짓 공유를 줄이는 호출지 추적 기반 공유 메모리 할당 기법)

  • Lee, Jong-Woo
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.7
    • /
    • pp.349-358
    • /
    • 2005
  • False sharing is a result of co-location of unrelated data in the same unit of memory coherency, and is one source of unnecessary overhead being of no help to keep the memory coherency in multiprocessor systems. Moreover. the damage caused by false sharing becomes large in proportion to the granularity of memory coherency. To reduce false sharing in a page-based DSM system, it is necessary to allocate unrelated data objects that have different access patterns into the separate shared pages. In this paper we propose call-site tracing-based shared memory allocator. shortly CSTallocator. CSTallocator expects that the data objects requested from the different call-sites may have different access patterns in the future. So CSTailocator places each data object requested from the different call-sites into the separate shared pages, and consequently data objects that have the same call-site are likely to get together into the same shared pages. We use execution-driven simulation of real parallel applications to evaluate the effectiveness of our CSTallocator. Our observations show that by using CSTallocator a considerable amount of false sharing misses can be additionally reduced in comparison with the existing techniques.