A Fault-tolerant Task Scheduling Algorithm Supporting the Minimum Schedule Length

최소의 스케줄 길이를 유지하는 결함 허용 태스크 스케줄링 알고리즘

  • Published : 2000.04.01

Abstract

To tolerate faults that may occur while distributed tasks execute on high-performance parallel computer systems, tasks are duplicated on different processors. Building on the task duplication based scheduling algorithm, this paper presents a new task scheduling algorithm that duplicates each task on more than two different processors while keeping the minimum schedule length. The number of processors required for the duplication is analyzed in terms of the ratio of communication cost to computation time and the workload of the system. Simulations with various task graphs show that the number of processors required for full-duplex fault-tolerant task scheduling with the obtainable minimum schedule length is about 30% to 75% higher than the number required by the task duplication based scheduling algorithm.

References

  1. T. Adam, et al., 'A Comparison of List Schedules for Parallel Processing Systems,' Comm. ACM, Vol. 17, No. 12, pp. 685-690, Dec. 1974. https://doi.org/10.1145/361604.361619
  2. I. Ahmad and Y.K. Kwok, 'On Exploiting Task Duplication in Parallel Program Scheduling,' IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 9, pp. 872-891, Sept. 1998. https://doi.org/10.1109/71.722221
  3. I. Ahmad and Y.K. Kwok, 'On Parallelizing the Multiprocessor Scheduling Problem,' IEEE Trans. on Parallel and Distributed Systems, Vol. 10, No. 4, pp. 414-432, Apr. 1999. https://doi.org/10.1109/71.762819
  4. A. Bertossi, et al., 'Fault-tolerant Rate-monotonic First-fit Scheduling in Hard-real-time Systems,' IEEE Trans. on Parallel and Distributed Systems, Vol. 10, No. 9, pp. 934-945, Sept. 1999. https://doi.org/10.1109/71.798317
  5. H. Chen, et al., 'Static Scheduling Using Linear Clustering Task Duplication,' Proc. Int'l Conf. Parallel and Distributed Computing and Systems, pp. 285-290, Oct. 1993.
  6. S. Darbha and P. Agrawal, 'A Task Duplication based Scalable Scheduling Algorithm for Distributed Memory Systems,' J. of Parallel and Distributed Computing, Vol. 46, pp. 15-27, 1997. https://doi.org/10.1006/jpdc.1997.1376
  7. S. Darbha and P. Agrawal, 'Optimal Scheduling Algorithm for Distributed-Memory Machines,' IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 1, pp. 87-95, Jan. 1998. https://doi.org/10.1109/71.655248
  8. R. Graham, et al., 'Optimization and Approximation in Deterministic Sequencing and Scheduling: A Survey,' Annals of Discrete Mathematics, pp. 287-326, 1979.
  9. C. Hou and K.G. Shin, 'Module Allocation with Timing and Precedence Constraints in Distributed Real-time Systems,' IEEE Proc. Real Time Systems Symp., pp. 146-155, Dec. 1994.
  10. K.H. Kim, et al., 'Fault-tolerant Execution of Real time Tasks through Duplex Assignment within Parallel Computers,' Proc. Int'l Conf. Parallel and Distributed Systems, Dec. 1992.
  11. K.H. Kim and B.J. Min, 'Approaches to Implementation of Multiple DRB Stations in Tightly Coupled Computer Networks,' Proc. Int'l Computer Software and Applications Conf., Sept. 1991.
  12. P. Maheshwari and H. Shen, 'An Efficient Clustering Algorithm for Partitioning Parallel Programs,' Parallel Computing, Vol. 24, pp. 893-909, 1998. https://doi.org/10.1016/S0167-8191(98)00004-0
  13. G. Manimaran and C. Murthy, 'A Fault-tolerant Dynamic Scheduling Algorithm for Multiprocessor Real-time Systems and its Analysis,' IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 11, pp. 1137-1152, Nov. 1998. https://doi.org/10.1109/71.735960
  14. T. Varvarigou and J. Trotter, 'Module Replication for Fault-tolerant Real-time Distributed Systems,' IEEE Trans. on Reliability, Vol. 47, No. 1, pp. 8-18, Mar. 1998.