A Checkpointing Framework for Dependable Real-Time Systems

고신뢰 실시간 시스템을 위한 체크포인팅 프레임워크

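  • Hyosoon Lee (이효순), School of Computer Science and Engineering, Seoul National University
  • Heonshik Shin (신현식), School of Computer Science and Engineering, Seoul National University
  • Published: 2002.04.01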

Abstract

We present a checkpointing framework that reflects both timeliness and dependability, making checkpointing applicable to dependable real-time systems. The predictability of real-time tasks with checkpointing is guaranteed by a worst-case execution time (WCET) computed from the number of checkpoints allocated to a task and its permissible number of failures. The permissible number of failures is derived from the task's fault tolerance requirements, which guarantees the dependability of the task. Using the WCET and the permissible number of failures of each task, we develop an algorithm that determines the minimum number of checkpoints to allocate to each task in order to guarantee the schedulability of the task set. Since the framework is based on the amount of time redundancy introduced by checkpointing, it can be extended to other time redundancy techniques.

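This paper presents a checkpointing framework that accounts for both timeliness and dependability so that checkpointing can be applied to dependable real-time systems. The timing predictability of a real-time task is guaranteed by computing its worst-case execution time (WCET) from the number of checkpoints allocated to it and the number of failures it must tolerate during execution. The number of failures a task must tolerate is derived from the task's dependability requirement, which in turn guarantees the task's dependability. Using the resulting WCETs and failure counts, we propose an algorithm that derives the minimum number of checkpoints each task needs in order to guarantee schedulability. Because the framework is based on the amount of time redundancy that checkpointing introduces, it extends readily to other time redundancy techniques.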
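
The relationship between checkpoint count, tolerated failures, WCET, and schedulability can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the cost model (equally spaced checkpoints, a fixed per-checkpoint overhead `o`, a fixed recovery overhead `r`, and each failure costing at most one re-executed segment), the utilization-based EDF test with deadlines equal to periods standing in for the paper's schedulability analysis, and the names `Task`, `wcet`, and `min_checkpoints`. The permissible number of failures `k` is taken as a given input.

```python
from dataclasses import dataclass


@dataclass
class Task:
    c: float       # fault-free computation time
    period: float  # period, assumed equal to the deadline
    k: int         # permissible number of failures (given as input)
    o: float       # overhead of saving one checkpoint (assumed constant)
    r: float       # overhead of one rollback recovery (assumed constant)


def wcet(task: Task, n: int) -> float:
    """Assumed model: n equally spaced checkpoints split the task into
    segments of length c/n; each of the k failures costs at most one
    re-executed segment plus the recovery overhead."""
    return task.c + n * task.o + task.k * (task.c / n + task.r)


def utilization(tasks, counts) -> float:
    return sum(wcet(t, n) / t.period for t, n in zip(tasks, counts))


def min_checkpoints(tasks, max_n: int = 1000):
    """Greedy sketch: start with one checkpoint per task and keep adding a
    checkpoint to whichever task lowers total utilization the most, until
    the assumed EDF bound U <= 1 holds. Returns None when no additional
    checkpoint helps and the bound is still violated."""
    counts = [1] * len(tasks)
    while utilization(tasks, counts) > 1.0:
        best_i, best_gain = None, 0.0
        for i, t in enumerate(tasks):
            if counts[i] >= max_n:
                continue
            gain = (wcet(t, counts[i]) - wcet(t, counts[i] + 1)) / t.period
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:
            return None  # more checkpoints only add overhead
        counts[best_i] += 1
    return counts


if __name__ == "__main__":
    tasks = [
        Task(c=2, period=10, k=1, o=0.1, r=0.1),
        Task(c=6, period=20, k=2, o=0.5, r=0.5),
    ]
    print(min_checkpoints(tasks))
```

Under this assumed model the WCET first falls and then rises as checkpoints are added, because shorter re-execution segments are eventually outweighed by checkpoint overhead. That is why the sketch stops as soon as the bound is met, and reports failure when no task benefits from another checkpoint.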
