An Adaptive Checkpointing Scheme for Fault Tolerance of Real-Time Control Systems with Concurrent Fault Detection

  • Sang-Moon Ryu (Department of Control and Robot Engineering, Kunsan National University)
  • Received : 2010.06.03
  • Accepted : 2010.11.08
  • Published : 2011.01.01

Abstract

Checkpointing is a well-known technique for coping with transient faults in digital systems. This paper proposes an adaptive checkpointing scheme that improves the reliability of real-time control systems with concurrent fault detection capability. With concurrent fault detection, the effects of transient faults are assumed to be detected with no latency. The proposed adaptive scheme is based on a reliability analysis of an equidistant checkpointing scheme. Numerical data show that the proposed adaptive scheme outperforms the equidistant scheme in terms of reliability.
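The abstract does not give the paper's model or its adaptation rule, so the following is only a minimal sketch, under assumed parameters, of what a reliability analysis of equidistant checkpointing with zero-latency fault detection can look like. It assumes that reliability means the probability that a task finishes by its deadline, that transient faults arrive as a Poisson process, and that each checkpoint and each rollback has a fixed time cost. A Monte Carlo estimate stands in for a closed-form analysis. The function names (`completion_time`, `reliability`, `best_equidistant`) and all parameters are hypothetical.

```python
import math
import random


def completion_time(work, n_segments, ckpt_cost, rollback_cost,
                    fault_rate, rng, limit=math.inf):
    """Simulate one run of a task with equidistant checkpoints.

    Hypothetical model (not taken from the paper): the fault-free
    execution time `work` is split into `n_segments` equal segments, each
    followed by a checkpoint costing `ckpt_cost`. Transient faults arrive
    as a Poisson process with rate `fault_rate`. Concurrent fault detection
    flags each fault with zero latency, so only the work done since the
    last checkpoint is lost, plus a rollback cost `rollback_cost`. A fault
    while saving a checkpoint also forces a rollback. The run is abandoned
    once the elapsed time exceeds `limit`.
    """
    attempt_len = work / n_segments + ckpt_cost
    t = 0.0
    for _ in range(n_segments):
        while True:
            # Time to the next fault; the memoryless exponential lets us
            # draw a fresh value at the start of every attempt.
            ttf = rng.expovariate(fault_rate) if fault_rate > 0 else math.inf
            if ttf >= attempt_len:
                t += attempt_len          # segment and checkpoint both done
                break
            t += ttf + rollback_cost      # fault caught at once: roll back
            if t > limit:
                return t                  # already past the deadline
    return t


def reliability(work, deadline, n_segments, ckpt_cost, rollback_cost,
                fault_rate, trials=5000, seed=1):
    """Monte Carlo estimate of P(task completes by `deadline`)."""
    rng = random.Random(seed)
    hits = sum(
        completion_time(work, n_segments, ckpt_cost, rollback_cost,
                        fault_rate, rng, limit=deadline) <= deadline
        for _ in range(trials)
    )
    return hits / trials


def best_equidistant(work, deadline, ckpt_cost, rollback_cost, fault_rate,
                     max_segments=10, **kw):
    """Segment count in 1..max_segments with the highest estimated reliability."""
    return max(
        range(1, max_segments + 1),
        key=lambda n: reliability(work, deadline, n, ckpt_cost,
                                  rollback_cost, fault_rate, **kw),
    )


if __name__ == "__main__":
    # Illustrative parameters only: work 10, deadline 14, fault rate 0.1.
    for n in (1, 2, 5, 10):
        print(n, round(reliability(10.0, 14.0, n, 0.2, 0.1, 0.1), 3))
```

Sweeping the segment count exposes the usual trade-off. More checkpoints shorten the work lost per rollback, but they add saving overhead. An adaptive scheme could repeat such an evaluation at run time, for example at each checkpoint using the remaining work and slack. The paper's actual adaptation rule is not given in the abstract and is not reproduced here.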
