http://dx.doi.org/10.5302/J.ICROS.2011.17.1.72

An Adaptive Checkpointing Scheme for Fault Tolerance of Real-Time Control Systems with Concurrent Fault Detection  

Ryu, Sang-Moon (Kunsan National University)
Publication Information
Journal of Institute of Control, Robotics and Systems / v.17, no.1, 2011, pp. 72-77
Abstract
Checkpointing is a well-known technique for coping with transient faults in digital systems. This paper proposes an adaptive checkpointing scheme that improves the reliability of real-time control systems with concurrent fault detection capability. With concurrent fault detection, the effects of transient faults are assumed to be detected with no latency. The proposed adaptive checkpointing scheme is based on a reliability analysis of an equidistant checkpointing scheme. Numerical data show that the proposed adaptive scheme outperforms the equidistant scheme in terms of reliability.
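As an illustration of the equidistant baseline mentioned in the abstract, the following is a minimal Monte Carlo sketch, not the paper's analysis. It assumes a fault model the abstract does not specify: transient faults arrive as a Poisson process with rate `lam`, every fault is detected with zero latency, and the task then rolls back to its most recent checkpoint. The names `T` (fault-free execution time), `D` (deadline), `n` (number of equal segments), `c` (checkpoint save cost) and `r` (rollback cost) are hypothetical parameters chosen for this example. Reliability is taken here to mean the probability that the task finishes by its deadline.

```python
import random


def completes_by_deadline(T, D, n, c, r, lam, rng):
    """Simulate one run of a task split into n equal segments.

    Assumptions (illustrative only, not taken from the paper):
    - faults follow a Poisson process with rate lam,
    - a fault is detected the instant it occurs (zero latency),
    - a detected fault costs r time units and restarts the current segment,
    - no fault strikes during the rollback itself.
    """
    segment = T / n + c  # useful work plus the checkpoint save
    elapsed = 0.0
    finished = 0
    while finished < n:
        # Time until the next fault; the exponential law is memoryless,
        # so drawing a fresh sample at each segment (re)start is valid.
        to_fault = rng.expovariate(lam) if lam > 0 else float("inf")
        if to_fault >= segment:
            elapsed += segment  # segment and checkpoint completed
            finished += 1
        else:
            elapsed += to_fault + r  # work lost, roll back and retry
        if elapsed > D:
            return False  # deadline missed
    return True


def estimate_reliability(T, D, n, c, r, lam, trials=10000, seed=0):
    """Fraction of simulated runs that meet the deadline."""
    rng = random.Random(seed)
    hits = sum(completes_by_deadline(T, D, n, c, r, lam, rng)
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Compare a few checkpoint counts under the same assumed fault rate.
    for n in (1, 2, 5, 10):
        p = estimate_reliability(T=10.0, D=14.0, n=n, c=0.1, r=0.1, lam=0.05)
        print(f"n={n:2d}  estimated reliability={p:.3f}")
```

Running the script prints an estimated reliability for several values of `n`, which shows the trade-off the equidistant scheme has to balance: more checkpoints reduce the work lost per fault but add save overhead. An adaptive scheme would change the checkpoint placement at run time; the abstract does not give its rule, so none is sketched here.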
Keywords
real-time control system; fault tolerance; concurrent fault detection; adaptive checkpointing
Citations & Related Records
Times Cited By KSCI : 2
Times Cited By SCOPUS : 0