http://dx.doi.org/10.5302/J.ICROS.2007.13.12.1147

Fault-Tolerance Improvement of Real-Time Embedded System using Static Checkpointing  

Ryu, Sang-Moon (School of Electronics and Information Engineering, Kunsan National University)
Publication Information
Journal of Institute of Control, Robotics and Systems / v.13, no.12, 2007, pp. 1147-1152
Abstract
This paper presents a scheme that improves the fault tolerance of real-time embedded systems by using equidistant checkpointing to tolerate transient errors. Transient errors are caused by transient faults, the most significant type of fault in reliable computer systems. Transient faults are assumed to occur according to a Poisson process and to be detected non-concurrently (e.g., by periodic checks). The probability that a real-time task completes successfully in the presence of transient errors is derived, taking the possible effects of those errors into account. Based on this probability, the paper gives a condition under which inserting checkpoints improves the fault tolerance of the system, and presents the optimal equidistant checkpointing strategy, i.e., the one that achieves the highest fault tolerance.
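The trade-off described in the abstract can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not the derivation from the paper: the task needs `T` time units of fault-free execution, must finish by deadline `D`, and is split into `n` equal segments; each segment ends with an error check and state save costing `c`; faults arrive as a Poisson process with rate `lam`; an error is found only at the end of a segment, which is then re-executed from the previous checkpoint. All names and parameters here are hypothetical.

```python
from math import comb, exp


def success_probability(n, T, D, c, lam):
    """Probability of finishing by deadline D with n equal segments.

    Simplified model (an assumption, not the paper's formula): a segment
    attempt succeeds only if no fault arrives during it, and a failed
    attempt is repeated in full.
    """
    seg = T / n + c                  # one segment plus its check/save overhead
    attempts = int(D / seg + 1e-9)   # attempts that fit before the deadline
                                     # (small tolerance guards float rounding)
    r = attempts - n                 # re-executions the slack can absorb
    if r < 0:
        return 0.0                   # task cannot fit even without faults
    p = exp(-lam * seg)              # Poisson: no fault arrives during a segment
    q = 1.0 - p
    # n successful segments with at most r failed attempts along the way
    # (negative binomial distribution).
    return sum(comb(n - 1 + j, j) * p**n * q**j for j in range(r + 1))


def best_checkpoint_count(T, D, c, lam, n_max=50):
    """Segment count in 1..n_max with the highest success probability."""
    return max(range(1, n_max + 1),
               key=lambda n: success_probability(n, T, D, c, lam))
```

In this model, more segments make each re-execution cheaper, but every extra segment adds overhead `c` and consumes slack, so the success probability is generally not monotonic in `n`. Comparing `success_probability(n, ...)` against the value for `n = 1` gives a rough analogue of the condition under which inserting checkpoints helps.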
Keywords
real-time embedded system; fault-tolerance; checkpointing