http://dx.doi.org/10.5302/J.ICROS.2004.10.8.748

Reliability Analysis and Fault Tolerance Strategy of TMR Real-time Control Systems  

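Kwak, Seong-Woo (Department of Electronic Engineering, Keimyung University)
You, Kwan-Ho (School of Information and Communication Engineering, Sungkyunkwan University)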
Publication Information
Journal of Institute of Control, Robotics and Systems / v.10, no.8, 2004, pp. 748-754
Abstract
In this paper, we propose a Triple Modular Redundancy (TMR) control system equipped with a checkpoint strategy. In this system, a fault in a single processor is masked, and faults in two or more processors are detected at each checkpoint. When faults are detected, rollback recovery is activated to recover from them. The conventional TMR control system cannot overcome faults in two or more processors, whereas the proposed system can cope with both correlated and independent faults in two or more processors. We develop a reliability model for this TMR control system under correlated and independent transient faults and derive the reliability equation. We then investigate the number of checkpoints that maximizes the reliability.
Keywords
reliability analysis; triple modular redundancy; rollback recovery; fault tolerance; real-time control system
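
The mechanism described in the abstract (masking a single fault by majority vote, detecting multiple faults at a checkpoint, and rolling back) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `fault_plan` fault-injection map, and the retry limit are hypothetical, and the sketch assumes that faulty replicas produce distinct erroneous values, so that two simultaneous faults leave no majority and are therefore detectable.

```python
from collections import Counter


def majority(values):
    """Return the value held by at least two of the three replicas, or None."""
    value, count = Counter(values).most_common(1)[0]
    return value if count >= 2 else None


def run_task(step, initial_state, num_segments, fault_plan, max_retries=3):
    """Run a task split into segments, with a checkpoint after each segment.

    fault_plan (hypothetical) maps (segment, attempt) to the list of replica
    indices whose result is corrupted on that attempt.
    Returns the final state and the number of rollbacks performed.
    """
    checkpoint = initial_state
    rollbacks = 0
    for seg in range(num_segments):
        for attempt in range(max_retries + 1):
            # All three replicas execute the segment from the last checkpoint.
            results = [step(checkpoint) for _ in range(3)]
            # Assumption: each faulty replica yields a distinct wrong value.
            for replica in fault_plan.get((seg, attempt), []):
                results[replica] = ("corrupt", replica)
            agreed = majority(results)
            if agreed is not None:
                # Zero or one fault: the vote masks it; commit the checkpoint.
                checkpoint = agreed
                break
            # Two or more faults: no majority, so roll back and re-execute.
            rollbacks += 1
        else:
            raise RuntimeError(f"segment {seg} failed after {max_retries} retries")
    return checkpoint, rollbacks


if __name__ == "__main__":
    # One fault in segment 1 (masked), two faults in segment 2 (rollback).
    plan = {(1, 0): [2], (2, 0): [0, 1]}
    print(run_task(lambda x: x + 1, 0, 4, plan))
```

In this sketch each rollback costs one re-execution of a segment, so more checkpoints mean shorter re-executions but more voting overhead; this is the kind of trade-off behind the abstract's question of which number of checkpoints maximizes reliability.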