http://dx.doi.org/10.3745/KIPSTA.2006.13A.5.421

A Time-Redundant Recovery Scheme of TMR Failures Using Retry and Rollback Techniques

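Kang, Myung-Seok (Department of Electrical and Electronic Engineering, Graduate School, Yonsei University)
Son, Byoung-Hee (Department of Electrical and Electronic Engineering, Graduate School, Yonsei University)
Kim, Hag-Bae (Department of Electrical and Electronic Engineering, Yonsei University)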
Abstract
This paper proposes an integrated recovery approach that applies retry and rollback techniques to recover from TMR failures. Combining these time-redundancy techniques with a TMR system is effective for recovering from TMR failures (or masked errors), which are caused primarily by transient faults. These policies require fewer reconfigurations, at the cost of the extra time consumed by the time-redundant schemes. The optimal numbers of retries and rollbacks that minimize the mean execution time of tasks are derived for the proposed method by computing the likelihoods of all possible states of the failed system. The effectiveness of the proposed method is validated through numerical examples and simulations conducted over a variety of parameters governing environmental characteristics.
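As a rough illustration of the recovery order described in the abstract (retry first, then rollback, then reconfiguration), the following Python sketch may help. It is not the authors' model. The function names, the treatment of a TMR failure as "no two modules agree", and the fixed attempt limits `max_retries` and `max_rollbacks` are all assumptions made here; the paper derives the optimal limits from fault-state likelihoods, which this sketch does not attempt.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple

Outputs = Sequence[Hashable]  # the three module outputs of one execution


def majority_vote(outputs: Outputs) -> Optional[Hashable]:
    """Return the value at least two modules agree on, else None.

    Assumption: a TMR failure is modelled as "no majority among the outputs".
    """
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


def recover(
    retry_step: Callable[[], Outputs],
    rollback_segment: Callable[[], Outputs],
    max_retries: int,
    max_rollbacks: int,
) -> Tuple[Optional[Hashable], str]:
    """Try time-redundant recovery before giving up and reconfiguring.

    retry_step and rollback_segment are hypothetical callbacks: the first
    re-executes only the failed step, the second re-executes everything
    since the last checkpoint. Each returns the three module outputs.
    """
    # Cheapest option first: re-execute only the failed step.
    for _ in range(max_retries):
        voted = majority_vote(retry_step())
        if voted is not None:
            return voted, "retry"

    # Next option: re-execute from the last checkpoint.
    for _ in range(max_rollbacks):
        voted = majority_vote(rollback_segment())
        if voted is not None:
            return voted, "rollback"

    # Time redundancy exhausted; the fault is treated as non-transient.
    return None, "reconfigure"
```

With a transient fault that clears after a few re-executions, `recover` returns the voted value together with the technique that succeeded; when every attempt fails, it returns `None` and `"reconfigure"`, which corresponds to the reconfiguration cost that the time-redundant policies are meant to avoid.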
Keywords
TMR System; Time-redundancy; Retry; Rollback; Masked Error