• Title/Summary/Keyword: fault tolerance


Fault-Tolerance Improvement of Real-Time Embedded System using Static Checkpointing (실시간 임베디드 시스템의 결함 허용성 개선을 위한 정적 체크포인팅 방안)

  • Ryu, Sang-Moon
    • Journal of Institute of Control, Robotics and Systems / v.13 no.12 / pp.1147-1152 / 2007
  • This paper presents a scheme for improving the fault tolerance of real-time embedded systems, which uses an equidistant checkpointing technique to tolerate transient errors. Transient errors are caused by transient faults, which are the most significant type of fault in reliable computer systems. Transient faults are assumed to occur according to a Poisson process and to be detected in a non-concurrent manner (e.g., checked periodically). The probability of successful real-time task completion in the presence of transient errors is derived, taking the possible effects of the transient errors into account. Based on this, a condition under which inserting checkpoints improves the fault tolerance of the system is introduced, and an optimal equidistant checkpointing strategy that achieves the highest fault tolerance is presented.
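The effect of equidistant checkpoints on the task completion probability can be illustrated with a small sketch. This is not the derivation from the paper: it assumes a simplified model in which faults arrive as a Poisson process, an error is detected only at the end of a segment (at the checkpoint), a failed segment is re-executed in full from the previous checkpoint, and rollback itself costs no time. All function and parameter names are hypothetical.

```python
from math import comb, exp, floor


def completion_probability(task_time, deadline, fault_rate, n_segments, checkpoint_cost):
    """Probability of finishing by the deadline under the simplified model.

    Assumptions (not from the paper): errors are detected only at the end
    of a segment, a failed segment is re-run in full, rollback is free.
    """
    segment_time = task_time / n_segments + checkpoint_cost
    # A segment succeeds if no Poisson fault arrives while it runs.
    p_ok = exp(-fault_rate * segment_time)
    # Number of segment executions that fit before the deadline.
    max_runs = floor(deadline / segment_time + 1e-9)
    if max_runs < n_segments:
        return 0.0
    spare_runs = max_runs - n_segments
    # Negative binomial sum: n_segments successes with at most
    # spare_runs failed executions before the last success.
    return sum(
        comb(n_segments - 1 + j, j) * p_ok**n_segments * (1 - p_ok) ** j
        for j in range(spare_runs + 1)
    )


def best_checkpoint_count(task_time, deadline, fault_rate, checkpoint_cost, max_segments=50):
    """Brute-force search for the segment count with the highest probability."""
    return max(
        range(1, max_segments + 1),
        key=lambda n: completion_probability(
            task_time, deadline, fault_rate, n, checkpoint_cost
        ),
    )
```

With no slack and a single segment, the function reduces to the probability that no fault occurs during the whole task. With some slack before the deadline, splitting the task lets a fault cost only one segment instead of the whole run, which is the kind of trade-off against checkpoint overhead that the abstract describes.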

Fault Tolerance Design of Uplink Command Processor (상향링크 명령 처리기의 결함 허용 설계)

  • Gu, Cheol Hoe
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.31 no.3 / pp.95-100 / 2003
  • Electronic equipment used in satellites demands extremely high reliability, so it should be designed to be immune to certain critical faults through the use of redundant components. Communication satellites are generally required to meet a 15-year mission lifetime, so fault analysis must be performed on the satellite's electronic equipment. This paper summarizes research on the fault-tolerant design of a command processor, the resulting improvement in reliability, and a trade-off study of the fault-tolerant design results. The reliability prediction values of the satellite components used in this research were taken from Koreasat 3 and Kompsat 1. It is important to perform many trade-off studies in fault-tolerant design, especially to choose the most suitable fault tolerance method for the specified fault scenario.

An Efficient Fault Tolerance Protocol with Backup Foreign Agents in a Hierarchical Local Registration Mobile IP

  • Hong, Choong-Seon;Yim, Ki-Woon;Lee, Dae-Young;Yun, Dong-Sik
    • ETRI Journal / v.24 no.1 / pp.12-22 / 2002
  • Mobile IP allows IP hosts to move between different networks without changing their IP addresses. Mobile IP systems supporting local registration were introduced to reduce the number of times a home registration with the remotely located home agent was needed. The local registration Mobile IP scheme enhanced performance by processing registration requests of mobile nodes at a local agent. The local registration approach may affect other aspects of Mobile IP systems, such as fault tolerance. In this paper, we briefly review previous solutions for supporting fault tolerance in local registration Mobile IP systems and propose a fault tolerance protocol with a backup foreign agent in a hierarchical local registration Mobile IP to enhance the efficiency of such systems against foreign agent failures. We also describe the specification of the proposed protocol using LOTOS and perform its validation using MiniLite. Finally, we analyze the performance of our proposed fault tolerance protocol through simulation.


An adaptive fault tolerance strategy for cloud storage

  • Xiai, Yan;Dafang, Zhang;Jinmin, Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.11 / pp.5290-5304 / 2016
  • As the volume of data grows, the failure probability of cloud storage nodes keeps increasing. A single fault tolerance strategy, such as replication or erasure coding, has unavoidable disadvantages and cannot meet today's fault tolerance needs. Therefore, an adaptive hybrid redundant fault tolerance strategy is proposed that, according to file access frequency and size, can switch dynamically between a replication scheme and an erasure coding scheme throughout the file lifecycle. The experimental results show that the proposed scheme not only saves storage space (reduced by 32% compared with replication) but also ensures fast recovery from node failures (increased by 42% compared with erasure codes).
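A minimal sketch of how such a per-file choice between replication and erasure coding might look is given below. The decision rule, the thresholds, and the redundancy parameters (3 replicas, a 6+3 erasure code) are illustrative assumptions, not values taken from the paper, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    size_mb: float
    reads_per_day: float


def storage_overhead(scheme, replicas=3, data_blocks=6, parity_blocks=3):
    """Raw bytes stored per byte of user data for a given scheme."""
    if scheme == "replication":
        return float(replicas)
    if scheme == "erasure":
        return (data_blocks + parity_blocks) / data_blocks
    raise ValueError(f"unknown scheme: {scheme}")


def choose_scheme(stats, hot_reads_per_day=100.0, small_file_mb=64.0):
    """Pick a redundancy scheme for one file.

    Assumed rule: frequently read or small files stay replicated (cheap
    reads and fast recovery); large, rarely read files are erasure coded
    (lower storage overhead). Thresholds are placeholders.
    """
    if stats.reads_per_day >= hot_reads_per_day or stats.size_mb <= small_file_mb:
        return "replication"
    return "erasure"
```

Re-evaluating `choose_scheme` as a file's access statistics change over time is one way to model the switching between schemes that the abstract describes. The storage saving of any such hybrid depends on how many bytes end up under each scheme.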

A Study on Fault-Tolerance Design Methods for Nuclear Digital Control Systems (원전 디지털 제어계통을 위한 고장허용설계방법론에 관한 연구)

  • Go, Won-Seok;Choe, Jung-In
    • The Transactions of the Korean Institute of Electrical Engineers D / v.49 no.1 / pp.1-9 / 2000
  • In this paper, a fault-tolerance design method is presented for nuclear digital control systems composed of software and hardware. Reliability, availability, and safety are used as quantitative measures of fault tolerance. To implement the proposed fault tolerance, a prototype digital control system has been devised and a quantitative Markov model is applied. The results indicate the appropriate degree of redundancy, diversity, and fail-safe design.
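As a hedged illustration of how a Markov model yields a quantitative reliability figure for a redundant design, the sketch below evaluates triple modular redundancy (TMR). TMR is a textbook example chosen here for illustration; the abstract does not say which architecture the authors modeled. The sketch assumes identical modules with a constant failure rate, a perfect voter, and no repair.

```python
from math import exp


def tmr_reliability_markov(failure_rate, mission_time, steps=100_000):
    """Euler-integrate a 3-state Markov chain for a TMR system.

    States: 3 modules good -> 2 modules good -> system failed.
    The system works while at least 2 of 3 modules are good.
    """
    dt = mission_time / steps
    p3, p2 = 1.0, 0.0  # probability of "3 good" and "2 good"
    for _ in range(steps):
        d3 = -3 * failure_rate * p3
        d2 = 3 * failure_rate * p3 - 2 * failure_rate * p2
        p3 += d3 * dt
        p2 += d2 * dt
    return p3 + p2


def tmr_reliability_closed_form(failure_rate, mission_time):
    """Closed-form solution of the same chain: 3R^2 - 2R^3."""
    r = exp(-failure_rate * mission_time)
    return 3 * r**2 - 2 * r**3
```

Comparing either function with the single-module reliability `exp(-failure_rate * mission_time)` shows when the added redundancy pays off: TMR is better for short missions and worse once the single-module reliability drops below 0.5.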


Fault Diameter and Fault Tolerance of Gray Cube (그레이 큐브의 고장 지름(Fault Diameter)과 고장 허용도(Fault Tolerance))

  • Lee, Hyeong-Ok;Joo, Nak-Keun;Lim, Hyeong-Seok
    • The Transactions of the Korea Information Processing Society / v.4 no.8 / pp.1930-1939 / 1997
  • In this paper, we analyze the fault diameter and fault tolerance of the Gray cube recently proposed in [12]. The fault diameter of an interconnection network is one of the important network measures concerning the distance between nodes when some nodes fail. It is shown that the fault diameter of an n-dimensional Gray cube with $2^n$ nodes is $[(n+1)/2]+2$ for $n \ge 3$. This means that the longest distance between nodes increases by only a constant under node failure. Compared with the fault diameter of the well-known hypercube, the longest routing distance of a message in a Gray cube under node failure is about half of that in a hypercube.


A Biologically Inspired New Hardware Fault Detection: Immunotronic and Genetic Algorithm-Based Approach

  • Lee, Sanghyung;Kim, Euntai;Park, Mignon
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.1 / pp.7-11 / 2004
  • This paper proposes a new immunotronic approach to fault detection in hardware. The suggested method is inspired by biology, and its implementation is based on a genetic algorithm (GA). Tolerance conditions in the immunotronic system for fault detection correspond to the antibodies in the biological immune system. A novel algorithm for generating tolerance conditions is suggested based on the principle of antibody diversity, and GA optimization is employed to select mature tolerance conditions in the immunotronic fault detection system. The suggested method is applied to fault detection for MCNC benchmark FSMs (finite state machines), and its effectiveness is demonstrated by computer simulation.

Analysis of the redundant architecture for the fault-tolerance of a distributed control system

  • Moon, Hong-ju
    • Proceedings of the Korean Reliability Society Conference / 2000.04a / pp.231-238 / 2000
  • A distributed digital control system has many shared common components, so a single fault in the system may affect more than one function. Unlike in an analog system, faults in a digital system usually cause discrete and abrupt changes in its output, which are hard to anticipate. To cope with these situations, fault tolerance is an essential property of a distributed control system. A distributed digital control system consists of many pieces of equipment, and each can be implemented with many different technologies. Fault tolerance has to be implemented depending on the overall architecture and on how each piece of equipment is implemented. The paper analyzes and compares the strategies and tactics for adding fault tolerance to a distributed digital control system, and studies how they can be combined appropriately.


A Design of Low Power MAC Operator with Fault Tolerance (에러 내성을 갖는 저전력 MAC 연산기 설계)

  • Jung, Han-Sam;Ku, Sung-Kwan;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.11 / pp.50-55 / 2008
  • As more DSP functionality is integrated into embedded mobile devices, power consumption and device reliability have emerged as crucial issues. As the complexity of mobile embedded designs increases very rapidly, verifying the functionality of mobile devices has become extremely difficult. Therefore, designs with error (fault) tolerance are often required, since this capability enables the design to operate properly even in the presence of some errors. However, designs with fault tolerance may suffer from significant power overhead, since fault tolerance is often achieved by resource replication. In this paper, we propose a low-power, fault-tolerant MAC (multiply-and-accumulate) design. The proposed MAC design is based on multiple barrel shifters, since MAC designs with barrel shifters and adders are known to be excellent in terms of power consumption.
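The idea of building a MAC from shifters and adders rather than a dedicated multiplier can be sketched in software as a shift-and-add multiplication. This is only an illustration of the general technique: the abstract does not describe the number of barrel shifters, the datapath, or the fault tolerance mechanism of the proposed design, and the function names are hypothetical.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and adds."""
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    product = 0
    shift = 0
    while b:
        # For every set bit of b, add a copy of a shifted to that bit position.
        if b & 1:
            product += a << shift
        b >>= 1
        shift += 1
    return product


def mac(accumulator, a, b):
    """Multiply-and-accumulate: accumulator + a * b via shift-and-add."""
    return accumulator + shift_add_multiply(a, b)
```

In hardware, each `a << shift` term would come from a barrel shifter and the additions from an adder tree, so the number of active shifters depends on the number of set bits in the multiplier operand.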

The Implementation of Fault Tolerance Service for QoS in Grid Computing (그리드 컴퓨팅에서 서비스 품질을 위한 결함 포용 서비스의 구현)

  • Lee, Hwa-Min
    • The Journal of Korean Association of Computer Education / v.11 no.3 / pp.81-89 / 2008
  • Resource failures occur more often in grid computing than in traditional parallel computing. Since the failure of resources critically affects job execution, a fault tolerance service is essential in computational grids. Grid services are also often expected to meet minimum levels of quality of service (QoS) for desirable operation. However, the Globus toolkit does not provide a fault tolerance service that supports fault detection and management services and satisfies QoS requirements. This paper therefore proposes a fault tolerance service that satisfies QoS requirements in computational grids. To provide the fault tolerance service and satisfy QoS requirements, we expand the definition of failure to cover process failure, processor failure, and network failure. We propose a resource scheduling service, a fault detection service, and a fault management service, and present implementation and experimental results.
