• Title/Summary/Keyword: Fault Tolerance

A Fault-Tolerant Scheme Based on Message Passing for Mission-Critical Computers (임무지향 컴퓨터를 위한 메시지패싱 고장감내 기법)

  • Kim, Taehyon;Bae, Jungil;Shin, Jinbeom;Cho, Kilseok
    • Journal of the Korea Institute of Military Science and Technology / v.18 no.6 / pp.762-770 / 2015
  • Fault tolerance is a crucial design requirement for a mission-critical computer, such as an engagement control computer, that must keep operating over a long mission time. In recent years, software fault-tolerant design has become increasingly important for its cost-effectiveness and efficiency. In this paper, we propose MPCMCC, a model-based software component that implements fault tolerance in mission-critical computers. MPCMCC synchronizes shared data between two computers using a one-way message-passing scheme, which is easier to use and more stable than a shared-memory scheme. In addition, because it follows a model-based development methodology, MPCMCC can be easily reused in future work. We verified the functions of the software component and analyzed its performance in a simulation environment using two mission-critical computers. The results show that MPCMCC is a suitable software component for fault tolerance in mission-critical computers.
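As a rough illustration of one-way message passing between two computers, the sketch below has a primary push every state change to a standby over a single queue. It is not the MPCMCC implementation: the class names, the sequence-number check, and the in-process `SimpleQueue` standing in for the inter-computer link are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from queue import SimpleQueue


@dataclass(frozen=True)
class Update:
    seq: int       # monotonically increasing sequence number (assumed)
    key: str
    value: object


@dataclass
class Primary:
    link: SimpleQueue          # hypothetical stand-in for the one-way link
    state: dict = field(default_factory=dict)
    seq: int = 0

    def write(self, key, value):
        # Apply the change locally, then push it one way to the standby.
        self.state[key] = value
        self.seq += 1
        self.link.put(Update(self.seq, key, value))


@dataclass
class Standby:
    link: SimpleQueue
    state: dict = field(default_factory=dict)
    last_seq: int = 0

    def drain(self):
        # Apply pending updates in arrival order; skip stale or duplicate ones.
        while not self.link.empty():
            update = self.link.get()
            if update.seq > self.last_seq:
                self.state[update.key] = update.value
                self.last_seq = update.seq


link = SimpleQueue()
primary = Primary(link)
standby = Standby(link)
primary.write("track_id", 17)
primary.write("mode", "engage")
standby.drain()
print(standby.state == primary.state)  # True
```

Because the standby never writes back, neither side needs locks on a shared region, which is one plausible reading of why the abstract calls message passing more stable than shared memory.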

Implementation of IEEE 1451 based Dual CAN Module for Fault Tolerance of In-Vehicle Networking System (차량 네트워크 시스템의 결함 허용을 위한 IEEE 1451 기반 중복 CAN 모듈의 구현)

  • Lee, Jong-Gap;Kim, Man-Ho;Park, Jee-Hun;Lee, Suk;Lee, Kyung-Chang
    • Journal of Institute of Control, Robotics and Systems / v.15 no.7 / pp.753-759 / 2009
  • As more systems in an intelligent vehicle depend on electronics, concern for fault tolerance is growing rapidly. For example, a car whose braking is controlled electronically, with no mechanical linkage from the brake pedal to the calipers of the front tires (a brake-by-wire system), must be fault tolerant because a failure can occur without warning and its effect is devastating. In general, fault tolerance is achieved by adding redundant components that duplicate the functions of the original module. In this way a fault can be isolated, and safe operation is guaranteed by replacing the faulty module with its healthy redundant counterpart within a predefined interval. To make the in-vehicle network fault tolerant, this paper presents the concept and design methodology of an IEEE 1451 based dual CAN module. In addition, the feasibility of the dual CAN network was evaluated by implementing the dual CAN module.

A Research to Enhance the Fault Tolerance of the CORBA Based Traffic Information Systems (CORBA 기반 교통정보시스템의 Fault Tolerance 향상을 위한 연구)

  • Seh, Woon-Suk;Ryu, Kwang-Taek;Lee, Eun-Seok
    • The KIPS Transactions:PartD / v.10D no.6 / pp.991-998 / 2003
  • There are many ways to enhance the fault tolerance of CORBA-based real-time systems, depending on the viewpoint taken. Among them, this paper presents a method that keeps services running seamlessly when objects in a CORBA-based system fail while processing real-time information. Specifically, it examines how to deal efficiently with object faults that occur in 3-tier architecture environments. Objects can be replicated as one way to enhance fault tolerance against object faults. In addition, this paper presents a method that ultimately enhances fault tolerance and maintains service continuity by allowing the system to keep running until the faults of the FT-CORBA based system are recovered.

Deterministic Measures of Fault-Tolerance in Recursive Circulants and Hypercubes (재귀원형군과 하이퍼큐브의 고장 감내에 대한 결정적 척도)

  • Park, Jung-Heum;Kim, Hee-Chul
    • Journal of KIISE:Computer Systems and Theory / v.29 no.9 / pp.493-502 / 2002
  • Connectivity and edge-connectivity have been the prime deterministic measures of fault tolerance in multicomputer networks. These parameters have the shortcoming that they do not differentiate between the different types of disconnected graphs that result from removing the disconnecting vertices or disconnecting edges. To compensate for this shortcoming, one can use generalized measures of connectedness such as superconnectivity, toughness, scattering number, vertex-integrity, binding number, and restricted connectivity. In this paper, we analyze such deterministic measures of fault tolerance in recursive circulants and hypercubes, and compare the two networks in terms of fault tolerance.
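The shortcoming described above can be seen on a small case. The sketch below is an illustration written for this listing, not code from the paper: it builds the 3-dimensional hypercube, computes its vertex connectivity by brute force, and then compares the components left by two different vertex cuts. The helper names are hypothetical.

```python
from itertools import combinations


def hypercube(n):
    """Adjacency map of Q_n: vertices are n-bit integers, edges flip one bit."""
    return {v: [v ^ (1 << b) for b in range(n)] for v in range(2 ** n)}


def component_sizes(adj, removed):
    """Sorted sizes of the connected components left after deleting `removed`."""
    seen, sizes = set(removed), []
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes)


def connectivity(adj):
    """Brute-force vertex connectivity; only feasible for tiny graphs."""
    for k in range(len(adj) - 1):
        for cut in combinations(adj, k):
            if len(component_sizes(adj, cut)) > 1:
                return k
    return len(adj) - 1


q3 = hypercube(3)
print(connectivity(q3))                    # 3
print(component_sizes(q3, q3[0]))          # [1, 4]: only vertex 0 is cut off
print(component_sizes(q3, [0, 3, 5, 6]))   # [1, 1, 1, 1]: network shattered
```

Connectivity reports only the size of the smallest cut (3 here) and says nothing about how badly the network breaks apart: the minimum cut isolates a single node, while a cut one vertex larger leaves no two nodes connected. Capturing that difference is the role of the generalized measures listed in the abstract.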

Implementation of Shadow Server for Fault-tolerance in SAN-based Shared File System (SAN 기반 공유 파일 시스템에서 Fault-tolerance를 위한 Shadow Server 구현)

  • 최영한;김형천;홍순좌
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.661-663 / 2004
  • In this paper, we implement a shadow server, a fault-tolerant server, to guarantee the fault tolerance of SANfs, a SAN-based shared file system. SANfs(1) is a file system that lets multiple clients accessing network-attached storage over a SAN share one another's data. SANfs uses a meta server for file management, and this server manages the requests of the multiple clients that access it over the network. Because SANfs manages the file system centrally through the meta server, a fault in the meta server halts the entire system, which makes it a single point of failure. To keep providing service even when the meta server fails, we added a shadow server that takes over the meta server's function when the meta server malfunctions. During normal operation, the shadow server keeps the file system metadata synchronized with the meta server, and it uses this information to provide service once it has switched over to the meta server role. Each server determines whether its peer has failed by means of a heartbeat, and our experiments showed that failover to the meta server role is affected by the heartbeat period.
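The dependence of failover on the heartbeat period can be sketched with a small simulation. This is not the SANfs implementation: the `ShadowServer` class, the rule of promoting after three missed heartbeats, and the choice to carry a metadata snapshot on each heartbeat are all assumptions made for the example. Time is simulated in integer milliseconds.

```python
class ShadowServer:
    """Standby that promotes itself when the meta server's heartbeat goes silent."""

    def __init__(self, period_ms, missed_limit=3):
        # Assumed policy: declare failure after `missed_limit` silent periods.
        self.timeout_ms = period_ms * missed_limit
        self.last_heartbeat_ms = 0
        self.metadata = {}
        self.active = False

    def on_heartbeat(self, now_ms, metadata_snapshot):
        # Assumption: each heartbeat also carries metadata to stay in sync.
        self.last_heartbeat_ms = now_ms
        self.metadata = dict(metadata_snapshot)

    def tick(self, now_ms):
        # Take over the meta server role once the silence exceeds the timeout.
        if not self.active and now_ms - self.last_heartbeat_ms > self.timeout_ms:
            self.active = True
        return self.active


def failover_delay_ms(period_ms, crash_ms, missed_limit=3):
    """Simulated time between the meta server crash and the shadow's takeover."""
    shadow = ShadowServer(period_ms, missed_limit)
    now = 0
    while True:
        # The meta server emits a heartbeat every period until it crashes.
        if now < crash_ms and now % period_ms == 0:
            shadow.on_heartbeat(now, {"last_sync_ms": now})
        if shadow.tick(now):
            return now - crash_ms
        now += 1


print(failover_delay_ms(period_ms=100, crash_ms=250))  # 251
print(failover_delay_ms(period_ms=500, crash_ms=250))  # 1251
```

Under these assumptions a shorter heartbeat period shortens the failover delay, at the cost of more heartbeat traffic between the two servers.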

System Reliability (시스템 信賴性)

  • 김동주
    • The Magazine of the IEIE / v.5 no.1 / pp.31-37 / 1978
  • The value of a system depends heavily on its reliability. Reliability means not merely correctness but also the fault tolerance of the system. This paper emphasizes software fault tolerance at the design stage, especially for computer-controlled systems. General methods of fault-tolerant design, in particular the dual computer system, are introduced along with their advantages and disadvantages. Finally, as an example of fault-tolerant design, we present our GTK-500 EPABX.

A study on Hardware Redundancy Architecture of Fault-Tolerant System (결함허용 시스템의 하드웨어 여분구조에 대한 연구)

  • Shin Ducko;Lee Jong-woo;Lee Jae-ho;Lee Key-seo
    • Proceedings of the KSR Conference / 2003.05a / pp.450-455 / 2003
  • This paper discusses hardware redundancy architectures for fault-tolerant systems. Hardware redundancy architectures are classified as passive, active, and hybrid, and each is studied as a means of implementing fault tolerance. The fault-masking and fault-detection techniques used in each redundancy architecture are also studied.
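Passive hardware redundancy is commonly illustrated with triple modular redundancy (TMR), where a majority voter masks the output of a single faulty module. The sketch below is a generic textbook illustration, not taken from the paper; the function names are hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of modules, else raise."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: the fault cannot be masked")
    return value


def faulty_modules(outputs):
    """Indices of modules that disagree with the voted value (fault detection)."""
    voted = majority_vote(outputs)
    return [i for i, out in enumerate(outputs) if out != voted]


readings = [42, 42, 7]           # module 2 has failed
print(majority_vote(readings))   # 42
print(faulty_modules(readings))  # [2]
```

The voter alone masks the fault. Reporting which module disagreed adds the detection step that an active or hybrid scheme could use to switch in a spare.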

A Fault Tolerant Data Management Scheme for Healthcare Internet of Things in Fog Computing

  • Saeed, Waqar;Ahmad, Zulfiqar;Jehangiri, Ali Imran;Mohamed, Nader;Umar, Arif Iqbal;Ahmad, Jamil
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.1 / pp.35-57 / 2021
  • Fog computing aims to solve the bandwidth, network latency, and energy consumption problems of cloud computing. Managing the data generated by healthcare IoT devices is one of its significant applications. Healthcare IoT devices generate huge amounts of data, which must be managed efficiently, with low latency, without failure, and with minimum energy consumption and low cost. Task or node failures can increase latency, energy consumption, and cost. Thus, a failure-free, cost-efficient, and energy-aware management and scheduling scheme for data generated by healthcare IoT devices not only improves the performance of the system but also saves the precious lives of patients, owing to minimal latency and the provision of fault tolerance. To address these challenges in data management and fault tolerance, we present a Fault Tolerant Data Management (FTDM) scheme for healthcare IoT in fog computing. In FTDM, the data generated by healthcare IoT devices is efficiently organized and managed through well-defined components and steps. FTDM provides a two-way fault-tolerant mechanism, i.e., task-based fault tolerance and node-based fault tolerance, through which failures of tasks and nodes are managed. The paper considers energy consumption, execution cost, network usage, latency, and execution time as performance evaluation parameters. The simulations, performed using iFogSim, show significant improvements. Specifically, the proposed FTDM strategy reduces energy consumption by 3.97%, execution cost by 5.09%, network usage by 25.88%, latency by 44.15%, and execution time by 48.89% compared with the existing Greedy Knapsack Scheduling (GKS) strategy. Moreover, patients sometimes need to be treated remotely because facilities are unavailable or because of infectious diseases such as COVID-19. In such circumstances, the proposed strategy is significantly efficient.

A fault detection and recovery mechanism for the fault-tolerance of a Mini-MAP system (Mini-MAP 시스템의 결함 허용성을 위한 결함 감지 및 복구 기법)

  • Mun, Hong-Ju;Kwon, Wook-Hyun
    • Journal of Institute of Control, Robotics and Systems / v.4 no.2 / pp.264-272 / 1998
  • This paper proposes a fault detection and recovery mechanism for a fault-tolerant Mini-MAP system and provides detailed techniques for its implementation. The fault-tolerant Mini-MAP system considered here has a dual-layer structure from the LLC sublayer down to the physical layer to cope with faults in those layers. For reliable fault detection, a redundant and hierarchical fault supervision architecture is proposed, together with an implementation technique for stable detection. Fault location information is derived from the data reported with a fault detection and from an additional network diagnosis. Faults are recovered by the stand-by sparing method applied to a dual network composed of two equivalent networks. A network switch mechanism is proposed to achieve a reliable and stable network function. A fault-tolerant Mini-MAP system is implemented by applying the proposed fault detection and recovery mechanism.

A Study on the Fault Tolerance and High Efficiency Control of 4 Leg DC/DC Converter for Battery Energy Storage System in Standalone DC Micro-grid (독립형 DC마이크로그리드 내 BESS용 4 LEG DC/DC 컨버터의 고장허용 및 고효율 제어에 관한 연구)

  • Choi, Jung-Sik;Oh, Seung-Yeol;Cha, Dae-Seak;Chung, Dong-Hwa
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.9 / pp.1239-1248 / 2018
  • This paper proposes a fault-tolerant, high-efficiency operation algorithm for a 4-leg DC/DC converter for a battery energy storage system (BESS) that forms the main power source in a standalone DC micro-grid. Because the BESS serving as the main power supply in a standalone DC micro-grid operates at all times, it requires fault-tolerant control and fast response to the load. When a leg fault occurs in one of the four legs, the fault-tolerant control uses a fuse to turn the short-circuit fault into an open-circuit fault and maintains stable operation through phase-shift control. In addition, taking the loss of the power semiconductors into account, the number of operating legs is adjusted to achieve high efficiency over the full load range. The fault-tolerant control and high-efficiency operation algorithm of the DC/DC converter for the BESS in a standalone DC micro-grid is presented and validated through simulation and experiment.