• Title/Summary/Keyword: Fault-tolerance

Determination of Optimal Checkpoint Intervals for Real-Time Tasks Using Distributed Fault Detection (분산 고장 탐지 방식을 이용한 실시간 태스크에서의 최적 체크포인터 구간 선정)

  • Kwak, Seong Woo;Yang, Jung-Min
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.3 / pp.202-207 / 2016
  • Checkpoint placement is an effective fault tolerance technique against transient faults: when a fault is detected, the task is re-executed from the latest checkpoint. In this paper, we propose a new checkpoint placement strategy that separates the data saving and fault detection processes, which are performed together in conventional checkpoints. Several fault detection processes are performed within one checkpoint interval to reduce the latency between the occurrence and detection of faults. We address how to place the fault detection processes so as to maximize the probability that a task completes successfully within its given deadline. We develop a Markov chain model for a real-time task with the proposed checkpoints and derive the optimal fault detection and checkpoint intervals.
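
The effect of detecting faults more often than state is saved can be illustrated with a small deterministic simulation. This is only a sketch under simplifying assumptions, not the Markov chain model of the paper: time is counted in integer units, saving and detection cost nothing, and the function name `simulate` and its parameters are hypothetical.

```python
def simulate(total_work, ckpt_interval, detect_interval, fault_times):
    """Return the elapsed time units needed to finish `total_work` units.

    Assumptions (illustrative only): state is saved every `ckpt_interval`
    units of progress, faults are checked every `detect_interval` units of
    progress and at task end, and both operations take zero time.
    `fault_times` lists the elapsed-time instants at which a transient
    fault corrupts the running task.
    """
    if ckpt_interval % detect_interval != 0:
        raise ValueError("detect_interval must divide ckpt_interval")
    faults = set(fault_times)
    elapsed = 0        # wall-clock time units spent so far
    progress = 0       # useful work completed so far
    saved = 0          # progress stored in the latest checkpoint
    corrupted = False  # True once a fault has hit and is not yet detected
    while progress < total_work:
        elapsed += 1
        progress += 1
        if elapsed in faults:
            corrupted = True
        at_end = progress == total_work
        if progress % detect_interval == 0 or at_end:
            if corrupted:
                # Fault detected: roll back to the latest checkpoint.
                progress = saved
                corrupted = False
                continue
        if progress % ckpt_interval == 0:
            # State passed detection, so it is safe to save it.
            saved = progress
    return elapsed


# Conventional scheme: detection only at checkpoints (every 6 units).
print(simulate(12, 6, 6, [2]))
# Separated scheme: three detections per checkpoint interval.
print(simulate(12, 6, 2, [2]))
```

With the same fault, the run with more frequent detection rolls back sooner and wastes less work, which is the latency reduction the abstract describes.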

A Study on Efficient Fault-Diagnosis for Multistage Interconnection Networks (다단 상호 연결 네트워크를 위한 효율적인 고장 진단에 관한 연구)

  • Bae, Sung-Hwan;Kim, Dae-Ik;Lee, Sang-Tae;Chon, Byoung-Sil
    • The Journal of the Acoustical Society of Korea / v.15 no.5 / pp.73-81 / 1996
  • In multiprocessor systems with multiple processors and memories, efficient communication between processors and memories is critical for high performance. Various types of multistage networks have been proposed. Economic feasibility and improvements in both computing throughput and fault tolerance/diagnosis have been among the most important factors in the development of these computer systems. In this paper, we present an efficient algorithm for the diagnosis of generalized cube interconnection networks with a fan-in/fan-out of 2. Using the assumed fault model, we also present a complete fault diagnosis that generates suitable fault-detection and fault-location test sets for link stuck faults and for switching element faults in the direct/cross states, including broadcast diagnosis methods based on some basic properties of generalized cube interconnection networks. Finally, we illustrate the approach with some examples.

A Study on Dual System for Fault Tolerance of PLC (PLC 오류를 포용하는 이중화 시스템에 관한 연구)

  • Ko, Jae-Hong
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.6 no.3 / pp.397-404 / 2011
  • This study proposes a method for implementing a system that can tolerate PLC faults and examines its practical validity. A fault-tolerant control system minimizes production losses because it allows repair and inspection without interrupting operation, and it improves the reliability of the whole system. To implement such a fault-tolerant system, we propose duplexing the system. We configure a control system that tolerates defects or breakdowns by duplexing its various modules, verify the fault-tolerant control system through simulation and an application experiment on an actual kiln, and compare the estimated mean time between failures across fault characteristics and system configurations in order to improve the reliability of the PLC control system. For the proposed system, we describe the system representation and the relationships among the system mode, operation mode, error detection mode, and switching of the duplex mode. We also study an algorithm for master-standby switchover and continuous operation in a two-channel scheme that uses two PLC central processing units instead of one, derive the continuous operation method and its fault-tolerance results, and apply them to an actual kiln control system to confirm continuous operation in the presence of PLC faults.

A Fault-Tolerant Multicasting Algorithm using Region Encoding Scheme in Multistage Interconnection Networks (다단계 상호연결망에서 영역 부호화 방식을 사용하는 고장 허용 멀티캐스팅 알고리즘)

  • Kim, Jin-Soo;Chang, Jung-Hwan
    • Journal of KIISE:Computer Systems and Theory / v.29 no.3 / pp.117-124 / 2002
  • This paper proposes a fault-tolerant multicasting algorithm that employs the region encoding scheme in multistage interconnection networks (MINs) containing multiple faulty switching elements. After classifying all switching elements in the MIN into two subsets of equal size, the proposed algorithm can tolerate any fault pattern in which every fault is contained in the same subset. To send a multicast message to its destinations while detouring around faults, the proposed algorithm uses a recursive scheme that recirculates the message through the MIN. We prove that this algorithm can route any multicast message in only two passes through the faulty MIN.

A Real-Time Embedded Task Scheduler considering Fault-Tolerant (결함허용을 고려한 실시간 임베디드 태스크 스케줄러)

  • Jeon, Tae-Gun;Kim, Chang-Soo
    • Journal of Korea Multimedia Society / v.14 no.7 / pp.940-948 / 2011
  • In this paper, we design and implement a task scheduler that takes real-time constraints and fault tolerance into account in an embedded system with a single processor. We propose a method that meets the deadlines of periodic tasks using RMS (rate-monotonic scheduling) and completes the execution of aperiodic tasks by calculating surplus times from the periodic task set. We also describe a method for recovering a task from a transient fault by managing backup time. We propose an importance level for periodic tasks that can control the response time of periodic and aperiodic tasks. Finally, we analyze and evaluate the proposed methods by simulation.
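
As background for the RMS-based approach, the sketch below shows the classical Liu and Layland utilization bound for rate-monotonic scheduling and one simple way to count the idle (surplus) time left for aperiodic work over a hyperperiod. It is not the scheduler from the paper: the task representation as `(execution_time, period)` pairs and all function names are assumptions made for illustration, and the bound is a sufficient test only.

```python
from math import lcm  # available in Python 3.9+


def rms_utilization_bound(n):
    """Liu and Layland bound n * (2^(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def rms_schedulable(tasks):
    """Sufficient (not necessary) RMS test on (execution_time, period) pairs.

    Assumes independent periodic tasks whose deadlines equal their periods.
    """
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rms_utilization_bound(len(tasks))


def surplus_time(tasks):
    """Idle time units per hyperperiod, assuming integer times and that
    the periodic task set itself is schedulable."""
    hyperperiod = lcm(*(t for _, t in tasks))
    busy = sum(c * (hyperperiod // t) for c, t in tasks)
    return hyperperiod - busy


# Hypothetical task set: (execution_time, period)
tasks = [(1, 4), (2, 6), (1, 12)]
print(rms_schedulable(tasks))  # utilization 2/3 is below the bound for n = 3
print(surplus_time(tasks))     # idle units in one hyperperiod of 12
```

A task set that fails this bound may still be schedulable under RMS; an exact answer requires response-time analysis, which is omitted here.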

A study on the Correlation Hazard Analysis for Signaling System Safety (안전성 확보를 위한 위험원 분석 기법간 상관관계에 대한 연구)

  • Han, Chan-Hee;Lee, Young-Soo;Ahn, Jin;Jo, Woo-Sic
    • Proceedings of the KSR Conference / 2007.11a / pp.638-645 / 2007
  • Computers are increasingly being introduced into safety- and reliability-critical systems. The safe and reliable operation of these systems cannot be taken for granted. Malfunctions of these systems can have potentially catastrophic consequences, and they have already been involved in serious accidents. Software fault prevention, fault tolerance, fault removal, and fault forecasting are the techniques to be used, implemented, and verified for embedded software in critical systems as contributors to the safety and reliability of the software. To use them when developing a software product, a relationship must be established between them and the development processes, the methods and techniques used to develop software, and the different product architectures. Railroad signaling system software is safety-critical embedded software with real-time and high-reliability requirements. The primary purpose of safety management is to prevent loss of life or physical damage arising from potential hazards in the railroad signaling system. This study provides a systematic approach to analyzing potential hazards for their management during the system life cycle, to assure the identification and definition of the most appropriate hazards.

Fault-Tolerant Corrective Control for Non-fundamental Mode Faults in Asynchronous Sequential Machines (비동기 순차 머신의 비-기본모드에서 발생하는 고장 극복을 위한 교정 제어)

  • Yang, Jung-Min;Kwak, Seong Woo
    • Journal of IKEEE / v.24 no.3 / pp.727-734 / 2020
  • Fault-tolerant corrective control for asynchronous sequential machines (ASMs) with transient faults is discussed in this paper. The considered ASM is vulnerable to a kind of fault that may manifest during transient transitions of the ASM, leading to transient faults occurring in non-fundamental mode. To overcome the adverse effects caused by these faults, we present a novel corrective control scheme that can detect and tolerate transient faults in non-fundamental mode. The existence condition and design algorithm for an appropriate fault-tolerant controller are addressed in the framework of corrective control theory. The applicability of the proposed control methodology is demonstrated in an FPGA experiment.

Design And Performance Evaluation of Fault-Tolerant Continuous Media Storage System Based on $PRR_{gp}$ ($PRR_{gp}$ 기반 결함허용 연속 매체 저장시스템의 설계와 성능평가)

  • O, Yu-Yeong;Kim, Seong-Su
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1290-1298 / 2000
  • Multimedia systems such as VOD (Video On Demand) and MOD (Multimedia On Demand) need to support continuous media operations that are invoked at random by concurrent users and require stored media to be accessed in real time. To satisfy these requirements, disk arrays consisting of multiple disks are generally used as storage systems. In real-time environments that give users parallel and concurrent access to continuous media, storage systems should be able to handle user requests independently. In this paper, we present a new fault-tolerant continuous media storage system called PADA ($PRR_{gp}$-bAsed Disk Array), which is based on the $PRR_{gp}$ (Prime Round Robin with Grouped Parities) disk placement scheme with enhanced reliability and load balancing. We compare and evaluate the storage space overhead for fault tolerance, the reliability of the disk array systems, the degree of disk load balancing, the required buffer space, the maximum number of users that can be supported, and the fault recovery overhead for PADA, RAID 5, and Declustered storage systems. According to the results, PADA is the best among them in that it achieves load balancing more effectively and serves more users in the case of arbitrary-rate retrievals.

Robust Adaptive Fault-Tolerant Control for Robot Manipulators with Performance Degradation Due to Actuator Failures and Uncertainties (구동기 고장과 불확실성으로 인한 성능 저하를 가지는 로봇 매니퓰레이터에 대한 강인한 적응 내고장 제어)

  • Shin, Jin-Ho;Baek, Woon-Bo
    • The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.3 / pp.173-181 / 2004
  • In normal robot control systems without any actuator failures, the actuator torque coefficients applied at each joint are assumed to equal 1 at all times. However, it is more realistic to treat the actuator torque coefficients applied at each joint as nonlinear and time-varying. In other words, it must be taken into account that the actuators at the joints may fail due to hardware or software faults. In this work, the actuator torque coefficients are assumed to have non-zero values at all joints. An actuator torque coefficient of zero at a joint means the complete loss of torque at that joint; this paper does not deal with that case. Both actuator failures and uncertainties are considered simultaneously as factors in the performance degradation of robots. This paper proposes a robust adaptive fault-tolerant control scheme that maintains the required performance and achieves task completion for robot manipulators with performance degradation due to actuator failures and uncertainties. Simulation results are shown to verify the fault tolerance and robustness of the proposed control scheme.

An Implementation of Fault-Tolerant Message Passing Interface on Parallel Computers (병렬 컴퓨터에서의 결함 허용 메시지 전달 인터페이스 구현)

  • Song, Dae-Ki;Lee, Cheol-Hoon
    • Journal of KIISE:Computing Practices and Letters / v.6 no.3 / pp.319-328 / 2000
  • The Message-Passing Interface (MPI) is a standard interface for parallel programming environments, on which application programs run on the processors of a parallel computer. Processor nodes execute the processes that make up the program by passing messages to one another. During execution, however, if a fault occurs on a processor node or in a process, it results in an inconsistent state, and consequently the whole program has to be stopped. To solve this problem, we propose a fault-tolerant message passing interface (FT-MPI) that adds a fault manager module to MPI. The proposed FT-MPI does not need any hardware support, and each application program based on MPI can run on the FT-MPI without any modification. The proposed fault tolerance scheme uses the so-called hot-spare process duplication method, and simulations verify that application programs keep running despite faults with less than 5% overhead in execution time.
