• Title/Summary/Keyword: Failure Detection and Recovery


Detection and Recovery of Failure Node in SAN-based Cluster Shared File System SANique™ (SAN 기반 클러스터 공유 파일 시스템 SANique™의 오류 노드 탐지 및 회복 기법)

  • Lee, Kyu-Woong
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.12 / pp.2609-2617 / 2009
  • This paper gives a design overview of the shared file system SANique™ and proposes a method for detecting failed nodes together with a recovery management algorithm. It also describes the characteristics and system architecture of a SAN-based shared file system. To provide uninterrupted service, detection and recovery methods are proposed for all possible system failures and natural disasters. The various kinds of system failures and disasters are characterized, and a detection and recovery method is then proposed for each disconnected computing node group.

A Study of FDIR S/W Design and Verification for Gyro Sensor of COMS Satellite (통신해양기상위성 자이로센서 FDIR 설계 및 검증에 관한 연구)

  • Lee, Hoon-Hee
    • Aerospace Engineering and Technology / v.7 no.2 / pp.95-102 / 2008
  • The COMS satellite can automatically recover from any defined failure thanks to full redundancy. This study assesses the effects of gyro failure on the COMS mission and analyzes the gyro Failure Detection, Isolation and Recovery (FDIR) mechanism in terms of its failure detection means, its isolation and recovery actions, and their consequences. Finally, it checks the FDIR behavior by injecting a failure on the COMS simulator.


Failure Detection of Multi-Sensor Navigation System (다중 센서 항법 시스템에서의 센서 측정 실패 감지 시스템에 관한 연구)

  • 오재석;이판묵;오준호
    • Proceedings of the Korean Society of Precision Engineering Conference / 1997.04a / pp.51-55 / 1997
  • This study is devoted to developing a navigation filter that detects sensor failure in a multi-sensor navigation system. In such systems, a Kalman filter is generally used to fuse the data from the sensors. A sensor failure is fatal when the sensor serves as the external measurement of the Kalman filter, so detection of and recovery from sensor failure is one of the important features of a navigation filter. Each sensor generally has its own specific characteristics in measuring navigational information. Fuzzy theory is proposed to detect external sensor failure and to provide valid external measurements to the Kalman filter, avoiding filter divergence and instability. The idea is applied to an Autonomous Underwater Vehicle (AUV) that has two navigation sensors, i.e., a self-contained inertial sensor and an acoustic external sensor. Two-dimensional simulation results show acceptable failure detection and recovery.

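The gating idea in the abstract above (hand the Kalman filter only measurements judged valid) can be sketched as follows. This is a minimal scalar illustration, not the authors' filter: the trapezoidal membership function, its `ok` and `bad` breakpoints, and the function names `validity` and `kalman_update` are all assumptions made for this example.

```python
def validity(normalized_innovation, ok=2.0, bad=4.0):
    """Fuzzy membership in 'measurement is valid' (assumed shape).

    Returns 1.0 below `ok`, 0.0 above `bad`, and falls linearly in between.
    """
    x = abs(normalized_innovation)
    if x <= ok:
        return 1.0
    if x >= bad:
        return 0.0
    return (bad - x) / (bad - ok)


def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update gated by the validity weight.

    x, p: prior state estimate and its variance
    z, r: external measurement and its noise variance
    Returns (posterior state, posterior variance, validity weight).
    """
    innovation = z - x
    s = p + r  # innovation variance
    w = validity(innovation / s ** 0.5)
    if w == 0.0:
        # Measurement judged failed: skip the update and keep the prior.
        return x, p, w
    r_eff = r / w  # trust a doubtful measurement less by inflating its noise
    k = p / (p + r_eff)
    return x + k * innovation, (1.0 - k) * p, w


if __name__ == "__main__":
    print(kalman_update(0.0, 1.0, 1.0, 1.0))    # plausible measurement
    print(kalman_update(0.0, 1.0, 100.0, 1.0))  # implausible measurement
```

Rejecting or down-weighting the measurement keeps a failed external sensor from dragging the estimate away, which is the divergence the abstract refers to.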

Concepts in COMS Failure Management System (통신해양기상위성 고장관리 시스템 개념)

  • Lee, Hoonhee;Kim, Bangyeop;Baek, MyungJin;Yang, Koonho;Chun, Yongsik
    • Journal of Aerospace System Engineering / v.3 no.2 / pp.31-38 / 2009
  • COMS on-board FDIR (Failure Detection, Isolation and Recovery) functions are implemented in the on-board software to satisfy the autonomy and failure tolerance requirements. This paper presents the concepts of COMS Failure Management with hierarchical layers and addresses the characteristics of the FDIR layers from low level to high level. It aims to give the reader an understanding of how the COMS FDIR was designed and how it works. It first recalls the applicable system-level requirements, which are based on the COMS mission requirements. It then describes the philosophy and structure of the FDIR and breaks it down into the several FDIR layers. It could serve as a useful reference for designing and developing automatic FDIR mechanisms in the future.


Large Scale Failure Adaptive Routing Protocol for Wireless Sensor Networks (무선 센서 네트워크를 위한 대규모 장애 적응적 라우팅 프로토콜)

  • Lee, Joa-Hyoung;Seon, Ju-Ho;Jung, In-Bum
    • The KIPS Transactions:PartA / v.16A no.1 / pp.17-26 / 2009
  • Large-scale wireless sensor networks are expected to play an increasingly important role in data collection in hazardous areas. However, the physical fragility of sensor nodes makes reliable routing in such areas a challenging problem. Since several sensor nodes in a hazardous area could be damaged all at once, the network should be able to recover routing from node failures over a large area. Many routing protocols take failure recovery of a single node into account, but it is very hard for these protocols to recover routing from large-scale failures. In this paper, we propose a routing protocol, which we refer to as LSFA, to recover the network quickly from failures over a large area. LSFA detects a failure by counting the packet loss from the parent node and, when a failure is detected, LSFA decreases the routing interval to notify the neighbor nodes of the failure. Our experimental results clearly indicate that LSFA can recover from large-area failures quickly and with fewer packets than previous protocols.
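The detection rule described in the abstract above (count packet loss from the parent, then shorten the routing interval once a failure is detected) can be sketched as below. This is an illustrative sketch only, not the LSFA implementation: the class name `ParentMonitor`, the loss threshold, and both interval values are assumptions, since the abstract does not give them.

```python
from dataclasses import dataclass


@dataclass
class ParentMonitor:
    """Tracks consecutive packet losses from a node's parent (hypothetical)."""

    loss_threshold: int = 3        # assumed: losses in a row that count as failure
    normal_interval: float = 10.0  # assumed: routing interval in seconds
    fast_interval: float = 1.0     # assumed: shortened interval after detection
    missed: int = 0
    failed: bool = False

    def on_expected_packet(self, received: bool) -> float:
        """Record one expected packet and return the routing interval to use next."""
        if received:
            # The parent is alive again: reset the counter and the failure flag.
            self.missed = 0
            self.failed = False
        else:
            self.missed += 1
            if self.missed >= self.loss_threshold:
                self.failed = True
        # A shorter interval lets neighbors learn about the failure sooner.
        return self.fast_interval if self.failed else self.normal_interval


if __name__ == "__main__":
    monitor = ParentMonitor()
    for received in [True, False, False, False, True]:
        print(received, monitor.on_expected_packet(received))
```

Counting consecutive losses rather than reacting to a single one is a design choice of this sketch; it trades detection latency for fewer false alarms on a lossy link.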

KOMPSAT-2 Fault and Recovery Management

  • Baek, Myung-Jin;Lee, Na-Young;Keum, Jung-Hoon
    • International Journal of Aeronautical and Space Sciences / v.3 no.2 / pp.31-39 / 2002
  • In this paper, the KOMPSAT-2 on-board fault and ground recovery management design is addressed in terms of the hardware and software components that provide failure detection and spacecraft safing for anomalies that threaten spacecraft survival. It also includes the ground real-time up-commanding operation to recover the system safely. KOMPSAT-2 spacecraft fault and recovery management is designed such that the subsequent system configuration due to system initialization is initiated and controlled by processors. This paper shows that KOMPSAT-2 has a new design feature, CPU SEU mitigation for possible upsets in the processor CPUs, as part of the on-board fault management design. Recovery management of processor switching works in two different ways: gang switching and individual switching. This paper also shows that the difficulties of using a multiple-processor system can be managed by proper design implementation and flight operation.

Recovery Management of Split-Brain Group in Highly Available Cluster File System SANique™ (고가용성 클러스터 파일 시스템 SANique™의 분할그룹 탐지 및 회복 기법)

  • 이규웅
    • Journal of Korea Multimedia Society / v.7 no.4 / pp.505-517 / 2004
  • This paper gives an overview of the design details of the cluster file system SANique™ in the SAN environment. SANique™ can transfer user data from a shared SAN disk to a client application without the control of a centralized file server. We focus in particular on the characteristics and functions of CRM, the recovery manager of SANique™. The process component for failure detection and its overall procedure are described. We define the split-brain problem, which cannot be easily detected in cluster file systems, and propose a recovery management method based on the SAN disk to detect and resolve the split-brain situation.

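One common way to use a shared disk for split-brain detection is a disk heartbeat: a peer that is unreachable over the network but still writing fresh heartbeats to the shared disk is partitioned rather than dead. The sketch below illustrates that generic technique only. It is not taken from the paper, whose abstract does not describe the CRM procedure in detail; the function name `classify_peers`, the status labels, and the timeout are assumptions.

```python
def classify_peers(network_reachable, disk_heartbeats, now, timeout):
    """Classify each peer from network reachability and shared-disk heartbeats.

    network_reachable: set of node names that answer over the network
    disk_heartbeats:   dict of node name -> time of last heartbeat on shared disk
    now, timeout:      current time and maximum heartbeat age, same time unit
    """
    status = {}
    for node, last_beat in disk_heartbeats.items():
        alive_on_disk = (now - last_beat) <= timeout
        if node in network_reachable:
            status[node] = "healthy"
        elif alive_on_disk:
            # Silent on the network but still writing to disk: a partition.
            status[node] = "split-brain"
        else:
            # Silent on both channels: treat the node as failed.
            status[node] = "failed"
    return status


if __name__ == "__main__":
    beats = {"a": 100.0, "b": 99.0, "c": 50.0}
    print(classify_peers({"a"}, beats, now=100.0, timeout=5.0))
```

The two channels are what make the distinction possible: with the network alone, a partitioned peer and a crashed peer look identical.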

Determination of the profit-maximizing configuration for the modular cell manufacturing system using stochastic process (실시간 고장포용 생산시스템의 적정 성능 유지를 위한 최적 설계 기법에 관한 연구)

  • Park, Seung-Kyu
    • Journal of Institute of Control, Robotics and Systems / v.5 no.5 / pp.614-621 / 1999
  • In this paper, analytical approaches are presented for jointly determining the profit-maximizing configuration of the fault-tolerant real-time modular cell manufacturing system. First, the transient (time-dependent) analysis of Markovian models is applied to the modular cell manufacturing system from a performability viewpoint, whose modeling advantage lies in its ability to express the performance that truly matters (the user's perception of it) as well as various performance measures compositely in the context of the application. The modular cells are modeled with a hybrid decomposition method, and availability measures such as instantaneous availability, interval availability, and expected cumulative operational time are then evaluated as special cases of performability. In addition to this evaluation, a sensitivity analysis of the entire manufacturing system as well as of each machining cell is performed, from which the time of a major repair policy and the optimal configuration among the alternative configurations of the system can be determined. Second, the recovery policies are optimized to meet the timing requirements: recovery from machine failures by computing the minimal number of redundant machines, and recovery from task failures by computing the minimum number of tasks that are equipped with task-failure detection schemes and reworked upon failure detection. Some numerical examples are presented to demonstrate the effectiveness of the work.

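The three availability measures named in the abstract above have closed forms in the simplest Markovian case, a single machine alternating between up and down with constant failure rate `lam` and repair rate `mu`, starting in the up state. The sketch below evaluates them under that assumption. The two-state model is a simplification chosen for illustration; the paper's modular cell model and hybrid decomposition are not reproduced here, and the function names are hypothetical.

```python
import math


def instantaneous_availability(lam, mu, t):
    """P(machine is up at time t) for a two-state Markov model, up at t = 0."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def interval_availability(lam, mu, T):
    """Expected fraction of [0, T] spent up: (1/T) * integral of A(t) dt."""
    s = lam + mu
    return mu / s + lam / (s * s * T) * (1.0 - math.exp(-s * T))


def expected_uptime(lam, mu, T):
    """Expected cumulative operational time over [0, T]."""
    return T * interval_availability(lam, mu, T)


if __name__ == "__main__":
    lam, mu = 0.1, 0.9  # assumed rates per hour, for illustration only
    for t in (0.0, 1.0, 10.0):
        print(t, instantaneous_availability(lam, mu, t))
    print(interval_availability(lam, mu, 10.0), expected_uptime(lam, mu, 10.0))
```

As `t` grows, the instantaneous availability decays toward the steady-state value `mu / (lam + mu)`, and the interval availability always lies between that value and 1.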

Analytical fault tolerant navigation system for an aerospace launch vehicle using sliding mode observer

  • Hasani, Mahdi;Roshanian, Jafar;Khoshnooda, A. Majid
    • Advances in aircraft and spacecraft science / v.4 no.1 / pp.53-64 / 2017
  • Aerospace Launch Vehicles (ALV) are generally designed with high reliability to operate in complete security through fault avoidance practices. However, in spite of such precautions, the occurrence of faults is inevitable. Hence, there is a requirement for on-board fault recovery without significant degradation in ALV performance. The present study develops an advanced fault recovery strategy to improve the reliability of an ALV navigation system. The proposed strategy contains fault detection features and can reconfigure the system against common faults in the ALV navigation system. For this purpose, a fault recovery system is constructed to detect and reconfigure normal navigation faults based on sliding mode observer (SMO) theory. In the face of a pitch channel sensor failure, the original gyro faults are reconstructed using SMO theory and, by correcting the faulty measurement, the pitch-rate gyroscope output is constructed to provide a fault-tolerant navigation solution. The novel aspect of the paper is employing the SMO for online tuning of the analytical fault recovery solution against unforeseen variations due to its hardware/software property. In this regard, a nonlinear model of the ALV is simulated using specific navigation failures, and the results verify the feasibility of the proposed system. Simulation results and sensitivity analysis show that the proposed techniques can produce more effective estimation results against sensor failures than the previous techniques.

A Study on Software Based Fault-Tolerance Techniques for Flight Control Computer (비행조종컴퓨터 소프트웨어 기반 고장허용 설계 기법 연구)

  • Yoon, Hyung-Sik;Kim, Yeon-Gyun
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.44 no.3 / pp.256-265 / 2016
  • Software-based fault tolerance techniques are designed to allow a system to tolerate software faults. Fault tolerance techniques are divided into two groups: software-based and hardware-based. A proper design method is needed according to the characteristics of the system. In this paper, the concepts of software-based fault tolerance techniques for a Dual Flight Control Computer are described. For the software-based fault tolerance design, we classified software failures and designed methods for failure detection and recovery. Finally, the effectiveness of the software-based fault tolerance techniques was verified through the Software Test Environment (STE).