• Title/Summary/Keyword: Software fault tolerance

A Dependability Estimation of Microprocessor-based Software under Memory Faults using Stochastic Activity Network (SAN)

  • Park, Jong-Gyun;Seong, Poong-Hyun
    • Proceedings of the Korean Nuclear Society Conference / 1996.05b / pp.725-730 / 1996
  • In this work, the behavior of software under memory faults in the operation phase is modeled and simulated using the stochastic activity network (SAN), a generalization of stochastic Petri nets. These networks can represent the concurrency, timeliness, fault tolerance, and degradable performance of a system, and they provide a means of determining the stochastic behavior of a complex system. We estimate the reliability of application software in a digital system for nuclear power plants and show how sensitive the software reliability is to the major physical parameters that affect software failure in the normal operation phase. We found that the effects of hardware faults on software failure must be considered to predict software dependability accurately in the operation phase.
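
The abstract models memory-fault-induced software failure with a SAN. The sketch below does not build a SAN; it swaps in a plain Monte Carlo simulation of the same question under simplifying assumptions of our own: memory faults arrive as a Poisson process, a fault matters only if it lands in memory the software uses (`used_fraction`) and is then read (`p_activate`), and any such activated fault fails the software. All parameter names and values are hypothetical. Because thinning a Poisson process yields another Poisson process, the simulation can be checked against a closed form.

```python
import math
import random


def mission_survives(rng, fault_rate, mission_time, used_fraction, p_activate):
    """Simulate one mission; True if no memory fault causes a software failure."""
    t = rng.expovariate(fault_rate)  # time of the first memory fault (Poisson arrivals)
    while t < mission_time:
        # A fault matters only if it hits memory the software uses AND is later read.
        if rng.random() < used_fraction and rng.random() < p_activate:
            return False
        t += rng.expovariate(fault_rate)  # time of the next memory fault
    return True


def estimate_reliability(trials, seed=0, **params):
    """Fraction of simulated missions that complete without a software failure."""
    rng = random.Random(seed)
    survived = sum(mission_survives(rng, **params) for _ in range(trials))
    return survived / trials


def analytic_reliability(fault_rate, mission_time, used_fraction, p_activate):
    """Closed form: the thinned fault process has rate fault_rate * used_fraction * p_activate."""
    return math.exp(-fault_rate * used_fraction * p_activate * mission_time)


if __name__ == "__main__":
    params = dict(fault_rate=1e-3, mission_time=1000.0, used_fraction=0.5, p_activate=0.4)
    print("simulated:", estimate_reliability(20_000, **params))
    print("analytic: ", analytic_reliability(**params))
```

With these example parameters the closed form gives exp(-0.2), about 0.819, and the simulated estimate should land close to it. Varying one parameter at a time gives a crude version of the sensitivity analysis the abstract describes.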

An Implementation of Fault Tolerant Software Distributed Shared Memory with Remote Logging (원격 로깅 기법을 이용하는 고장 허용 소프트웨어 분산공유메모리 시스템의 구현)

  • 박소연;김영재;맹승렬
    • Journal of KIISE:Computer Systems and Theory / v.31 no.5_6 / pp.328-334 / 2004
  • Software DSMs continue to improve in performance and scalability. As software DSMs become attractive on larger clusters, attention is likely to shift toward improving system reliability. A popular approach to tolerating failures is message logging with checkpointing, and many log-based rollback-recovery schemes have been proposed. In this work, we propose a remote logging scheme that stores each node's logs in the volatile memory of a remote node assigned to it. Because remote logging does not incur frequent disk accesses during failure-free execution, its logging overhead is small, especially over a high-speed communication network. Remote logging tolerates multiple failures as long as the backup nodes of the failed nodes remain alive, which makes DSMs considerably more reliable. We designed and implemented FT-KDSM (Fault Tolerant KAIST DSM) with remote logging and measured its logging overhead and recovery time.
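
A minimal sketch of the remote-logging idea, assuming a simple ring assignment in which node `i` logs to backup node `(i + 1) % n`. The abstract does not say how FT-KDSM assigns backups, so the ring and all class and method names here are hypothetical. The sketch shows the two properties the abstract claims: logging writes to a remote node's volatile memory rather than to disk, and a failed node is recoverable only while its backup is alive.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    alive: bool = True
    # Volatile memory holding log entries on behalf of other nodes.
    remote_logs: dict = field(default_factory=dict)


class RemoteLoggingCluster:
    """Hypothetical sketch: node i keeps its log in the memory of node (i + 1) % n."""

    def __init__(self, n):
        self.nodes = [Node(i) for i in range(n)]

    def backup_of(self, node_id):
        return (node_id + 1) % len(self.nodes)

    def log(self, node_id, entry):
        # Append to the backup's in-memory log instead of writing to a local disk.
        backup = self.nodes[self.backup_of(node_id)]
        backup.remote_logs.setdefault(node_id, []).append(entry)

    def fail(self, *node_ids):
        for i in node_ids:
            self.nodes[i].alive = False
            self.nodes[i].remote_logs.clear()  # volatile memory is lost on a crash

    def recoverable(self, node_id):
        # A failed node can replay its log only if its backup is still alive.
        return self.nodes[self.backup_of(node_id)].alive

    def replay_log(self, node_id):
        if not self.recoverable(node_id):
            raise RuntimeError(f"log of node {node_id} was lost with its backup")
        return list(self.nodes[self.backup_of(node_id)].remote_logs.get(node_id, []))


if __name__ == "__main__":
    cluster = RemoteLoggingCluster(4)
    cluster.log(0, "msg-from-0")
    cluster.fail(0, 2)
    print(cluster.recoverable(0), cluster.replay_log(0))
```

In a four-node ring, losing nodes 0 and 2 together is survivable because their backups (1 and 3) are alive, whereas losing nodes 0 and 1 together loses node 0's log. A real system would also checkpoint state and trim logs; both are omitted here.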

An Adaptive Fault Tolerant and QoS-Enabled Middleware Support in Distributed Systems (분산 시스템의 적응형 내결합성 및 QoS 미들웨어 지원)

  • Cagalaban, Giovanni A.;Kim, Seok-Soo
    • Proceedings of the KAIS Fall Conference / 2009.12a / pp.461-465 / 2009
  • A distributed computing environment offers flexibility in controlling complex embedded systems, but the software components of these systems are becoming more complex as they span several platforms and connect to various electronic devices, sensors, and actuators. Such systems require inter-object communication mechanisms that provide fault-tolerant, QoS-enabled middleware support in a distributed system. Typically, the middleware analyzes system parameters to ensure the availability and reliability of data dissemination. This paper focuses on designing an application middleware for a specific scenario that improves the availability and fault tolerance of data, thereby improving the QoS (Quality of Service) of the distributed system. The performance of an adaptive, highly reliable middleware can depend significantly on the selection of the system's vital parameters.

Utility Design for Graceful Degradation in Embedded Systems (우아한 성능감퇴를 위한 임베디드 시스템의 유용도 설계)

  • Kang, Min-Koo;Park, Kie-Jin
    • Journal of KIISE:Computer Systems and Theory / v.34 no.2 / pp.65-72 / 2007
  • Because embedded systems have strict cost and space constraints, conventional fault-tolerant techniques cannot be applied directly to increase their dependability. In this paper, we propose a software fault-tolerance mechanism that requires only minimal redundancy of system components. We define a utility metric that reflects the dependability of each embedded system component, and then measure the utility of each reconfiguration combination to provide fault tolerance. This utility evaluation has exponential complexity; however, we reduce the complexity through hierarchical subgrouping at the software level of each component. When some components of the embedded system fail, a reconfiguration operation moves the system from the current faulty state to a precomputed state with the maximum-utility combination.
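
One way to read the reconfiguration scheme, as a sketch under our own assumptions: each software function can be served by alternative components that carry a utility score, utility adds across functions, and a subgroup is simply one function's set of alternatives. Scoring subgroups independently replaces the exponential cross-product search, and the best configuration for each failure set is computed offline so that run-time reconfiguration is a table lookup. The component names, scores, and additive utility are hypothetical, not the paper's metric.

```python
from itertools import combinations

# Hypothetical system: each function has alternative components with assumed utility scores.
ALTERNATIVES = {
    "sensing": {"sensor_a": 0.9, "sensor_b": 0.6},
    "control": {"ctrl_main": 1.0, "ctrl_lite": 0.7},
    "logging": {"log_flash": 0.5, "log_ram": 0.3},
}


def best_configuration(failed):
    """Return (config, utility) for a set of failed components.

    Each function is treated as an independent subgroup, so we pick the best
    surviving alternative per group instead of scoring every combination.
    Utility is assumed to be additive across groups.
    """
    config, total = {}, 0.0
    for function, options in ALTERNATIVES.items():
        alive = {c: u for c, u in options.items() if c not in failed}
        if not alive:
            return None, 0.0  # this function can no longer be provided
        choice = max(alive, key=alive.get)
        config[function] = choice
        total += alive[choice]
    return config, total


def precompute_table(max_failures):
    """Offline step: best configuration for every failure set up to max_failures."""
    components = sorted(c for options in ALTERNATIVES.values() for c in options)
    table = {}
    for k in range(max_failures + 1):
        for failed in combinations(components, k):
            table[frozenset(failed)] = best_configuration(set(failed))
    return table


RECONFIGURATION_TABLE = precompute_table(max_failures=2)


def reconfigure(failed_components):
    """Online step: look up the precomputed maximum-utility state."""
    return RECONFIGURATION_TABLE[frozenset(failed_components)]


if __name__ == "__main__":
    print(reconfigure({"ctrl_main"}))
```

If `ctrl_main` fails, the lookup switches control to `ctrl_lite` and keeps the other choices, for a total utility of 0.9 + 0.7 + 0.5.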

A Fault-Recovery Agent for Distance Education on Home Network Environment (홈 네트워크 환경에서 원격 교육을 위한 결함 복구 에이전트)

  • Ko, Eung-Nam
    • Journal of Advanced Navigation Technology / v.11 no.4 / pp.479-484 / 2007
  • This paper explains the design and implementation of the FRA (Fault Recovery Agent), a system for recovering from software errors in multimedia distance education over a home network environment. In distributed multimedia systems, the most important categories of quality of service are timeliness, volume, and reliability. In this paper, we discuss a method for increasing reliability through fault tolerance, and we analyze the performance of an error recovery system running in a distributed multimedia environment using rule-based DEVS modeling and simulation techniques. In DEVS, a system has a time base, inputs, states, outputs, and functions. The proposed method is more efficient than the compared method in terms of error ratio and processing time.
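
To make the DEVS terms in the abstract concrete, here is a minimal atomic model with a hand-written simulator. It is a hypothetical fault-recovery agent, not the FRA's actual rule base: the agent idles until an `"error"` input arrives, spends an assumed `recovery_time` recovering, and then emits `"recovered"`. The methods map onto the DEVS functions (time advance, external and internal transitions, output), and `sigma` holds the time remaining in the current state.

```python
import math


class FaultRecoveryAgent:
    """Hypothetical DEVS atomic model with phases "idle" and "recovering"."""

    def __init__(self, recovery_time):
        self.recovery_time = recovery_time
        self.phase = "idle"
        self.sigma = math.inf  # time remaining in the current phase

    def time_advance(self):
        # ta(s): how long the model stays in its state if no input arrives.
        return self.sigma

    def external_transition(self, elapsed, event):
        # delta_ext(s, e, x): an error report while idle starts a recovery.
        if self.phase == "idle" and event == "error":
            self.phase, self.sigma = "recovering", self.recovery_time
        else:
            # Other inputs are absorbed; keep the remaining time of the current phase.
            self.sigma -= elapsed

    def output(self):
        # lambda(s): emitted just before the internal transition.
        return "recovered"

    def internal_transition(self):
        # delta_int(s): recovery is finished, go back to waiting.
        self.phase, self.sigma = "idle", math.inf


def simulate(model, inputs, end_time):
    """Run the model on time-sorted (time, event) inputs up to a finite end_time."""
    outputs, t_last, pending = [], 0.0, list(inputs)
    while True:
        t_internal = t_last + model.time_advance()
        t_external = pending[0][0] if pending else math.inf
        if min(t_internal, t_external) > end_time:
            return outputs
        if t_internal <= t_external:  # ties: internal event first (a sketch choice)
            outputs.append((t_internal, model.output()))
            model.internal_transition()
            t_last = t_internal
        else:
            t, event = pending.pop(0)
            model.external_transition(t - t_last, event)
            t_last = t
```

With `recovery_time=3.0` and errors at t = 1, 2, and 10, the agent emits outputs at t = 4 and t = 13; the error at t = 2 arrives mid-recovery and, in this sketch, is absorbed rather than queued.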

Reliability Model for Distributed Remote Sensing Application

  • Achalakul, Tiranee;Wattanapongsakorn, Naruemon
    • Proceedings of the IEEK Conference / 2002.07a / pp.293-296 / 2002
  • This paper discusses a software reliability model for the distributed s-PCT algorithm for remote sensing applications. The distributed algorithm is designed around a Manager-Worker threading concept and additionally uses redundancy to achieve fault tolerance. The paper reports our progress in developing the reliability concept and applying it to build a model for the distributed s-PCT. In particular, we are interested in the algorithm's performance versus reliability.
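
The abstract does not give the model's equations. A common starting point for a Manager-Worker design with redundant workers, and an assumption here, is a series-parallel structure: the manager must survive, and at least `k` of `n` independent workers must survive. The function names and numbers below are illustrative only.

```python
from math import comb


def k_out_of_n(k, n, r):
    """P(at least k of n independent workers, each with reliability r, survive)."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def manager_worker_reliability(r_manager, r_worker, n_workers, k_required):
    """Assumed structure: the manager in series with a k-out-of-n pool of workers."""
    return r_manager * k_out_of_n(k_required, n_workers, r_worker)


if __name__ == "__main__":
    for n in range(1, 5):
        # More redundant workers (with a fixed requirement of 1) raise reliability.
        print(n, manager_worker_reliability(0.99, 0.9, n_workers=n, k_required=1))
```

For example, a perfectly reliable manager with two workers of reliability 0.9, needing only one, gives 1 - 0.1^2 = 0.99; requiring two of three gives 0.972. Requiring more live workers for speed lowers reliability, which is one form of the performance-versus-reliability trade-off the abstract mentions.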

Reliability of Software with Redundancy Modules (중복모줄의 소프트웨어의 신뢰도)

  • Che, Gyu-Shik
    • Proceedings of the Korea Information Processing Society Conference / 2007.05a / pp.222-224 / 2007
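  • Two approaches to optimizing software reliability while improving fault tolerance are the modified recovery block scheme and N-version programming. In this study, we propose the modified recovery scheme as an algorithm for maximizing software reliability with redundant modules. We apply the results of an optimization process to establish a technique for determining the optimal structure of the software under development. Within the given available resources, we calculate the reliabilities and probabilities of the normal execution modules and the test modules, and examine the software configuration that maximizes the reliability of the software system. A single case is presented as a simple example.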
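
As a rough illustration of recovery-block reliability and configuration search, here is a sketch using a simplified model that we assume rather than take from the paper: modules run in sequence, each produces a correct result with probability `p`, and a single acceptance test classifies any result correctly with probability `test_accuracy`, all independently. The system succeeds when the first accepted result is correct. The optimizer exhaustively searches module subsets under a cost budget; the module names, costs, and reliabilities are hypothetical.

```python
from itertools import combinations


def recovery_block_reliability(module_reliabilities, test_accuracy):
    """Assumed model: modules run in order until the acceptance test accepts a result."""
    reliability, reach = 0.0, 1.0  # reach = P(execution gets to this module)
    for p in module_reliabilities:
        reliability += reach * p * test_accuracy  # correct result, correctly accepted
        # The result is rejected (correct but misjudged, or wrong and caught): try the next module.
        reach *= p * (1 - test_accuracy) + (1 - p) * test_accuracy
    return reliability


def best_recovery_block(candidates, test_accuracy, budget):
    """Choose the subset of (name, reliability, cost) modules, kept in the given
    order, that maximizes reliability without exceeding the budget."""
    best = ((), 0.0)
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if sum(cost for _, _, cost in subset) > budget:
                continue
            r = recovery_block_reliability([p for _, p, _ in subset], test_accuracy)
            if r > best[1]:
                best = (tuple(name for name, _, _ in subset), r)
    return best


if __name__ == "__main__":
    modules = [("A", 0.9, 3), ("B", 0.8, 2), ("C", 0.7, 1)]
    print(best_recovery_block(modules, test_accuracy=1.0, budget=4))
```

With a perfect acceptance test, modules A (0.9, cost 3), B (0.8, cost 2), and C (0.7, cost 1), and a budget of 4, the best choice is A followed by C, with reliability 0.9 + 0.1 * 0.7 = 0.97.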

A Specification Language for Fault-Tolerance Real-time Software Design (결함허용 실시간 소프트웨어 설계를 위한 명세언어)

  • 김정술;강병욱
    • Proceedings of the Korea Society for Industrial Systems Conference / 1997.11a / pp.383-394 / 1997
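  • In this paper, we propose a specification language for designing fault-tolerant real-time software. In particular, it is a specification language for backward error recovery, currently the most popular software technique, and it can also be used for N-modular redundancy, voted process pairs, and similar schemes. Until now, specification languages could only describe a system at the level of its normal development. This paper therefore provides specifications that convey recovery logic, so that the system can recover even when errors occur. To avoid complexity, the system is organized in units of objects, and applying this method per major subsystem when writing specifications reduces the overhead of the specification process.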
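
The abstract notes that the language also covers N-modular redundancy. For readers unfamiliar with the term, this is a generic majority voter of the kind an NMR specification would describe; it is a sketch only and says nothing about the paper's specification syntax.

```python
from collections import Counter


def nmr_vote(outputs):
    """Majority voter for N-modular redundancy: return the value most modules agree on."""
    if not outputs:
        raise ValueError("no module outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        # Without a strict majority the voter cannot mask the faulty modules.
        raise RuntimeError("no majority among module outputs")
    return value


if __name__ == "__main__":
    # Triple modular redundancy: one faulty module is outvoted.
    print(nmr_vote([42, 42, 17]))
```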

PBFT Blockchain-Based OpenStack Identity Service

  • Youngjong, Kim;Sungil, Jang;Myung Ho, Kim;Jinho, Park
    • Journal of Information Processing Systems / v.18 no.6 / pp.741-754 / 2022
  • OpenStack is widely used as a representative open-source infrastructure-as-a-service (IaaS) platform. The OpenStack Identity Service is a centralized, token-based component that uses Memcached, an in-memory key-value store, as its cache. As the number of differently encrypted tokens increases, token validation requests concentrate on the centralized server. This paper proposes a practical Byzantine fault tolerance (PBFT) blockchain-based OpenStack Identity Service, which can improve performance efficiency and reduce security vulnerabilities through a decentralized approach built on a PBFT blockchain framework. Experiments using Apache JMeter showed that, compared with the standard OpenStack Identity Service, the PBFT blockchain-based service improved latency by more than 33.99% and 72.57% for 500 and 1,000 differently encrypted tokens, respectively.
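
The abstract relies on PBFT's Byzantine fault tolerance. The sketch below shows only the standard replica arithmetic, not the paper's identity-service design: with `n` replicas PBFT tolerates `f = floor((n - 1) / 3)` Byzantine replicas, and a quorum must be large enough that any two quorums share at least `f + 1` replicas, which gives `ceil((n + f + 1) / 2)`, i.e. `2f + 1` when `n = 3f + 1`. The function names are hypothetical.

```python
def pbft_fault_tolerance(n_replicas):
    """Largest f with n >= 3f + 1: the number of Byzantine replicas PBFT can tolerate."""
    return (n_replicas - 1) // 3


def pbft_quorum(n_replicas):
    """Smallest quorum q such that any two quorums overlap in at least f + 1 replicas.

    From 2q - n >= f + 1 we get q = ceil((n + f + 1) / 2), which is 2f + 1 when n = 3f + 1.
    """
    f = pbft_fault_tolerance(n_replicas)
    return (n_replicas + f + 2) // 2


def committed(matching_commits, n_replicas):
    """Assuming the request is already prepared, a replica commits it once it holds
    a quorum of matching COMMIT messages."""
    return matching_commits >= pbft_quorum(n_replicas)


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, pbft_fault_tolerance(n), pbft_quorum(n))
```

Four replicas tolerate one faulty replica and need three matching commit messages; seven replicas tolerate two and need five.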

Robust Adaptive Fault-Tolerant Control for Robot Manipulators with Performance Degradation Due to Actuator Failures and Uncertainties (구동기 고장과 불확실성으로 인한 성능 저하를 가지는 로봇 매니퓰레이터에 대한 강인한 적응 내고장 제어)

  • 신진호;백운보
    • The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.3 / pp.173-181 / 2004
  • In normal robot control systems without actuator failures, the actuator torque coefficient at each joint is assumed to be 1 at all times. In practice, however, it is more realistic to treat these coefficients as nonlinear and time-varying; in other words, the actuators at the joints may fail because of hardware or software faults. In this work, the actuator torque coefficients are assumed to be nonzero at all joints. A zero coefficient at a joint would mean complete loss of torque at that joint; this paper does not address that case. Both actuator failures and uncertainties are considered simultaneously as factors that degrade robot performance. This paper proposes a robust adaptive fault-tolerant control scheme that maintains the required performance and completes the task for robot manipulators whose performance is degraded by actuator failures and uncertainties. Simulation results verify the fault tolerance and robustness of the proposed control scheme.
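
For reference, a common way to write the setting the abstract describes, assumed here rather than taken from the paper, is the rigid-body manipulator model with a diagonal matrix of actuator torque coefficients. A healthy actuator has coefficient 1, a degraded one has a time-varying coefficient, and the excluded case of complete torque loss corresponds to a zero coefficient. The lumped uncertainty term d(t) is also an assumption.

```latex
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + g(q) + d(t) = \Lambda(t)\,\tau,
\qquad
\Lambda(t) = \operatorname{diag}\big(\lambda_1(t), \ldots, \lambda_n(t)\big),
\qquad
\lambda_i(t) \neq 0 \quad \text{for all } i
```

Here q is the joint-position vector, M, C, and g are the inertia, Coriolis/centrifugal, and gravity terms, and τ is the commanded joint torque; the fault-free case is Λ(t) = I.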