• Title/Summary/Keyword: Rollback Recovery

Determination of Optimal Checkpoint Interval for Real-time Control Tasks Considering Performance Index Function (성능 함수를 고려한 실시간 제어 테스크에서의 최적 체크 포인터 구간 선정)

  • Kwak, Seong-Woo;Jung, Young-Joo
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.5 / pp.875-880 / 2008
  • In this paper, a novel method is proposed to determine the optimal checkpoint interval for a real-time control task, taking into account how its performance degrades with the task's execution time. The control task considered here has a specific sampling period that is shorter than its deadline. Control performance degrades as the task's execution time extends beyond the sampling period, and eventually drops to zero when the execution time reaches the deadline. A new performance index is defined to represent the performance variation caused by the extension of task execution time that accompanies rollback fault recovery. The procedure for finding the optimal checkpoint interval is described and several simulation examples are presented.

ARIES/RL: An Extension of ARIES with Re-Logging to Support Long-Duration Transactions (ARIES/RL: 장기 트랜잭션을 지원하기 위한 ARIES 확장)

  • Jeong, Jae-Mok;Lee, Kang-Woo;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.27 no.1 / pp.129-140 / 2000
  • We propose ARIES/RL, which extends ARIES with a 're-logging' technique to manage the limited online log space efficiently even when long-duration transactions exist. Re-logging is a technique in which the log records needed for transaction rollback and restart recovery are copied forward in the log whenever the online log is not sufficient to keep the logs of ongoing transactions. It does not compromise the advantages of ARIES. Moreover, it handles log space efficiently when executing long-duration transactions. We also present an evaluation of ARIES/RL and show that it handles the online log efficiently.

Checkpointing and Rollback-Recovery Protocols in Distributed Computing Systems (분산 계산 환경의 검사점 작성 및 롤백 복구 프로토콜)

  • 안성준;조유근
    • Proceedings of the Korean Information Science Society Conference / 1999.10c / pp.93-95 / 1999
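  • Checkpointing and rollback protocols for message-passing distributed computing environments fall into three broad classes: coordinated checkpointing, loosely coordinated checkpointing, and independent checkpointing. The performance of these protocols depends on application characteristics and the execution environment, such as the frequency and pattern of interprocess communication. Although the performance of each previously proposed protocol has been studied extensively, no study has implemented protocols of different kinds in the same environment and compared their performance. In this paper, we implement checkpointing and rollback-recovery protocols and present performance measurements taken in the same environment. We also analyze the factors that affect the performance of these protocols, and present criteria for evaluating them and for selecting a protocol suited to an application's characteristics.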

Concurrency Control and Recovery Method of B+-Tree using Bulk Loading and Extended Lazy Deletion (일괄구성과 확장된 지연삭제를 이용한 B+-Tree의 동시성 제어 및 회복)

  • 김대일;김성희;조숙경;배해영
    • Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.128-130 / 2000
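  • The B+-Tree is the most widely used index for managing large volumes of data in database management systems. However, the conventional B+-Tree is costly during the initial construction and reorganization of a database, and when deletions are frequent, index structure modification operations occur more often and concurrency drops. To address these problems, most existing database management systems use bulk loading and lazy deletion, but their handling of concurrency and recovery is inadequate, which makes them problematic to apply in real systems. This paper therefore proposes concurrency control and recovery techniques for a B+-Tree that uses bulk loading and lazy deletion. The proposed technique incurs no locking overhead and no cascading rollback during bulk loading. By extending lazy deletion, it also removes the overhead of managing a free page list, and logical undo during recovery of delete operations becomes faster and simpler to implement.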

Reducing Overhead of Distributed Checkpointing with Group Communication

  • Ahn, Jinho
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.83-90 / 2020
  • A protocol, HMNR, was proposed that uses control information about every other process, piggybacked on each sent message, to minimize the number of forced checkpoints. An improved protocol, Lazy-HMNR, was then presented to lower the possibility of taking forced checkpoints caused by the asymmetry between the checkpointing frequencies of processes. Despite these two minimization techniques, under heavy message traffic the shortcomings of Lazy-HMNR may considerably lower the probability of determining that no Z-cycle occurs. We also observe that no previous work has procedures that exploit network infrastructure to substantially decrease the number of forced checkpoints using the dependency information carried on every application message. We introduce a novel Lazy-HMNR protocol for group communication-based distributed computing systems that reduces the number of forced checkpoints more effectively. Our simulation results show that the proposed protocol may substantially reduce the frequency of forced checkpoints compared with Lazy-HMNR.

A Verification of Replicated Operation In P2P Computing (P2P 컴퓨팅에서 중복 수행 결과의 정확성 검증 기법)

  • Park, Chan Yeol
    • The Journal of Korean Association of Computer Education / v.7 no.3 / pp.35-43 / 2004
  • Internet-based P2P computing with independent machines suffers from frequent disconnections and security threats caused by the departure, failure, network diversity, or anonymity of participating machines. Many studies and implementations use replication of shared resources to address these issues. We propose an operational replication scheme for sharing computing resources in P2P computing; the scheme verifies the correctness of operations against faults and security threats. These verifications are carried out periodically on replicated and dependent working units, without global message exchanges over the whole system. The verified working units are treated as checkpoints, so they can be put to practical use for fault tolerance with rollback recovery.

An Efficient Record-Replay Mechanism using Hardware Performance Counters and Debugging Facilities (하드웨어 성능 카운터와 디버깅 기능을 이용한 리코드-리플레이 방법)

  • Maeng, Ji-Chan;Ryu, Min-Soo
    • The KIPS Transactions:PartA / v.18A no.5 / pp.177-180 / 2011
  • In this paper, we present a record-replay technique based on interrupt logging and reproduction. Race conditions have been considered the main source of nondeterminism in conventional record-replay approaches. However, interrupts are another source of nondeterministic computer system behavior, and they must be reproduced at accurate time points, not merely in the correct order of occurrence. We show that an interrupt-based replayer can be efficiently and effectively implemented by using hardware performance counters and debugging functionality. Experiments also show that the runtime overhead of the interrupt-based replayer is sufficiently low.

An Implementation of Fault Tolerant Software Distributed Shared Memory with Remote Logging (원격 로깅 기법을 이용하는 고장 허용 소프트웨어 분산공유메모리 시스템의 구현)

  • 박소연;김영재;맹승렬
    • Journal of KIISE:Computer Systems and Theory / v.31 no.5_6 / pp.328-334 / 2004
  • Software DSMs continue to improve in performance and scalability. As software DSMs become attractive on larger clusters, the focus of attention is likely to move toward improving system reliability. A popular approach to tolerating failures is message logging with checkpointing, and many log-based rollback recovery schemes have been proposed. In this work, we propose a remote logging scheme that uses the volatile memory of a remote node assigned to each node. Because our remote logging does not incur frequent disk accesses during failure-free execution, its logging overhead is not significant, especially over a high-speed communication network. Remote logging tolerates multiple failures as long as the backup nodes of the failed nodes are alive, which makes DSMs considerably more reliable. We have designed and implemented FT-KDSM (Fault Tolerant KAIST DSM) with remote logging, and we report its logging overhead and recovery time.

Efficient Algorithms for Causal Message Logging and Recovery (인과적 메시지 로그 및 복구를 위한 효율적인 알고리즘)

  • Lee, Byeong-Ju;Park, Tae-Sun;Yeom, Heon-Yeong;Jo, Yu-Geun
    • Journal of KIISE:Computer Systems and Theory / v.26 no.7 / pp.767-777 / 1999
  • Causal message logging has the advantage that it neither rolls back live processes nor blocks process execution in order to store messages, but it has the drawback that message size grows excessively. To solve this problem, this paper defines the concept of log inheritance and uses log history to propose a logging technique that is efficient in logging cost, particularly log size. It also proposes an efficient recovery algorithm that uses this logging algorithm to reduce the number and size of messages during recovery and thereby shorten recovery time, and proves that the proposed algorithm is efficient in terms of message log size. To verify the performance of the proposed algorithm, two kinds of simulations were run, and the results compare it with existing logging protocols in terms of message size. Abstract: Causal message logging has many good properties such as nonblocking message logging and no rollback propagation. However, it requires a large amount of information to be piggybacked on each message, which may incur severe performance degradation. This paper presents an efficient causal logging algorithm based on a new message log structure, LogOn, which represents the causal inter-process dependency relation with much smaller overhead than existing algorithms. The proposed algorithm is efficient in the sense that it requires no information other than LogOn to be carried in each message, while other existing algorithms require extra information in addition to the message logs. This paper also presents an efficient recovery algorithm that addresses the large amount of data exchanged during recovery. To verify the performance of our algorithm, we analyze it, perform two simulations, and compare its log size with that of other causal logging protocols.

IMMORTAL : Fault Tolerant Distributed Middleware System based on Remote Method Invocation (IMMORTAL : 원격 메쏘드 호출에 기반한 결함허용 분산 미들웨어 시스템)

  • Hyun, Mu-Yong;Kim, Shik;Kim, Myung-Jun;Yamakita, Jiro
    • Journal of KIISE:Information Networking / v.29 no.5 / pp.562-572 / 2002
  • Distributed object technologies have become popular in developing distributed systems. Although middleware platforms such as DSOM, DCOM, CORBA and Java RMI ease the development of distributed applications, they do not directly improve the reliability and availability of these applications. Because developing fault-tolerance techniques for distributed object paradigms is often complicated and error-prone, there is a great need for a development toolkit that enhances the reliability and availability of distributed objects. In this paper, we propose a fault-tolerant distributed middleware system based on RMI, called IMMORTAL. We use a log-based rollback-recovery mechanism to support reliable distributed computing. Through a series of experiments, we observe that benchmark applications on IMMORTAL tolerate hardware and software failures, and we evaluate its performance and scalability.