Search | Korea Science

Checkpointing and Rollback-Recovery Protocols in Distributed Computing Systems (분산 계산 환경의 검사점 작성 및 롤백 복구 프로토콜)

안성준;조유근
- Proceedings of the Korean Information Science Society Conference / 1999.10c / pp.93-95 / 1999
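Checkpointing and rollback-recovery protocols for message-passing distributed computing environments fall into three broad classes: coordinated checkpointing, loosely coordinated checkpointing, and independent checkpointing. Their performance depends on application characteristics, such as the frequency and pattern of interprocess communication, and on the execution environment. Many studies have examined the performance of individual protocols, but none has implemented protocols of different classes in the same environment and compared their performance. This paper implements checkpointing and rollback-recovery protocols and presents performance measurements taken in a single common environment. It also analyzes the factors that affect the performance of these protocols and proposes criteria for evaluating them and for selecting a protocol suited to an application's characteristics.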

Adaptive Checkpointing Protocol for Improving of Fault Tolerance in Distributed System (분산 시스템에서 고장 감내성의 향상을 위한 적응형 체크포인팅 프로토콜)

이용호;장태무
- Proceedings of the Korean Information Science Society Conference / 1999.10c / pp.90-92 / 1999
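Asynchronous checkpointing is one way to provide fault tolerance in distributed systems. Each process takes its own local checkpoints independently, and when a process fails, it rolls back from its most recent checkpoint. A drawback of this approach is that a failure in one process can trigger cascading rollback, forcing other processes to roll back as well. To raise the level of fault tolerance, this paper uses an adaptive checkpointing protocol that retains asynchronous checkpointing while preventing cascading rollback. A copy of every message exchanged between processes is stored, through a mediator on the server side, in a machine state table on the server. The server thus holds the state of every local machine, and this state is used to recover a machine when it fails.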
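The cascading rollback mentioned in this abstract can be illustrated with a small sketch. It is not the paper's protocol: it only models plain independent checkpointing, assuming each process has a sorted list of local checkpoint times starting at 0 and each message is a hypothetical `(sender, send_time, receiver, recv_time)` tuple. A message is treated as an orphan when its receipt survives in the receiver's restored state but its send has been undone at the sender.

```python
from bisect import bisect_left

def recovery_line(checkpoints, messages, failed):
    """Compute where each process must restart after `failed` crashes.

    checkpoints: {process: sorted checkpoint times, first entry 0 (initial state)}
    messages:    list of (sender, send_time, receiver, recv_time), all times > 0
    Returns {process: restored time}; float('inf') means "keeps its current state".
    """
    line = {p: float("inf") for p in checkpoints}
    line[failed] = checkpoints[failed][-1]  # failed process restarts from its latest checkpoint

    changed = True
    while changed:
        changed = False
        for sender, send_t, receiver, recv_t in messages:
            # Orphan message: the send was rolled back, but the receipt is still recorded.
            if send_t > line[sender] and recv_t <= line[receiver]:
                cps = checkpoints[receiver]
                # Roll the receiver back to its latest checkpoint taken before the receipt.
                line[receiver] = cps[bisect_left(cps, recv_t) - 1]
                changed = True
    return line
```

With checkpoints `{"P": [0, 30, 70], "Q": [0, 50, 90]}` and four messages that alternate direction between the checkpoints, a failure of `P` drives both processes back to time 0, which is the domino effect that logging messages at a server is meant to avoid.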

Data Consistency-Control Scheme Using a Rollback-Recovery Mechanism for Storage Class Memory (스토리지 클래스 메모리를 위한 롤백-복구 방식의 데이터 일관성 유지 기법)

Lee, Hyun Ku;Kim, Junghoon;Kang, Dong Hyun;Eom, Young Ik
- Journal of KIISE / v.42 no.1 / pp.7-14 / 2015
Storage Class Memory (SCM) is regarded as a next-generation storage device because it can serve as both memory and storage. However, recently proposed file systems for SCM have significant data consistency problems: either insufficient consistency guarantees or excessive consistency-control overhead. This paper proposes a novel data consistency-control scheme that uses a rollback-recovery scheme instead of Write-Ahead Logging (WAL) and changes the write mode for log data depending on the ratio of modified data in a block. The proposed scheme reduces the log data size and the synchronization cost of maintaining data consistency. To evaluate it, we implemented the scheme on a Linux 3.10.2-based system and measured its performance. The experimental results show that the scheme improves write throughput by 9 times on average compared to the legacy data consistency-control scheme.
https://doi.org/10.5626/JOK.2015.42.1.7 KSCI

Design for Deep Learning Configuration Management System using Block Chain (딥러닝 형상관리를 위한 블록체인 시스템 설계)

Bae, Su-Hwan;Shin, Yong-Tae
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.3 / pp.201-207 / 2021
Deep learning, a type of machine learning, adjusts its weights as it progresses through each learning step. TensorFlow and Keras present the final learning results in graph form, so if an error occurs, the result must be discarded. Existing technologies therefore provide a function to roll back learning results, but the rollback is limited to at most five times. MLOps has also been applied to track the deep learning process, but it provides no rollback capability. In this paper, we design a system that uses a blockchain to record the intermediate values of the learning process so that learning can be rolled back when an error occurs. To carry out the blockchain functions, the deep learning process and the rollback of learning results are implemented as smart contracts. In the performance evaluation of the rollback function, the proposed method achieves a 100% recovery rate, whereas the recovery rate of the existing technique declines after 6 times and falls to 10% at 50 times. In addition, when smart contracts are used on the Ethereum blockchain, a constant 1.57 million won is consumed per block created.
https://doi.org/10.17661/jkiiect.2021.14.3.201 KSCI

Mobile Agent based Checkpointing Coordination Scheme (이동 에이전트 기반의 검사점 조정 기법)

Park, Taesoon
- Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.57-60 / 2013
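Rollback recovery based on checkpointing saves a consistent execution state of the processes participating in a distributed computation so that, when a system site fails, the processes can be recovered from a consistent state. Such recovery depends on taking consistent checkpoints, and loose coordination is one of the checkpoint coordination schemes for consistent recovery. To address the checkpoint storage problem, one of the drawbacks of loose coordination, this paper divides checkpoint storage into stable storage and temporary storage and proposes a method that uses mobile agents to find and periodically delete unnecessary checkpoints, thereby managing storage space efficiently.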
https://doi.org/10.3745/PKIPS.y2013m11a.57

A Dynamic Checkpoint Scheduling Scheme for Fault Tolerant Distributed Computing Systems (결함 내성 분산 시스템에서의 동적 검사점 스케쥴링 기법)

Park, Tae-Soon
- Journal of KIISE:Computer Systems and Theory / v.29 no.2 / pp.75-86 / 2002
Selecting the optimal checkpointing interval is a critical issue in implementing a checkpointing recovery scheme for fault-tolerant distributed systems. This paper presents a new scheme that allows a process to select a proper checkpointing interval dynamically. A process evaluates the cost of checkpointing and possible rollback for each checkpointing interval and selects the time interval for its next checkpoint. Unlike other schemes, the cost evaluation considers the overhead of both checkpointing and rollback, and the current communication pattern is reflected in the choice of interval. Moreover, the proposed scheme requires no extra messages for interval selection and can easily be incorporated into existing checkpointing coordination schemes.
KSCI
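The abstract above does not give the paper's cost model, so the following is only a static baseline for comparison, not the proposed dynamic scheme. It uses Young's well-known first-order approximation, in which the overhead per unit of work is roughly C/T + T/(2M) for checkpoint cost C, interval T, and mean time between failures M, minimized at T = sqrt(2·C·M). The function and parameter names are illustrative assumptions.

```python
import math

def expected_overhead(interval, checkpoint_cost, mtbf):
    """First-order overhead fraction for a fixed checkpoint interval.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : expected rework, since on average half an
                                 interval is lost per failure
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)

def young_interval(checkpoint_cost, mtbf):
    """Interval that minimizes expected_overhead (Young's approximation)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)
```

For a 2-second checkpoint and a one-hour mean time between failures, `young_interval(2, 3600)` gives a 120-second interval. A dynamic scheme such as the one described would re-evaluate a cost of this kind at run time instead of fixing the interval in advance.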

Design and Implementation of EJB 2.1 Timer Service (EJB 2.1 타이머 서비스 설계 및 구현)

정숭욱;이경호;김중배
- Proceedings of the Korean Information Science Society Conference / 2003.10c / pp.247-249 / 2003
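EJB (Enterprise JavaBeans), the core of the web application server specification J2EE (Java 2 Enterprise Edition), is a server-side component programming model that improves reusability by implementing business tasks as components in a web environment. EJB 2.1 adds web services, a timer service, and EJB QL upgrades to the features specified in EJB 2.0. The timer service invokes a specific method of an enterprise bean at specified times. When a timer is associated with a transaction, the service must support rolling back the timer within that transaction context, and it must support recovering existing timers when the system restarts after a failure. This paper examines the timer service requirements set out in the EJB 2.1 specification and discusses how the timer service was implemented in the E504 EJB server developed at ETRI.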

Design and Implementation of Reliable Distributed Programming Environment based on HORB (HORB에 기반한 신뢰성 있는 분산 프로그래밍 환경의 설계 및 구현)

Hyun, Mu-Yong;Kim, Shik;Kim, Myung-Jun
- Journal of the Institute of Electronics Engineers of Korea CI / v.39 no.2 / pp.1-9 / 2002
Object-Oriented Distributed Programming (OODP) environments such as DCOM, DSOM, Java RMI, and CORBA are increasingly popular for implementing distributed applications. Although these middleware platforms greatly enhance the quality and reusability of distributed object-based applications, their lack of fault-tolerance features complicates the design and implementation of reliable ones. In this paper, we propose Evergreen, a fault-tolerant programming environment based on RMI, for reliable distributed computing with checkpoints and a rollback-recovery mechanism. Based on a series of experiments, we evaluate the performance of Evergreen and examine the possibility of extending it to fully support our design goal.
KSCI

A Time-Redundant Recovery Scheme of TMR failures Using Retry and Rollback Techniques (재실행과 Rollback 기법을 사용한 TMR 고장의 시간여분 복구 기법)

Kang, Myung-Seok;Son, Byoung-Hee;Kim, Hag-Bae
- The KIPS Transactions:PartA / v.13A no.5 s.102 / pp.421-428 / 2006
This paper proposes an integrated recovery approach that applies retry and rollback techniques to recover from TMR failures. Combining time redundancy techniques with a TMR system is effective for recovering from TMR failures (or masked errors) caused primarily by transient faults. These policies need fewer reconfigurations, at the cost of the extra time required by the time-redundant schemes. The optimal numbers of retries and rollbacks that minimize the mean execution time of tasks are derived for the proposed method by computing the likelihoods of all possible states of the failed system. The effectiveness of the proposed method is validated through numerical examples and simulations conducted with a variety of parameters governing environmental characteristics.
https://doi.org/10.3745/KIPSTA.2006.13A.5.421 KSCI
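A minimal sketch of how retry and rollback can be layered on a TMR voter follows. It is an illustration under stated assumptions, not the paper's method: the ordering (retries first, then rollbacks), the `step` and `restore_checkpoint` callables, and the limits `max_retries` and `max_rollbacks` are all hypothetical, and the abstract's derivation of the optimal counts is not reproduced.

```python
from collections import Counter

def vote(outputs):
    """Majority vote over three module outputs; None if no two agree."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None

def run_with_recovery(step, restore_checkpoint, max_retries, max_rollbacks):
    """Run one task on a TMR system with time-redundant recovery.

    step():               executes the task and returns the three module outputs
    restore_checkpoint(): restores the last saved state before re-execution
    """
    # Initial attempt plus simple retries (re-execute without restoring state).
    for _ in range(1 + max_retries):
        result = vote(step())
        if result is not None:
            return result
    # Retries exhausted: roll back to the checkpoint and re-execute.
    for _ in range(max_rollbacks):
        restore_checkpoint()
        result = vote(step())
        if result is not None:
            return result
    # Time redundancy did not help; a real system would reconfigure here.
    raise RuntimeError("TMR failure persists after retry and rollback")
```

Larger limits reduce how often the system falls through to reconfiguration but add execution time per failure, which is the trade-off the abstract describes.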

An Efficient Record-Replay Mechanism using Hardware Performance Counters and Debugging Facilities (하드웨어 성능 카운터와 디버깅 기능을 이용한 리코드-리플레이 방법)

Maeng, Ji-Chan;Ryu, Min-Soo
- The KIPS Transactions:PartA / v.18A no.5 / pp.177-180 / 2011
In this paper, we present a record-replay technique based on interrupt logging and reproduction. Conventional record-replay approaches have treated race conditions as the main source of nondeterminism. However, interrupts are another source of nondeterministic system behavior: a replayer must reproduce not only their order of occurrence but also the exact points in time at which they occur. We show that an interrupt-based replayer can be implemented efficiently and effectively using hardware performance counters and debugging functionality. Experiments also show that the runtime overhead of the interrupt-based replayer is sufficiently low.
https://doi.org/10.3745/KIPSTA.2011.18A.5.177 KSCI
