• Title/Abstract/Keywords: fault tolerant computing


A Fault-Tolerant Linear System Solver in a Standard MPI Environment

  • 박필성
    • 인터넷정보학회논문지 / Vol. 6, No. 6 / pp. 23-34 / 2005
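  • In large-scale parallel computation, a failure of a compute node or of the communication network causes the computation to fail, wasting computing resources. Fault-tolerant MPI libraries have been proposed to address this, but they do not follow the MPI standard and therefore have portability problems. This paper proposes a fault-tolerant linear system solver that works at the application level using only asynchronous computation and standard MPI functions, which resolves the portability problem, simplifies the failure recovery mechanism, and improves the convergence rate.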

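The idea that asynchronous iteration tolerates a stalled worker can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's algorithm: it uses no MPI, the function name `async_jacobi`, the failure window, and the block assignment are all hypothetical, and the matrix is assumed to be strictly diagonally dominant so that the relaxation converges even with stale values.

```python
import random

def async_jacobi(A, b, n_workers=2, fail_worker=1, fail_from=5, fail_until=15,
                 sweeps=200, seed=0):
    """Simulate asynchronous relaxation where one worker is down for a while.

    Each (simulated) worker owns a contiguous block of unknowns. While a
    worker is down, its unknowns keep their last (stale) values; from sweep
    `fail_until` on, the block is treated as reassigned and updates resume.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    # Hypothetical ownership map: unknown i belongs to worker owner[i].
    owner = [i * n_workers // n for i in range(n)]
    for sweep in range(sweeps):
        order = list(range(n))
        rng.shuffle(order)  # arbitrary update order mimics asynchrony
        for i in order:
            down = owner[i] == fail_worker and fail_from <= sweep < fail_until
            if down:
                continue  # failed worker: its unknowns are not updated
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x
```

Calling `async_jacobi(A, b)` on a small diagonally dominant system still reaches the solution despite the simulated outage, because the surviving worker keeps iterating with stale neighbour values and the stalled block catches up once it resumes.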

Design and Cost Analysis for a Fault-Tolerant Distributed Shared Memory System

  • Jazi, AL-Harbi Fahad;Kim, Kangseok;Kim, Jai-Hoon
    • 인터넷정보학회논문지 / Vol. 17, No. 4 / pp. 1-9 / 2016
  • Algorithms implementing distributed shared memory (DSM) were developed to ensure consistency. The performance of DSM algorithms depends on system and usage parameters. However, making these algorithms tolerate faults remains a problem that needs to be researched. In this study, we propose a fault-tolerant scheme for DSM systems and analyze its reliability and fault-tolerance overhead. Using our analysis, one can choose a suitable DSM algorithm for an error-prone environment.

A Fault-tolerant Inertial Navigation System for UAVs Based on Partition Computing

  • 정병용;김정국
    • 정보과학회 컴퓨팅의 실제 논문지 / Vol. 21, No. 1 / pp. 29-39 / 2015
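  • Developing and testing a UAV navigation system involves many hazards, so a system is needed that supports fault tolerance while keeping the payload light. This paper describes a fault-tolerant UAV navigation system, based on partitions that use CPU time and memory independently, in which primary and backup navigation OFP (Operational Flight Program) partitions run independently on one or more FCCs (Flight Control Computers). The developed system uses two redundant FCCs, and each board duplicates the OFP partition so that the OFP under development and the verified OFP run independently. With this design, a UAV with a small payload capacity can achieve fault tolerance through software redundancy using only one FCC, while medium and large UAVs that also need hardware fault tolerance additionally use a backup FCC running the redundant partitions. Such a partition-based fault-tolerant navigation system should remove many of the hazards of testing during development.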

A Fault Tolerant Data Management Scheme for Healthcare Internet of Things in Fog Computing

  • Saeed, Waqar;Ahmad, Zulfiqar;Jehangiri, Ali Imran;Mohamed, Nader;Umar, Arif Iqbal;Ahmad, Jamil
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 1 / pp. 35-57 / 2021
  • Fog computing aims to address the bandwidth, network latency, and energy consumption problems of cloud computing. Managing the data generated by healthcare IoT devices is one of its significant applications. These devices generate huge amounts of data, which must be managed efficiently, with low latency, without failure, and with minimum energy consumption and cost. Task or node failures can increase latency, energy consumption, and cost. Thus, a failure-free, cost-efficient, and energy-aware management and scheduling scheme for data generated by healthcare IoT devices not only improves system performance but can also save patients' lives through minimal latency and the provision of fault tolerance. To address these challenges in data management and fault tolerance, we present a Fault Tolerant Data Management (FTDM) scheme for healthcare IoT in fog computing. In FTDM, the data generated by healthcare IoT devices is efficiently organized and managed through well-defined components and steps. FTDM provides a two-way fault-tolerance mechanism, task-based and node-based, through which task and node failures are handled. The paper considers energy consumption, execution cost, network usage, latency, and execution time as performance evaluation parameters. Simulations performed using iFogSim show significant improvements: compared with the existing Greedy Knapsack Scheduling (GKS) strategy, the proposed FTDM strategy reduces energy consumption by 3.97%, execution cost by 5.09%, network usage by 25.88%, latency by 44.15%, and execution time by 48.89%. Moreover, patients sometimes must be treated remotely because facilities are unavailable or because of infectious diseases such as COVID-19; in such circumstances, the proposed strategy is especially effective.

Dynamic Redundancy-based Fault-Recovery Scheme for Reliable CGRA-based Multi-Core Architecture

  • Kim, Yoonjin;Sohn, Seungyeon
    • JSTS:Journal of Semiconductor Technology and Science / Vol. 15, No. 6 / pp. 615-628 / 2015
  • A CGRA (Coarse-Grained Reconfigurable Architecture) based multi-core architecture can be considered a suitable solution for fault-tolerant computing. However, the few existing research projects on fault-tolerant CGRAs do not exploit the strengths of CGRA, and they are limited to a single CGRA. Therefore, in this paper, we propose two approaches that exploit the inherent redundancy and reconfigurability of a multi-CGRA for fault recovery. One is a resilient inter-CGRA fabric, a ring-based sharing fabric (RSF) with minimal interconnection overhead. The other is a novel intra/inter-CGRA reconfiguration technique on the RSF that maximizes resource utilization when faults occur. Experimental results show that the proposed approaches achieve up to 94% fault recoverability while reducing area/delay/power by up to 15%/28.6%/31% compared with a completely connected fabric (CCF).

Technology Trends of Fault-tolerant Quantum Computing

  • 황용수;김태완;백충헌;조성운;김홍석;최병수
    • 전자통신동향분석 / Vol. 37, No. 2 / pp. 1-10 / 2022
  • Similar to present computers, quantum computers comprise quantum bits (qubits) and an operating system. However, because quantum states are fragile, quantum errors must be corrected using entangled physical qubits with quantum error correction (QEC) codes. The combination of entangled physical qubits with a QEC protocol is called a logical qubit, and its computational model is called fault-tolerant quantum computation. Thus, QEC is the heart of fault-tolerant quantum computing and overcomes the limitations of noisy intermediate-scale quantum computing. In this study, we briefly survey the status of QEC codes and the physical implementation of logical qubits across various qubit technologies. In summary, we emphasize that 1) the error threshold value of a quantum system depends on its configuration and 2) therefore, we cannot settle on any single theoretical and/or experimental proposal.
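The way redundancy turns several noisy physical units into one more reliable logical unit can be illustrated with a classical analogue. This is only a sketch, not taken from the survey: it models a bit-flip repetition code decoded by majority vote, whereas real QEC codes must also handle phase errors and rely on syndrome measurements. The function name `logical_error_rate` is hypothetical, and bit flips are assumed to be independent with probability `p`.

```python
from itertools import product

def logical_error_rate(p, n=3):
    """Exact probability that a majority vote over n noisy copies is wrong.

    Each copy flips independently with probability p; n is assumed odd.
    """
    total = 0.0
    for flips in product([0, 1], repeat=n):
        k = sum(flips)
        if k > n // 2:  # more than half of the copies flipped
            total += p**k * (1 - p) ** (n - k)
    return total
```

For `n = 3` this evaluates to $3p^2 - 2p^3$, which is below `p` only when `p < 0.5`. That crossover is a toy version of an error threshold: encoding helps only when the physical error rate is low enough.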

Optimal Fault-Tolerant Resource Placement in Parallel and Distributed Systems

  • 김종훈;이철훈
    • 한국정보과학회논문지:시스템및이론 / Vol. 27, No. 6 / pp. 608-618 / 2000
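  • This paper seeks an optimal resource placement method for parallel and distributed systems that uses the minimum number of resource copies while still satisfying the given performance requirements even when an arbitrary node or link fails. To satisfy these requirements and achieve high availability, the minimum number of resource copies must be placed so that, for every node, at least two copies exist on the node itself or on its adjacent nodes; this paper calls this the fault-tolerant resource placement problem. A parallel or distributed system can be represented as a graph, and the fault-tolerant resource placement problem then becomes the problem of finding the smallest fault-tolerant dominating set in that graph. The dominating set problem has been proven NP-complete, and this paper uses the A* algorithm to find the optimal placement by state-space search. To shorten the search time, the properties of fault-tolerant dominating sets are analyzed to derive useful heuristic information. Experiments on various regular graphs and random graphs show that these heuristics considerably reduce the time needed to find the optimal fault-tolerant resource placement.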

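The placement condition in the abstract above (every node must find at least two resource copies on itself or its neighbours) can be checked directly on a small graph. The sketch below uses exhaustive search by increasing set size rather than the A* search with heuristics described in the paper, so it is only practical for tiny graphs; the function name `min_ft_dominating_set` and the adjacency-dictionary input format are assumptions made for illustration.

```python
from itertools import combinations

def min_ft_dominating_set(adj, copies=2):
    """Smallest node set D such that every node has at least `copies`
    members of D in its closed neighbourhood (itself plus its neighbours).

    `adj` maps each node to an iterable of its neighbours.
    Returns a set of nodes, or None if no placement can satisfy the rule.
    """
    nodes = sorted(adj)
    closed = {v: set(adj[v]) | {v} for v in nodes}
    # Try candidate placements from smallest to largest; the first hit is optimal.
    for size in range(len(nodes) + 1):
        for cand in combinations(nodes, size):
            chosen = set(cand)
            if all(len(closed[v] & chosen) >= copies for v in nodes):
                return chosen
    return None
```

On a 4-node ring, for example, no two nodes suffice and three copies are needed. An isolated node can never see two copies, so the function returns `None` for it.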

The Performance Comparison of Low-Overhead Fault Tolerant Services based on Distributed Object

  • 김식;현무용
    • 정보학연구 / Vol. 9, No. 4 / pp. 25-34 / 2006
  • As application programs become more sophisticated and adopt distributed object technology, object-based distributed design has become widespread because it supports portability and reusability. Approaches to fault-tolerant distributed computing, when fault-tolerance facilities are added on, are categorized into the active replica mechanism for mission-critical application programs and the passive replica mechanism for non-mission-critical ones. This paper surveys the advantages and drawbacks of several approaches to add-on low-overhead fault-tolerant services and presents experimental results on benchmark models to demonstrate their performance.


Hierarchical Multiplexing Interconnection Structure for Fault-Tolerant Reconfigurable Chip Multiprocessor

  • Kim, Yoon-Jin
    • JSTS:Journal of Semiconductor Technology and Science / Vol. 11, No. 4 / pp. 318-328 / 2011
  • A stage-level reconfigurable chip multiprocessor (CMP) aims to achieve highly reliable and fault-tolerant computing by using interwoven pipeline stages that communicate with each other over an on-chip interconnect. Existing crossbar-switch based stage-level reconfigurable CMPs offer high reliability at the cost of significant area/power overheads. These overheads make large CMPs prohibitive to realize because of the area and power consumed by heavy interconnection networks. On the other hand, area/power-efficient architectures offer lower reliability and inefficient stage-level resource utilization. In this paper, I propose a hierarchical multiplexing interconnection structure in lieu of a crossbar interconnect to design an area/power-efficient stage-level reconfigurable CMP. The proposed approach keeps the reliability offered by the crossbar switch while reducing the area and power overheads. Experimental results show that the proposed approach reduces area by up to 21% and power by up to 32% compared with the crossbar-switch based interconnection network.