• Title/Abstract/Keywords: Fault-Tolerance Service


A Design and Implementation of Fault Tolerance Agent on Distributed Multimedia Environment

  • 고응남;황대준
    • The Transactions of the Korea Information Processing Society / Vol. 6, No. 10 / pp. 2618-2629 / 1999
  • In this paper, we describe the design and implementation of FDRA (Fault Detection Recovery based on Agent), which runs in a distributed multimedia environment. DOORAE is a good example of a distributed multimedia system and of a multimedia distance education system used by students and teachers during lectures. It has primitive service agents, and its service functions are implemented with an object-oriented concept. FDRA is a multi-agent system in which intelligent agents interact with each other, either collaboratively or non-collaboratively, to achieve their goals. The main idea is to detect an error by polling: the system periodically polls the processes related to a session. It then classifies the error type automatically by using learning rules. The merit of this system is that it recovers from an error with the same method it uses to create a session. FDRA is thus able to detect an error, classify its type, and automatically recover from a software error in a distributed multimedia environment.

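The polling idea in the abstract above can be illustrated with a minimal sketch. This is not the FDRA code: the class name `PollingMonitor`, the callbacks `create` and `is_alive`, and the `max_missed` threshold are all hypothetical, and the learning-rule error classification is left out. The sketch only shows periodic polling of session processes and recovery through the same routine that created them.

```python
from typing import Callable, Dict, List


class PollingMonitor:
    """Toy polling-based fault detector (hypothetical, not the FDRA implementation)."""

    def __init__(self,
                 create: Callable[[str], None],
                 is_alive: Callable[[str], bool],
                 max_missed: int = 2) -> None:
        self.create = create          # routine that starts a session process
        self.is_alive = is_alive      # probe used on every poll
        self.max_missed = max_missed  # consecutive failed polls before recovery
        self.missed: Dict[str, int] = {}

    def add(self, name: str) -> None:
        """Create a process and start monitoring it."""
        self.create(name)
        self.missed[name] = 0

    def poll_once(self) -> List[str]:
        """Poll every monitored process once; return the names that were recovered."""
        recovered = []
        for name in list(self.missed):
            if self.is_alive(name):
                self.missed[name] = 0
                continue
            self.missed[name] += 1
            if self.missed[name] >= self.max_missed:
                # Recover with the same routine that was used at creation time.
                self.create(name)
                self.missed[name] = 0
                recovered.append(name)
        return recovered
```

In a real deployment `poll_once` would be called from a timer at the polling period. Requiring several consecutive misses before recovery is a design choice of this sketch, meant to avoid restarting a process after a single delayed reply.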

Integrating Resilient Tier N+1 Networks with Distributed Non-Recursive Cloud Model for Cyber-Physical Applications

  • Okafor, Kennedy Chinedu;Longe, Omowunmi Mary
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 7 / pp. 2257-2285 / 2022
  • Cyber-physical systems (CPS) have been growing exponentially due to improved cloud-datacenter infrastructure-as-a-service (CDIaaS). Incremental expandability (scalability), Quality of Service (QoS) performance, and reliability are currently the automation focus on healthy Tier 4 CDIaaS. However, stable QoS is yet to be fully addressed in Cyber-physical data centers (CP-DCS). Also, balanced agility and flexibility for the application workloads need urgent attention. There is a need for a resilient and fault-tolerant scheme in terms of CPS routing service, including Pod cluster reliability analytics, that meets QoS requirements. Motivated by these concerns, our contributions are fourfold. First, a Distributed Non-Recursive Cloud Model (DNRCM) is proposed to support cyber-physical workloads for remote lab activities. Second, an efficient QoS stability model with Routh-Hurwitz criteria is established. Third, an evaluation of the CDIaaS DCN topology is validated for handling large-scale traffic workloads. Network Function Virtualization (NFV) with Floodlight SDN controllers was adopted for the implementation of DNRCM with embedded rule-base in Open vSwitch engines. Fourth, QoS evaluation is carried out experimentally. Considering the non-recursive queuing delays with SDN isolation (logical), a lower queuing delay (19.65%) is observed. Without logical isolation, the average queuing delay is 80.34%. Without logical resource isolation, the fault tolerance yields 33.55%, while with logical isolation, it yields 66.44%. In terms of throughput, DNRCM, recursive BCube, and DCell offered 38.30%, 36.37%, and 25.53% respectively. Similarly, the DNRCM had an improved incremental scalability profile of 40.00%, while BCube and Recursive DCell had 33.33% and 26.67% respectively. In terms of service availability, the DNRCM offered 52.10% compared with recursive BCube and DCell, which yielded 34.72% and 13.18% respectively. The average delays obtained for DNRCM, recursive BCube, and DCell are 32.81%, 33.44%, and 33.75% respectively. Finally, workload utilization for DNRCM, recursive BCube, and DCell yielded 50.28%, 27.93%, and 21.79% respectively.

A Multi-objective Optimization Approach to Workflow Scheduling in Clouds Considering Fault Recovery

  • Xu, Heyang;Yang, Bo;Qi, Weiwei;Ahene, Emmanuel
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 3 / pp. 976-995 / 2016
  • Workflow scheduling is one of the challenging problems in cloud computing, especially when service reliability is considered. To improve cloud service reliability, fault tolerance techniques such as fault recovery can be employed. In practice, fault recovery has an impact on the performance of workflow scheduling. This impact deserves detailed study, yet only a few works on workflow scheduling consider fault recovery and its impact. In this paper, we investigate the problem of workflow scheduling in clouds, considering the probability that cloud resources may fail during execution. We formulate this problem as a multi-objective optimization model. The first optimization objective is to minimize the overall completion time and the second is to minimize the overall execution cost. Based on the proposed optimization model, we develop a heuristic-based algorithm called Min-min based time and cost tradeoff (MTCT). We perform extensive simulations with four different real-world scientific workflows to verify the validity of the proposed model and evaluate the performance of our algorithm. The results show that, as expected, fault recovery has a significant impact on the two performance criteria, and the proposed MTCT algorithm is useful for real-life workflow scheduling when both optimization objectives are considered.
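
MTCT is described above as Min-min based, so a sketch of the classic Min-min heuristic may help as background. This is not MTCT itself: it schedules independent tasks only, minimizes completion time only, and ignores task dependencies, execution cost, and resource failures. The function name `min_min` and the `exec_time` input layout are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_min(exec_time: Dict[str, List[float]]) -> Tuple[Dict[str, int], float]:
    """Classic Min-min heuristic for independent tasks.

    exec_time[t][r] is the run time of task t on resource r.
    Returns (task -> resource index, makespan).
    """
    if not exec_time:
        return {}, 0.0
    n_res = len(next(iter(exec_time.values())))
    ready = [0.0] * n_res            # time at which each resource becomes free
    unscheduled = set(exec_time)
    assignment: Dict[str, int] = {}
    while unscheduled:
        best = None                  # (completion time, task, resource)
        # Pick the task/resource pair with the smallest completion time overall,
        # which equals the minimum over tasks of each task's minimum completion time.
        for task in sorted(unscheduled):
            for res in range(n_res):
                completion = ready[res] + exec_time[task][res]
                if best is None or completion < best[0]:
                    best = (completion, task, res)
        completion, task, res = best
        assignment[task] = res
        ready[res] = completion
        unscheduled.remove(task)
    return assignment, max(ready)
```

For example, `min_min({"a": [2, 4], "b": [3, 1], "c": [5, 5]})` schedules `b` first on resource 1, then `a` on resource 0, then `c` on resource 1.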

PBFT Blockchain-Based OpenStack Identity Service

  • Youngjong, Kim;Sungil, Jang;Myung Ho, Kim;Jinho, Park
    • Journal of Information Processing Systems / Vol. 18, No. 6 / pp. 741-754 / 2022
  • OpenStack is widely used as a representative open-source infrastructure-as-a-service (IaaS) platform. The OpenStack Identity Service is a centralized, token-based component that uses Memcached, an in-memory key-value store, as its cache. Token validation requests become concentrated on the centralized server as the number of differently encrypted tokens increases. This paper proposes a practical Byzantine fault tolerance (PBFT) blockchain-based OpenStack Identity Service, which can improve performance efficiency and reduce security vulnerabilities through a decentralized approach based on a PBFT blockchain framework. An experiment conducted with Apache JMeter demonstrated that latency improved by more than 33.99% and 72.57% in the PBFT blockchain-based OpenStack Identity Service, compared with the OpenStack Identity Service, for 500 and 1,000 differently encrypted tokens, respectively.
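
As background on the PBFT sizing that any such deployment depends on, the sketch below computes how many Byzantine replicas a group of n replicas can tolerate and the matching quorum size. It uses the standard bounds n >= 3f + 1 and quorum = ceil((n + f + 1) / 2), which reduces to 2f + 1 when n = 3f + 1. The function names are hypothetical, and nothing here is taken from the paper's implementation.

```python
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1 (Byzantine replicas tolerated by n replicas)."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Smallest quorum such that any two quorums share at least f + 1 replicas.

    Equals ceil((n + f + 1) / 2), which is 2f + 1 when n = 3f + 1.
    """
    f = max_faulty(n)
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)
```

With four replicas, one fault is tolerated and a quorum is three replicas; with seven replicas, two faults are tolerated and a quorum is five.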

The Implementation of Fault-Tolerant Dual System Using the Hot-Standby Sparing Technique

  • 신진욱;박동선
    • The Journal of Korean Institute of Communications and Information Sciences / Vol. 29, No. 10A / pp. 1113-1122 / 2004
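  • With the advance of distributed computing technology and the spread of Internet use, user demand for high-speed multimedia services is growing daily. Accordingly, services that handle large-volume media such as video and audio now predominate, and network operators keep investing in high-speed networking equipment so that such media can be transmitted quickly. Alongside speed, a requirement that must be satisfied at the same time is stability. Electronic information systems whose failure can paralyze infrastructure must have very high availability and reliability. To obtain such availability and reliability, this paper proposes and implements a fault-tolerant redundant system using the hot-standby sparing technique. The proposed system replicates an ordinary single-module system so that it can respond flexibly when a fault occurs, and applies a fault-detection bus to enable fault detection by comparison. The proposed architecture also introduces a bus conversion device into the single-module system, which makes a fault-tolerant redundant system easier to implement. Finally, to evaluate the proposed hardware system, modeling based on a Markov process is applied to verify its high availability and reliability.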
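
The abstract above does not give its Markov model, so the following is only a textbook-style sketch of how such an availability figure can be computed. It assumes a three-state birth-death chain (both modules up, one up, none up), a constant per-module failure rate `lam`, a single repair facility with rate `mu`, and perfect fault detection and switchover. All names and assumptions are illustrative.

```python
def simplex_availability(lam: float, mu: float) -> float:
    """Steady-state availability of a single repairable module."""
    if lam < 0 or mu <= 0:
        raise ValueError("require lam >= 0 and mu > 0")
    return mu / (lam + mu)


def duplex_availability(lam: float, mu: float) -> float:
    """Steady-state availability of a hot-standby pair with one repair facility.

    States: 0 = both modules up, 1 = one up, 2 = none up.
    Transitions: 0 -> 1 at 2*lam (both modules are powered), 1 -> 2 at lam,
    1 -> 0 and 2 -> 1 at mu. The system is available in states 0 and 1.
    """
    if lam < 0 or mu <= 0:
        raise ValueError("require lam >= 0 and mu > 0")
    rho = lam / mu
    p0 = 1.0             # unnormalized steady-state weights from the balance equations
    p1 = 2.0 * rho * p0
    p2 = rho * p1
    return (p0 + p1) / (p0 + p1 + p2)
```

Under these assumptions the duplex availability is (1 + 2ρ) / (1 + 2ρ + 2ρ²) with ρ = lam / mu, which is never below the single-module value 1 / (1 + ρ).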

A Design of Control Network for DCS in Nuclear Power Plant

  • 이재민;박태림;문홍주;권욱현
    • Proceedings of the IEEK Conference / Proceedings of the IEEK 2000 Summer Conference (1) / pp. 85-88 / 2000
  • A distributed control system (DCS) is one of the best solutions for implementing control systems because it provides continuous observation of the control process and execution of commands to induce proper operation. In this paper, a design of a control network for DCS in a nuclear power plant is proposed. The proposed control network has a simple architecture and is deterministic, so it offers hard real-time periodic service. It also has redundant media for fault tolerance. As a result, the high safety and reliability required in a nuclear power plant are guaranteed.


DOVE: A Distributed Object System for Virtual Computing Environment

  • 김형도;우영제;류소현;정창성
    • Journal of KIISE: Computing Practices and Letters / Vol. 6, No. 2 / pp. 120-134 / 2000
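  • This paper describes DOVE, an object-oriented distributed virtual computing environment. DOVE is designed on a distributed object model in which independent distributed objects interact through method invocation. It presents a distributed environment made up of many heterogeneous machines to the user as a single logical virtual computer, so that remote distributed objects can be used as if they resided on one virtual computer. By supporting parallelism, heterogeneous environments, object groups, a uniform name service, and fault tolerance, it also provides a transparent and easy-to-use programming environment for developing parallel programs. Parallelism is supported effectively through various kinds of method invocation, multiple method invocation via object groups, a multithreaded architecture, and several synchronization mechanisms. Interoperability across heterogeneous machines is resolved through automatically generated data conversion code, stub and skeleton objects generated by an IDL compiler, and object life-cycle management and name service provided by an object manager. The use of autonomous distributed objects, a multi-layer architecture, and a decentralized name service and object management structure improves scalability and maintainability, and fault detection and confirmation are provided through asynchronous event and exception handling.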


Simulation and Evaluation of Redistribution Algorithms in Fault-Tolerant Distributed System

  • 최병갑;이천희
    • Journal of the Korean Institute of Telematics and Electronics B / Vol. 31B, No. 8 / pp. 1-10 / 1994
  • In this paper, load redistribution algorithms that provide fault tolerance by redistributing the workload of n failed nodes to the remaining good nodes in distributed systems are investigated. To evaluate the efficiency of the algorithms, a simulation model is developed using the SLAM II simulation language. The job arrival rate, service rate, failure and repair rates of nodes, and communication delay time due to load migration are used as parameters. The simulation results show that the job arrival rate and the failure and repair rates of nodes do not affect the relative efficiency of the algorithms. If the communication delay time is greater than the average job processing time, algorithm B is better; otherwise, algorithm C is superior to the others.


Bi-active Load Balancer for Enhancing Scalability and Fault-Tolerance of Cluster System

  • 김영환;윤희용;추현승
    • Proceedings of the Korea Information Processing Society Conference / Proceedings of the KIPS 2002 Spring Conference (Vol. 1) / pp. 381-384 / 2002
  • This paper describes the motivation, design, and performance of a bi-active load balancer in Linux Virtual Server. The goal of the bi-active load balancer is to provide a framework for building highly scalable, fault-tolerant services using a large cluster of commodity servers. The TCP/IP stack of the Linux kernel is extended to support three IP load balancing techniques, which make parallel services of different kinds of server clusters appear as a single service on a single IP address. Scalability is achieved by transparently adding or removing a node in the cluster, and high availability is provided by detecting node or daemon failures and reconfiguring the system appropriately. Extensive simulation reveals that the proposed approach improves the reply rate by about 20% compared with the earlier design.


JOB Scheduling for Process Control in Hierarchical Computer Network

  • 박일
    • The Journal of Korean Institute of Communications and Information Sciences / Vol. 5, No. 1 / pp. 83-87 / 1980
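  • In process control over a hierarchical computer network, processing is distributed to maximize fault tolerance, and the distributed-control processor jobs that periodically monitor and control the interrelations of complex and diverse variables can each be defined by a period and an execution time. All jobs were organized into subsets related by a tree structure, and a job scheduling algorithm was derived for them. Compared with an FCFS (First Come/First Service) schedule, the resulting schedule reduced the processor's loose time and was more favorable for securing available processing time.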
