• 제목/요약/키워드: Partially Observable Markov Decision Processes(POMDP)

검색결과 4건 처리시간 0.024초

POMDP와 Exploration Bonus를 이용한 지역적이고 적응적인 QoS 라우팅 기법 (A Localized Adaptive QoS Routing Scheme Using POMDP and Exploration Bonus Techniques)

  • 한정수
    • 한국통신학회논문지
    • /
    • 제31권3B호
    • /
    • pp.175-182
    • /
    • 2006
  • 본 논문에서는 Localized Aptive QoS 라우팅을 위해 POMDP(Partially Observable Markov Decision Processes)와 Exploration Bonus 기법을 사용하는 방법을 제안하였다. 또한, POMDP 문제를 해결하기 위해 Dynamic Programming을 사용하여 최적의 행동을 찾는 연산이 매우 복잡하고 어렵기 때문에 CEA(Certainty Equivalency Approximation) 기법을 통한 기댓값 사용으로 문제를 단순하였으며, Exploration Bonus 방식을 사용해 현재 경로보다 나은 경로를 탐색하고자 하였다. 이를 위해 다중 경로 탐색 알고리즘(SEMA)을 제안했다. 더욱이 탐색의 횟수와 간격을 정의하기 위해 $\phi$와 k 성능 파라미터들을 사용하여 이들을 통해 탐색의 횟수 변화를 통한 서비스 성공률과 성공 시 사용된 평균 홉 수에 대한 성능을 살펴보았다. 결과적으로 $\phi$ 값이 증가함에 따라 현재의 경로보다 더 나은 경로를 찾게 되며, k 값이 증가할수록 탐색이 증가함을 볼 수 있다.

Partially Observable Markov Decision Processes (POMDPs) and Wireless Body Area Networks (WBAN): A Survey

  • Mohammed, Yahaya Onimisi;Baroudi, Uthman A.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제7권5호
    • /
    • pp.1036-1057
    • /
    • 2013
  • Wireless body area network (WBAN) is a promising candidate for future health monitoring system. Nevertheless, the path to mature solutions is still facing a lot of challenges that need to be overcome. Energy efficient scheduling is one of these challenges given the scarcity of available energy of biosensors and the lack of portability. Therefore, researchers from academia, industry and health sectors are working together to realize practical solutions for these challenges. The main difficulty in WBAN is the uncertainty in the state of the monitored system. Intelligent learning approaches such as a Markov Decision Process (MDP) were proposed to tackle this issue. A Markov Decision Process (MDP) is a form of Markov Chain in which the transition matrix depends on the action taken by the decision maker (agent) at each time step. The agent receives a reward, which depends on the action and the state. The goal is to find a function, called a policy, which specifies which action to take in each state, so as to maximize some utility functions (e.g., the mean or expected discounted sum) of the sequence of rewards. A partially Observable Markov Decision Processes (POMDP) is a generalization of Markov decision processes that allows for the incomplete information regarding the state of the system. In this case, the state is not visible to the agent. This has many applications in operations research and artificial intelligence. Due to incomplete knowledge of the system, this uncertainty makes formulating and solving POMDP models mathematically complex and computationally expensive. Limited progress has been made in terms of applying POMPD to real applications. In this paper, we surveyed the existing methods and algorithms for solving POMDP in the general domain and in particular in Wireless body area network (WBAN). In addition, the papers discussed recent real implementation of POMDP on practical problems of WBAN. We believe that this work will provide valuable insights for the newcomers who would like to pursue related research in the domain of WBAN.

System Replacement Policy for A Partially Observable Markov Decision Process Model

  • Kim, Chang-Eun
    • 대한산업공학회지
    • /
    • 제16권2호
    • /
    • pp.1-9
    • /
    • 1990
  • The control of deterioration processes for which only incomplete state information is available is examined in this study. When the deterioration is governed by a Markov process, such processes are known as Partially Observable Markov Decision Processes (POMDP) which eliminate the assumption that the state or level of deterioration of the system is known exactly. This research investigates a two state partially observable Markov chain in which only deterioration can occur and for which the only actions possible are to replace or to leave alone. The goal of this research is to develop a new jump algorithm which has the potential for solving system problems dealing with continuous state space Markov chains.

  • PDF

정책 기울기 값 강화학습을 이용한 적응적인 QoS 라우팅 기법 연구 (A Study of Adaptive QoS Routing scheme using Policy-gradient Reinforcement Learning)

  • 한정수
    • 한국컴퓨터정보학회논문지
    • /
    • 제16권2호
    • /
    • pp.93-99
    • /
    • 2011
  • 본 논문에서는 강화학습(RL : Reinforcement Learning) 환경 하에서 정책 기울기 값 기법을 사용하는 적응적인 QoS 라우팅 기법을 제안하였다. 이 기법은 기존의 강화학습 환경 하에 제공하는 기법에 비해 기대 보상값의 기울기 값을 정책에 반영함으로써 빠른 네트워크 환경을 학습함으로써 보다 우수한 라우팅 성공률을 제공할 수 있는 기법이다. 이를 검증하기 위해 기존의 기법들과 비교 검증함으로써 그 우수성을 확인하였다.