• Title/Summary/Keyword: Partially observable MDP

Search Result 5, Processing Time 0.036 seconds

Deep Reinforcement Learning-based Distributed Routing Algorithm for Minimizing End-to-end Delay in MANET (MANET에서 종단간 통신지연 최소화를 위한 심층 강화학습 기반 분산 라우팅 알고리즘)

  • Choi, Yeong-Jun;Seo, Ju-Sung;Hong, Jun-Pyo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.9
    • /
    • pp.1267-1270
    • /
    • 2021
  • In this paper, we propose a distributed routing algorithm for mobile ad hoc networks (MANET) where mobile devices can be utilized as relays for communication between remote source-destination nodes. The objective of the proposed algorithm is to minimize the end-to-end communication delay caused by transmission failure with deep channel fading. In each hop, the node needs to select the next relaying node by considering a tradeoff relationship between the link stability and forward link distance. Based on such feature, we formulate the problem with partially observable Markov decision process (MDP) and apply deep reinforcement learning to derive effective routing strategy for the formulated MDP. Simulation results show that the proposed algorithm outperforms other baseline schemes in terms of the average end-to-end delay.

Labeling Q-learning with SOM

  • Lee, Haeyeon;Kenichi Abe;Hiroyuki Kamaya
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2002.10a
    • /
    • pp.35.3-35
    • /
    • 2002
  • Reinforcement Learning (RL) is one of machine learning methods and an RL agent autonomously learns the action selection policy by interactions with its environment. At the beginning of RL research, it was limited to problems in environments assumed to be Markovian Decision Process (MDP). However in practical problems, the agent suffers from the incomplete perception, i.e., the agent observes the state of the environments, but these observations include incomplete information of the state. This problem is formally modeled by Partially Observable MDP (POMDP). One of the possible approaches to POMDPS is to use historical nformation to estimate states. The problem of these approaches is how t..

  • PDF

Partially Observable Markov Decision Processes (POMDPs) and Wireless Body Area Networks (WBAN): A Survey

  • Mohammed, Yahaya Onimisi;Baroudi, Uthman A.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.5
    • /
    • pp.1036-1057
    • /
    • 2013
  • Wireless body area network (WBAN) is a promising candidate for future health monitoring system. Nevertheless, the path to mature solutions is still facing a lot of challenges that need to be overcome. Energy efficient scheduling is one of these challenges given the scarcity of available energy of biosensors and the lack of portability. Therefore, researchers from academia, industry and health sectors are working together to realize practical solutions for these challenges. The main difficulty in WBAN is the uncertainty in the state of the monitored system. Intelligent learning approaches such as a Markov Decision Process (MDP) were proposed to tackle this issue. A Markov Decision Process (MDP) is a form of Markov Chain in which the transition matrix depends on the action taken by the decision maker (agent) at each time step. The agent receives a reward, which depends on the action and the state. The goal is to find a function, called a policy, which specifies which action to take in each state, so as to maximize some utility functions (e.g., the mean or expected discounted sum) of the sequence of rewards. A partially Observable Markov Decision Processes (POMDP) is a generalization of Markov decision processes that allows for the incomplete information regarding the state of the system. In this case, the state is not visible to the agent. This has many applications in operations research and artificial intelligence. Due to incomplete knowledge of the system, this uncertainty makes formulating and solving POMDP models mathematically complex and computationally expensive. Limited progress has been made in terms of applying POMPD to real applications. In this paper, we surveyed the existing methods and algorithms for solving POMDP in the general domain and in particular in Wireless body area network (WBAN). In addition, the papers discussed recent real implementation of POMDP on practical problems of WBAN. We believe that this work will provide valuable insights for the newcomers who would like to pursue related research in the domain of WBAN.

A Localized Adaptive QoS Routing Scheme Using POMDP and Exploration Bonus Techniques (POMDP와 Exploration Bonus를 이용한 지역적이고 적응적인 QoS 라우팅 기법)

  • Han Jeong-Soo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.3B
    • /
    • pp.175-182
    • /
    • 2006
  • In this paper, we propose a Localized Adaptive QoS Routing Scheme using POMDP and Exploration Bonus Techniques. Also, this paper shows that CEA technique using expectation values can be simply POMDP problem, because performing dynamic programming to solve a POMDP is highly computationally expensive. And we use Exploration Bonus to search detour path better than current path. For this, we proposed the algorithm(SEMA) to search multiple path. Expecially, we evaluate performances of service success rate and average hop count with $\phi$ and k performance parameters, which is defined as exploration count and intervals. As result, we knew that the larger $\phi$, the better detour path search. And increasing n increased the amount of exploration.

A Study of Adaptive QoS Routing scheme using Policy-gradient Reinforcement Learning (정책 기울기 값 강화학습을 이용한 적응적인 QoS 라우팅 기법 연구)

  • Han, Jeong-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.2
    • /
    • pp.93-99
    • /
    • 2011
  • In this paper, we propose a policy-gradient routing scheme under Reinforcement Learning that can be used adaptive QoS routing. A policy-gradient RL routing can provide fast learning of network environments as using optimal policy adapted average estimate rewards gradient values. This technique shows that fast of learning network environments results in high success rate of routing. For prove it, we simulate and compare with three different schemes.