• Title/Summary/Keyword: partially observable Markov decision process


System Replacement Policy for A Partially Observable Markov Decision Process Model

  • Kim, Chang-Eun
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.16 no.2
    • /
    • pp.1-9
    • /
    • 1990
  • The control of deterioration processes for which only incomplete state information is available is examined in this study. When the deterioration is governed by a Markov process, such processes are known as Partially Observable Markov Decision Processes (POMDPs), which eliminate the assumption that the state or level of deterioration of the system is known exactly. This research investigates a two-state partially observable Markov chain in which only deterioration can occur and for which the only possible actions are to replace the system or to leave it alone. The goal of this research is to develop a new jump algorithm with the potential to solve system problems involving continuous-state-space Markov chains.

  • PDF
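
To make this entry's model concrete, here is a minimal, illustrative sketch of a two-state deteriorating chain with noisy observations and replace / leave-alone actions: the belief that the system has deteriorated is propagated through the transition matrix, conditioned on each observation, and a simple threshold rule decides when to replace. The transition and observation probabilities and the threshold are hypothetical, and this is not the jump algorithm proposed in the paper.

```python
import numpy as np

# Hypothetical two-state deterioration model: state 0 = good, state 1 = deteriorated.
# Only deterioration can occur, so the chain cannot move from state 1 back to state 0.
P = np.array([[0.9, 0.1],
              [0.0, 1.0]])
# Noisy observation model O[s, o] = P(observation o | true state s);
# observation 0 = "looks good", observation 1 = "looks bad".
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def belief_update(b, obs):
    """One-step belief update: predict through the deterioration dynamics,
    then condition on the noisy observation (Bayes' rule)."""
    predicted = b @ P                    # prior after one step of deterioration
    posterior = predicted * O[:, obs]    # weight by the observation likelihoods
    return posterior / posterior.sum()

def replacement_policy(b, threshold=0.6):
    """Threshold (control-limit) rule: replace once P(deteriorated) is high enough."""
    return "replace" if b[1] >= threshold else "leave alone"

b = np.array([1.0, 0.0])                 # a new system starts in the good state
for obs in [0, 0, 1, 1]:                 # a hypothetical observation sequence
    b = belief_update(b, obs)
    print(np.round(b, 3), replacement_policy(b))
```

In the two-state case the belief reduces to the single number P(deteriorated), which is what makes a threshold-type rule the natural form for a replacement policy.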

Partially Observable Markov Decision Processes (POMDPs) and Wireless Body Area Networks (WBAN): A Survey

  • Mohammed, Yahaya Onimisi;Baroudi, Uthman A.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.5
    • /
    • pp.1036-1057
    • /
    • 2013
  • Wireless body area network (WBAN) is a promising candidate for future health monitoring systems. Nevertheless, the path to mature solutions still faces many challenges that need to be overcome. Energy-efficient scheduling is one of these challenges, given the scarcity of available energy in biosensors and the lack of portability. Therefore, researchers from academia, industry, and the health sector are working together to realize practical solutions to these challenges. The main difficulty in WBAN is the uncertainty in the state of the monitored system. Intelligent learning approaches such as the Markov Decision Process (MDP) have been proposed to tackle this issue. An MDP is a form of Markov chain in which the transition matrix depends on the action taken by the decision maker (agent) at each time step. The agent receives a reward, which depends on the action and the state. The goal is to find a function, called a policy, which specifies which action to take in each state so as to maximize some utility function (e.g., the mean or expected discounted sum) of the sequence of rewards. A Partially Observable Markov Decision Process (POMDP) is a generalization of the MDP that allows for incomplete information regarding the state of the system; in this case, the state is not visible to the agent. This has many applications in operations research and artificial intelligence. This uncertainty, arising from incomplete knowledge of the system, makes formulating and solving POMDP models mathematically complex and computationally expensive, and limited progress has been made in applying POMDPs to real applications. In this paper, we survey the existing methods and algorithms for solving POMDPs in the general domain and, in particular, in wireless body area networks (WBAN). In addition, the paper discusses recent real implementations of POMDPs on practical WBAN problems. We believe that this work will provide valuable insights for newcomers who would like to pursue related research in the WBAN domain.
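
The notions summarized in this abstract can be written down compactly. The following standard formulation (notation chosen here for illustration, not taken from the paper) shows the discounted objective that a policy maximizes and the Bayes update that replaces the hidden state with a belief in the POMDP case:

```latex
% Expected discounted return of a policy \pi and the optimal policy (standard MDP objective):
V^{\pi}(s) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t,\pi(s_t)) \,\middle|\, s_0 = s\right],
\qquad
\pi^{*} \;=\; \arg\max_{\pi} V^{\pi}.

% POMDP belief update after taking action a and observing o,
% with transition model T(s'|s,a) and observation model O(o|s',a):
b'(s') \;=\; \frac{O(o \mid s',a)\,\sum_{s} T(s' \mid s,a)\, b(s)}
                  {\sum_{\tilde{s}} O(o \mid \tilde{s},a)\,\sum_{s} T(\tilde{s} \mid s,a)\, b(s)}.
```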

Partially Observable Markov Decision Process with Lagged Information over Infinite Horizon

  • Jeong, Byong-Ho;Kim, Soung-Hie
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.16 no.1
    • /
    • pp.135-146
    • /
    • 1991
  • This paper presents the infinite horizon model of a Partially Observable Markov Decision Process with lagged information. The lagged information is an uncertain, delayed observation of the process under control. Even though an optimal policy for the model exists, finding the optimal policy is very time consuming. Thus, the aim of this study is to find an ε-optimal stationary policy minimizing the expected discounted total cost of the model. The ε-optimal policy is found by using a modified version of the well-known policy iteration algorithm. The modification focuses on the value determination routine of the algorithm. Some properties of the approximation functions for the expected discounted cost of a stationary policy are presented, and the expected discounted cost of a stationary policy is approximated based on these properties. A numerical example is also shown.

  • PDF
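
The modification described above targets how the expected discounted cost of a fixed stationary policy is computed inside policy iteration. The sketch below shows one common way to approximate that value-determination step by successive approximation, stopping once the contraction bound guarantees the estimate is within ε of the true discounted cost; the three-state chain, costs, and discount factor are hypothetical, and this is not the authors' approximation scheme.

```python
import numpy as np

def approximate_value_determination(P_pi, c_pi, beta=0.95, eps=1e-3):
    """Approximate the expected discounted cost of a fixed stationary policy by
    successive approximation. Stops when the sup-norm change is small enough that
    the standard contraction bound guarantees an error below eps."""
    v = np.zeros(len(c_pi))
    while True:
        v_next = c_pi + beta * (P_pi @ v)   # one application of the policy's cost operator
        if np.max(np.abs(v_next - v)) <= eps * (1.0 - beta) / beta:
            return v_next
        v = v_next

# Hypothetical 3-state chain induced by some stationary policy, with per-stage costs.
P_pi = np.array([[0.7, 0.2, 0.1],
                 [0.0, 0.6, 0.4],
                 [0.0, 0.0, 1.0]])
c_pi = np.array([0.0, 1.0, 5.0])
print(np.round(approximate_value_determination(P_pi, c_pi), 3))
```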

Optimal maintenance procedure for multi-state deteriorated system with incomplete monitoring

  • Jin, L.;Suzuki, K.
    • International Journal of Reliability and Applications
    • /
    • v.11 no.2
    • /
    • pp.69-87
    • /
    • 2010
  • The optimal replacement problem was investigated for a multi-state deteriorated system for which the true internal state cannot be observed directly except when the system breaks down completely. The internal state was assumed to be monitored incompletely by a monitor that gives information related to the true state of the system. The problem was formulated as a partially observable Markov decision process. The optimal procedure was found to be a monotone procedure with respect to the stochastically increasing ordering of the state probability vectors under some assumptions. Limiting the search to monotone procedures greatly reduces the tremendous amount of calculation time required to find the optimal procedure.

  • PDF
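
The monotone structure referred to above says, roughly, that if one state-probability vector is stochastically larger than another (more probability mass on the more deteriorated states), the prescribed action should be at least as drastic. A minimal check of that property over a small set of beliefs might look like the following sketch; the states, beliefs, and action ranking are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def stochastically_larger(b1, b2, tol=1e-12):
    """True if belief b1 is stochastically >= b2: for every k, b1 puts at least as much
    probability on states {k, ..., n-1}. States are ordered from best (0) to worst (n-1)."""
    tail1 = np.cumsum(b1[::-1])[::-1]   # P(state >= k) under b1
    tail2 = np.cumsum(b2[::-1])[::-1]   # P(state >= k) under b2
    return bool(np.all(tail1 >= tail2 - tol))

def is_monotone(beliefs, actions, action_rank):
    """Check that a procedure is monotone: a stochastically larger belief never receives
    a less drastic action (actions ranked, e.g., 'keep' < 'replace')."""
    for i, bi in enumerate(beliefs):
        for j, bj in enumerate(beliefs):
            if stochastically_larger(bi, bj) and action_rank[actions[i]] < action_rank[actions[j]]:
                return False
    return True

beliefs = [np.array([0.8, 0.15, 0.05]),   # mostly healthy
           np.array([0.4, 0.30, 0.30]),
           np.array([0.1, 0.20, 0.70])]   # mostly deteriorated
actions = ["keep", "keep", "replace"]
print(is_monotone(beliefs, actions, {"keep": 0, "replace": 1}))
```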

An Improved DSA Strategy based on Triple-States Reward Function (Triple-state 보상 함수를 기반으로 한 개선된 DSA 기법)

  • Ahmed, Tasmia;Gu, Jun-Rong;Jang, Sung-Jeen;Kim, Jae-Moung
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.47 no.11
    • /
    • pp.59-68
    • /
    • 2010
  • In this paper, we present a new method for Dynamic Spectrum Access (DSA) based on a modified reward function. The Partially Observable Markov Decision Process (POMDP) is a suitable framework for predicting upcoming spectrum opportunities, and the reward function is its final component and very important for prediction. However, the conventional reward function has only two states (busy and idle): when a collision happens in the channel, the reward function indicates the busy state, which decreases the throughput of the secondary user. In this paper, we focus on the difference between the busy and collision states. We propose a new reward function that indicates an additional collision state, which brings better communication opportunities for secondary users. With the help of the new reward function, secondary users properly utilize opportunities to access primary user channels for efficient data transmission. We also derive the belief vector of the new algorithm mathematically. Simulation results corroborate the superior performance of the improved reward function: the new algorithm increases the throughput of the secondary user in a cognitive radio network.
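
A minimal sketch of the core idea, distinguishing a collision outcome from an ordinary busy observation in the per-slot reward so that the secondary user can react to collisions separately. The outcome names, reward values, and penalty below are hypothetical placeholders rather than the paper's exact formulation.

```python
# Conventional two-state reward: a collision is lumped together with "busy".
def reward_two_state(outcome, bits_sent):
    return bits_sent if outcome == "idle" else 0.0

# Triple-state reward: the collision outcome is represented explicitly.
def reward_triple_state(outcome, bits_sent, collision_penalty=2.0):
    if outcome == "idle":        # channel free: reward the successful transmission
        return bits_sent
    if outcome == "busy":        # primary user active, secondary user stayed off: neutral
        return 0.0
    if outcome == "collision":   # secondary user transmitted while the PU was active: penalize
        return -collision_penalty
    raise ValueError(f"unknown outcome: {outcome}")

for outcome in ("idle", "busy", "collision"):
    print(outcome, reward_two_state(outcome, 1.0), reward_triple_state(outcome, 1.0))
```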

Two-Dimensional POMDP-Based Opportunistic Spectrum Access in Time-Varying Environment with Fading Channels

  • Wang, Yumeng;Xu, Yuhua;Shen, Liang;Xu, Chenglong;Cheng, Yunpeng
    • Journal of Communications and Networks
    • /
    • v.16 no.2
    • /
    • pp.217-226
    • /
    • 2014
  • In this research, we study the problem of opportunistic spectrum access (OSA) in a time-varying environment with fading channels, where the channel state is characterized by both channel quality and the occupancy of primary users (PUs). First, a finite-state Markov channel model is introduced to represent a fading channel. Second, by jointly probing channel quality and exploring the activities of PUs, a two-dimensional partially observable Markov decision process framework is proposed for OSA. In addition, a greedy strategy is designed, in which a secondary user (SU) selects the channel with the best expected data transmission rate to maximize the instantaneous reward in the current slot. Compared with the optimal strategy that considers future reward, the greedy strategy offers low complexity with relatively good performance. Meanwhile, the spectrum sensing error that causes collisions between a PU and an SU is also discussed. Furthermore, we analyze the multiuser situation in which the proposed single-user strategy is adopted by every SU. Simulation results show that the proposed strategy attains a larger throughput than previous works under various parameter configurations.
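
A minimal sketch of the greedy selection rule described above: for each channel, the expected rate is the belief that the primary user is idle times the rate expected under the channel-quality belief of the finite-state fading model, and the secondary user picks the channel with the largest expected rate for the current slot. All channel names, beliefs, and rates are illustrative.

```python
import numpy as np

# Hypothetical per-channel beliefs over a finite-state fading channel (states ordered
# from poor to good), the rate achievable in each fading state, and P(PU idle).
belief_quality = {
    "ch0": np.array([0.5, 0.3, 0.2]),
    "ch1": np.array([0.1, 0.4, 0.5]),
}
rate_per_state = np.array([0.5, 1.0, 2.0])    # achievable rate in each fading state
belief_idle = {"ch0": 0.9, "ch1": 0.6}        # belief that the PU is idle on each channel

def greedy_channel(belief_quality, belief_idle, rate_per_state):
    """Pick the channel that maximizes the expected data rate in the current slot."""
    expected = {ch: belief_idle[ch] * float(belief_quality[ch] @ rate_per_state)
                for ch in belief_quality}
    best = max(expected, key=expected.get)
    return best, expected

best, expected = greedy_channel(belief_quality, belief_idle, rate_per_state)
print(best, expected)    # ch0: 0.9 * 0.95 = 0.855, ch1: 0.6 * 1.45 = 0.87, so ch1 is chosen
```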

Machine Maintenance Policy Using Partially Observable Markov Decision Process

  • Pak, Pyoung Ki;Kim, Dong Won;Jeong, Byung Ho
    • Journal of Korean Society for Quality Management
    • /
    • v.16 no.2
    • /
    • pp.1-9
    • /
    • 1988
  • This paper considers a machine maintenance problem in which the machine's condition is only partially known by observing the machine's output products. The problem is formulated as an infinite horizon partially observable Markov decision process in order to find an optimal maintenance policy. However, even though the optimal policy of the model exists, finding it is very time consuming. Thus, the intent of this study is to find an ε-optimal stationary policy minimizing the expected discounted total cost of the system. The ε-optimal policy is found by using a modified version of the well-known policy iteration algorithm. A numerical example is also shown.

  • PDF
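
The distinguishing feature of this formulation is that the machine's condition is inferred only from inspected output products. A minimal Bayes-update sketch of that inference is shown below; the machine states, defect rates, and inspection sequence are hypothetical, and the machine's own deterioration step between items is omitted for brevity.

```python
import numpy as np

# Hypothetical machine states: 0 = in control, 1 = worn, 2 = badly worn,
# with increasing probability of producing a defective item.
p_defect = np.array([0.01, 0.10, 0.40])

def update_belief_from_item(b, defective):
    """Bayes update of the belief over machine states after inspecting one output item."""
    likelihood = p_defect if defective else 1.0 - p_defect
    posterior = b * likelihood
    return posterior / posterior.sum()

b = np.array([0.90, 0.09, 0.01])
for item_defective in (False, True, True):   # a hypothetical inspection sequence
    b = update_belief_from_item(b, item_defective)
print(np.round(b, 3))   # probability mass shifts toward the worn states after defects
```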

A Localized Adaptive QoS Routing Scheme Using POMDP and Exploration Bonus Techniques (POMDP와 Exploration Bonus를 이용한 지역적이고 적응적인 QoS 라우팅 기법)

  • Han Jeong-Soo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.3B
    • /
    • pp.175-182
    • /
    • 2006
  • In this paper, we propose a localized adaptive QoS routing scheme using POMDP and exploration bonus techniques. We also show that the CEA technique, which uses expectation values, can simplify the POMDP problem, since performing dynamic programming to solve a POMDP exactly is highly computationally expensive. In addition, we use an exploration bonus to search for detour paths better than the current path. For this, we propose an algorithm (SEMA) to search multiple paths. In particular, we evaluate the service success rate and average hop count with respect to the performance parameters $\phi$ and k, which are defined as the exploration count and interval. The results show that the larger $\phi$ is, the better the detour path search, and that increasing n increases the amount of exploration.
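
A minimal sketch of the exploration-bonus idea for choosing among candidate paths: each path's estimated service value is augmented with a bonus that grows with the time since the path was last tried, so a detour path is occasionally explored even if its current estimate is lower. The scoring form, the square-root bonus, and the constants are illustrative assumptions, not the paper's SEMA algorithm.

```python
import math

def select_path(estimates, last_tried, now, bonus_weight=0.5):
    """Pick the path maximizing (estimated value + exploration bonus), where the bonus
    grows with the time elapsed since the path was last tried."""
    def score(path):
        elapsed = now - last_tried[path]
        return estimates[path] + bonus_weight * math.sqrt(max(elapsed, 0))
    return max(estimates, key=score)

estimates = {"current": 0.80, "detour_a": 0.70, "detour_b": 0.65}   # estimated success rates
last_tried = {"current": 99, "detour_a": 40, "detour_b": 95}        # slot each path was last used
print(select_path(estimates, last_tried, now=100))   # detour_a wins: 0.70 + 0.5*sqrt(60)
```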

Throughput Maximization for a Primary User with Cognitive Radio and Energy Harvesting Functions

  • Nguyen, Thanh-Tung;Koo, Insoo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.9
    • /
    • pp.3075-3093
    • /
    • 2014
  • In this paper, we consider an advanced wireless user, called a primary-secondary user (PSU), which is capable of harvesting renewable energy and connecting to both the primary network and cognitive radio networks simultaneously. Recently, energy harvesting has received a great deal of attention from the research community and is a promising approach for maintaining a long lifetime for users. On the other hand, the cognitive radio function allows the wireless user to access other primary networks in an opportunistic manner, as a secondary user, in order to obtain more throughput in the current time slot. We then propose a channel access policy for the PSU that takes energy harvesting into consideration, based on a Partially Observable Markov Decision Process (POMDP) in which the action that maximizes the expected long-term throughput is selected from the action set. The simulation results show that the proposed POMDP-based channel access scheme improves the throughput of the PSU, but requires more computation to make a channel access decision.
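
A rough sketch of the kind of per-slot decision the PSU faces: harvest energy when the battery cannot support transmission, and otherwise compare the expected immediate throughput of staying on the primary channel against opportunistically accessing a secondary channel. This greedy one-slot rule with hypothetical costs and rates only illustrates the trade-off; the paper's POMDP policy optimizes the expected long-term throughput instead.

```python
def choose_action(belief_idle, battery, sense_cost=1.0, tx_cost=2.0,
                  primary_rate=1.0, secondary_rate=1.5):
    """Greedy one-slot decision for a primary-secondary user (PSU): harvest if the battery
    cannot support a transmission, otherwise pick the option with the larger expected
    immediate throughput (primary channel vs. opportunistic secondary access)."""
    if battery < tx_cost:
        return "harvest"                         # not enough energy to transmit at all
    expected_secondary = 0.0
    if battery >= sense_cost + tx_cost:
        # expected throughput of sensing a secondary channel and transmitting if it is idle
        expected_secondary = belief_idle * secondary_rate
    return "access_secondary" if expected_secondary > primary_rate else "use_primary"

print(choose_action(belief_idle=0.8, battery=5.0))   # 0.8 * 1.5 = 1.2 > 1.0 -> access_secondary
print(choose_action(belief_idle=0.4, battery=5.0))   # 0.4 * 1.5 = 0.6 < 1.0 -> use_primary
print(choose_action(belief_idle=0.9, battery=1.5))   # battery below tx_cost -> harvest
```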

Labeling Q-Learning for Maze Problems with Partially Observable States

  • Lee, Hae-Yeon;Hiroyuki Kamaya;Kenich Abe
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 2000.10a
    • /
    • pp.489-489
    • /
    • 2000
  • Recently, Reinforcement Learning (RL) methods have been used for learning problems in Partially Observable Markov Decision Process (POMDP) environments. Conventional RL methods, however, have limited applicability to POMDPs. To overcome the partial observability, several algorithms have been proposed [5], [7]. The aim of this paper is to extend our previous algorithm for POMDPs, called Labeling Q-learning (LQ-learning), which supplements incomplete perceptual information with labeling. Namely, in LQ-learning the agent perceives the current state as a pair consisting of an observation and its label, so that the agent can more exactly distinguish states that look the same. Labeling is carried out by a hash-like function, which we call the Labeling Function (LF). Numerous labeling functions can be considered, but in this paper we introduce several labeling functions based on only the 2 or 3 most recent observations. We briefly introduce the basic idea of LQ-learning, apply it to maze problems (simple POMDP environments), and show its effectiveness with empirical results, which look better than those of conventional RL algorithms.

  • PDF
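
A toy sketch of the labeling idea: the Q-table is keyed by a pair (observation, label), where the label comes from a hash-like function of the last few observations, so perceptually identical situations can be told apart. The labeling function, parameters, and update below are illustrative and are not the authors' exact LF or learning rule.

```python
import random
from collections import defaultdict, deque

class LQLearner:
    """Toy Labeling Q-learning agent: the state is the pair (observation, label), where
    the label is a hash-like function of the two previous observations."""
    def __init__(self, actions, history_len=2, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(float)            # Q[((obs, label), action)]
        self.actions = list(actions)
        self.history = deque(maxlen=history_len)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def label(self):
        # Hash-like labeling function over the recent observation history.
        return hash(tuple(self.history)) % 16

    def state(self, obs):
        return (obs, self.label())

    def act(self, obs):
        s = self.state(obs)
        if random.random() < self.epsilon:
            return random.choice(self.actions)   # occasional exploration
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, obs, action, reward, next_obs):
        s = self.state(obs)
        self.history.append(obs)                 # the next label reflects the newest observation
        s_next = self.state(next_obs)
        best_next = max(self.Q[(s_next, a)] for a in self.actions)
        self.Q[(s, action)] += self.alpha * (reward + self.gamma * best_next - self.Q[(s, action)])

# Tiny usage example with a hypothetical maze observation.
agent = LQLearner(actions=["up", "down", "left", "right"])
a = agent.act("corridor")
agent.update("corridor", a, reward=0.0, next_obs="corridor")
```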