• Title/Abstract/Keyword: Markov decision processes

19 search results (processing time: 0.023 seconds)

Equivalent Transformations of Undiscounted Nonhomogeneous Markov Decision Processes

  • Park, Yun-Sun
    • 한국경영과학회지 / Vol. 17, No. 2 / pp.131-144 / 1992
  • Even though nonhomogeneous Markov Decision Processes subsume homogeneous Markov Decision Processes and are more practical in the real world, there are few results for them. In this paper we address the nonhomogeneous Markov Decision Process with the objective of maximizing average reward. Extending the work of Ross [17] on the homogeneous case and adopting the result of Bean and Smith [3] for the discounted deterministic problem, we first transform the original problem into a discounted nonhomogeneous Markov Decision Process, and then into a discounted deterministic problem. This approach not only shows the interrelationships between the various problems but also suggests a solution method for the undiscounted nonhomogeneous Markov Decision Process.
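
The chain of reductions described above ends at a discounted problem, which is the standard tractable case. For reference, below is a minimal value-iteration sketch for a small discounted (homogeneous) MDP; the transition matrix `P`, reward matrix `R`, and discount factor are illustrative assumptions, not data from the paper.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve a discounted MDP by value iteration.

    P[a][s][s'] : transition probability to s' when taking action a in state s
    R[a][s]     : expected one-step reward for action a in state s
    """
    n_actions, n_states = len(P), len(P[0])
    V = np.zeros(n_states)
    while True:
        # Bellman backup: V(s) = max_a [ R(a,s) + gamma * sum_s' P(s'|s,a) V(s') ]
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal values and a greedy policy
        V = V_new

# Illustrative two-state, two-action instance (made-up numbers).
P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # transitions under action 0
              [[0.5, 0.5], [0.1, 0.9]]])  # transitions under action 1
R = np.array([[1.0, 0.0],                 # rewards under action 0
              [0.5, 2.0]])                # rewards under action 1
V, policy = value_iteration(P, R)
print(V, policy)
```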


Partially Observable Markov Decision Processes (POMDPs) and Wireless Body Area Networks (WBAN): A Survey

  • Mohammed, Yahaya Onimisi; Baroudi, Uthman A.
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 7, No. 5 / pp.1036-1057 / 2013
  • Wireless body area network (WBAN) is a promising candidate for future health monitoring systems. Nevertheless, the path to mature solutions still faces many challenges that need to be overcome. Energy-efficient scheduling is one of these challenges, given the scarcity of available energy in biosensors and the lack of portability. Therefore, researchers from academia, industry, and the health sector are working together to realize practical solutions for these challenges. The main difficulty in WBAN is the uncertainty in the state of the monitored system. Intelligent learning approaches such as the Markov Decision Process (MDP) were proposed to tackle this issue. An MDP is a form of Markov chain in which the transition matrix depends on the action taken by the decision maker (agent) at each time step. The agent receives a reward, which depends on the action and the state. The goal is to find a function, called a policy, which specifies which action to take in each state, so as to maximize some utility function (e.g., the mean or expected discounted sum) of the sequence of rewards. A Partially Observable Markov Decision Process (POMDP) is a generalization of the MDP that allows for incomplete information regarding the state of the system: the state is not visible to the agent. This has many applications in operations research and artificial intelligence. This incomplete knowledge of the system makes formulating and solving POMDP models mathematically complex and computationally expensive, and limited progress has been made in applying POMDPs to real applications. In this paper, we survey the existing methods and algorithms for solving POMDPs in the general domain and, in particular, in wireless body area networks. In addition, the paper discusses recent real implementations of POMDPs on practical WBAN problems. We believe that this work will provide valuable insights for newcomers who would like to pursue related research in the domain of WBAN.
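
The survey's definition of a POMDP makes the belief state the central computational object: after each action and observation, the distribution over hidden states is updated by Bayes' rule. Below is a minimal sketch of that update; the transition model `T`, observation model `O`, and dimensions are hypothetical, not taken from any surveyed paper.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayesian belief update for a POMDP.

    b    : current belief over hidden states, shape (S,)
    a, o : action taken and observation received
    T[a] : transition matrix, T[a][s, s2] = P(s2 | s, a)
    O[a] : observation matrix, O[a][s2, o] = P(o | s2, a)
    """
    predicted = b @ T[a]                   # predict: push belief through transitions
    unnormalized = predicted * O[a][:, o]  # correct: weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Hypothetical 2-state, 1-action, 2-observation model.
T = [np.array([[0.9, 0.1],
               [0.0, 1.0]])]
O = [np.array([[0.8, 0.2],
               [0.3, 0.7]])]
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, T=T, O=O))
```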

System Replacement Policy for A Partially Observable Markov Decision Process Model

  • Kim, Chang-Eun
    • 대한산업공학회지 / Vol. 16, No. 2 / pp.1-9 / 1990
  • The control of deterioration processes for which only incomplete state information is available is examined in this study. When the deterioration is governed by a Markov process, such processes are known as Partially Observable Markov Decision Processes (POMDPs), which eliminate the assumption that the state or level of deterioration of the system is known exactly. This research investigates a two-state partially observable Markov chain in which only deterioration can occur and for which the only possible actions are to replace the system or to leave it alone. The goal of this research is to develop a new jump algorithm which has the potential for solving system problems dealing with continuous-state-space Markov chains.
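
In the two-state model described above, the belief reduces to a single number: the probability that the system has deteriorated. A natural illustrative policy is then a control-limit rule that replaces the system once this probability crosses a threshold. The deterioration rate and threshold below are assumptions for the sketch, not values from the paper.

```python
def deteriorate(p, q=0.1):
    """One-step belief update when no replacement occurs: with probability q
    a good system deteriorates, so P(deteriorated) drifts upward."""
    return p + (1.0 - p) * q

def control_limit_policy(p, threshold=0.6):
    """Replace once the deterioration belief crosses the threshold."""
    return "replace" if p >= threshold else "leave alone"

p = 0.0  # start with a system known to be good
for t in range(15):
    action = control_limit_policy(p)
    print(f"t={t:2d}  P(deteriorated)={p:.3f}  action={action}")
    p = 0.0 if action == "replace" else deteriorate(p)
```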


Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-Agent Markov Decision Processes

  • Chang, Hyeong-Soo
    • International Journal of Control, Automation, and Systems / Vol. 1, No. 3 / pp.358-367 / 2003
  • We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple decision-maker environment under the infinite-horizon average reward criterion, with a general joint reward structure but a factorial joint state transition structure. We introduce the "localization" concept, whereby a global MDP is localized for each agent such that each agent needs to consider only a local MDP defined on its own state and action spaces. Based on this, we present a gradient-ascent-like iterative distributed algorithm that converges to a local optimal solution of the global MDP. The solution is an autonomous joint policy in that each agent's decision is based only on its local state.
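
One common way to realize this kind of distributed improvement, shown here only as an illustration of the local-optimality behavior, is round-robin coordinate ascent: each agent in turn improves its own local policy while the others hold theirs fixed. The sketch below applies that pattern to an abstract joint-reward function; the policy encoding and the `joint_reward` table are placeholders, not the paper's construction.

```python
def local_best_response(agent, policies, actions, joint_reward):
    """One agent picks the local policy maximizing the joint reward
    while every other agent's policy stays fixed."""
    return max(actions, key=lambda a: joint_reward({**policies, agent: a}))

def coordinate_ascent(agents, actions, joint_reward, sweeps=10):
    policies = {ag: actions[0] for ag in agents}  # arbitrary initial joint policy
    for _ in range(sweeps):
        changed = False
        for ag in agents:  # round-robin over agents
            new = local_best_response(ag, policies, actions, joint_reward)
            changed |= new != policies[ag]
            policies[ag] = new
        if not changed:    # no agent can improve alone: a local optimum
            break
    return policies

# Toy joint reward over two agents with actions {0, 1} (made-up numbers).
table = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
print(coordinate_ascent(["A", "B"], [0, 1], lambda pol: table[(pol["A"], pol["B"])]))
```

Note that the toy run stops at the joint policy (0, 0) with reward 1.0 even though (1, 1) yields 2.0: no unilateral change improves the reward, which is exactly the sense in which such algorithms guarantee only a local optimum.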

A MARKOV DECISION PROCESSES FORMULATION FOR THE LINEAR SEARCH PROBLEM

  • Balkhi, Z.T.; Benkherouf, L.
    • 한국경영과학회지 / Vol. 19, No. 1 / pp.201-206 / 1994
  • The linear search problem is concerned with finding a hidden target on the real line R. The position of the target is governed by some probability distribution, and it is desired to find the target in the least expected search time. This problem has been formulated as an optimization problem by a number of authors without making use of Markov Decision Process (MDP) theory. The aim of this paper is to give an MDP formulation of the search problem which we feel is both natural and easy to follow.
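
To make the MDP framing concrete, one can discretize the line into cells, let the state be the searcher's position together with the set of cells already inspected, and charge travel time as the cost. The short sketch below merely evaluates the expected search time of a given inspection order under a discrete prior; the cell locations and probabilities are invented for illustration.

```python
def expected_search_time(order, position, prior, start=0.0):
    """Expected time to find the target when cells are inspected in `order`.

    order    : sequence of cell indices to visit
    position : position[i] = location of cell i on the real line
    prior    : prior[i] = probability that the target is in cell i
    """
    time, here, expected = 0.0, start, 0.0
    for cell in order:
        time += abs(position[cell] - here)  # travel time to the next cell
        here = position[cell]
        expected += prior[cell] * time      # target found here with this probability
    return expected

position = [-2.0, -1.0, 1.0, 3.0]
prior = [0.1, 0.3, 0.4, 0.2]
print(expected_search_time([2, 1, 0, 3], position, prior))
```

An MDP solver would then optimize over such inspection orders (or, more generally, over position-and-history states) to minimize this expectation.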


A Localized Adaptive QoS Routing Scheme Using POMDP and Exploration Bonus Techniques

  • 한정수
    • 한국통신학회논문지 / Vol. 31, No. 3B / pp.175-182 / 2006
  • In this paper, we propose a method that uses POMDP (Partially Observable Markov Decision Processes) and Exploration Bonus techniques for localized adaptive QoS routing. Because computing the optimal action of a POMDP by Dynamic Programming is very complex and difficult, we simplify the problem by using expected values obtained through the CEA (Certainty Equivalency Approximation) technique, and we use the Exploration Bonus approach to search for routes better than the current one. To this end, we propose a multipath exploration algorithm (SEMA). Furthermore, we use the performance parameters $\phi$ and k to define the frequency and interval of exploration, and examine the service success rate and the average hop count of successful requests as the amount of exploration changes. The results show that as $\phi$ increases, routes better than the current one are found, and that as k increases, the amount of exploration increases.
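
The exploration-bonus idea can be stated compactly: a route's score is its estimated value plus a bonus that grows with the time since the route was last tried, so rarely explored paths are periodically re-examined. The sketch below is a generic version of that rule; the square-root bonus form and the parameter name `phi` only loosely echo the paper's $\phi$ and are assumptions.

```python
import math

def route_score(value, steps_since_tried, phi=0.5):
    """Estimated route value plus an exploration bonus that grows with
    the time elapsed since the route was last attempted."""
    return value + phi * math.sqrt(steps_since_tried)

def pick_route(routes):
    """routes: dict mapping name -> (estimated_value, steps_since_tried)."""
    return max(routes, key=lambda r: route_score(*routes[r]))

routes = {"current": (0.80, 0), "alt-1": (0.75, 12), "alt-2": (0.60, 40)}
print(pick_route(routes))  # a long-unexplored alternative outranks the current route
```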

Seamless Mobility of Heterogeneous Networks Based on Markov Decision Process

  • Preethi, G.A.; Chandrasekar, C.
    • Journal of Information Processing Systems / Vol. 11, No. 4 / pp.616-629 / 2015
  • A mobile terminal will expect a number of handoffs within its call duration. In the event of a mobile call, when a mobile node moves from one cell to another, it should connect to another access point within its range; in case its own network lacks support, it must change over to another base station. When moving to another network, quality of service parameters need to be considered. In our study we have used the Markov decision process approach for seamless handoff, as it gives optimal results for selecting a network when compared to other multiple attribute decision making processes. We have used the network cost function for selecting the network for handoff and the connection reward function, which is based on the values of the quality of service parameters. We have also examined the constant bit rate and the transmission control protocol packet delivery ratio. We used the policy iteration algorithm for determining the optimal policy. Our enhanced handoff algorithm outperforms previous multiple attribute decision making methods.
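
The optimal policy here comes from policy iteration, which alternates exact policy evaluation with greedy policy improvement. Below is a minimal generic implementation for a finite discounted MDP; the toy transition and reward numbers stand in for the paper's network cost and connection reward functions, which are not reproduced here.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """P[a][s][s'] transition probabilities, R[a][s] expected rewards."""
    n_actions, n_states = len(P), len(P[0])
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Improvement: act greedily with respect to V.
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Toy two-state handoff model: action 0 = stay, action 1 = switch (made-up numbers).
P = np.array([[[0.9, 0.1], [0.4, 0.6]],
              [[0.2, 0.8], [0.1, 0.9]]])
R = np.array([[1.0, -0.5],
              [0.2, 0.8]])
print(policy_iteration(P, R))
```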

Rental Resource Management Model with Capacity Expansion and Return

  • 김은갑; 변진호
    • 한국경영과학회지 / Vol. 31, No. 3 / pp.81-96 / 2006
  • We consider a rental company that dynamically manages its capacity level through capacity addition and return. While serving customers with its own capacity, the company expands its capacity by renting items from an outside source so that it can avoid lost rental opportunities, which occur when stock is insufficient. If stock becomes large enough to cope with demand, the company returns the expanded capacity to the outside source. Formulating the model as a Markov decision problem, we identify an optimal capacity management policy which states when the company should expand its capacity and when it should return the expanded capacity after a capacity addition. Since it is intractable to find the optimal capacity management policy and the optimal size of capacity expansion analytically, we present a numerical procedure that finds these optimal values based on the value iteration method. Numerical analysis is implemented, and we observe monotonic properties of the optimal performance measures with respect to system parameters, which are meaningful in developing effective heuristic policies.
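
The numerical procedure mentioned above is value iteration over stock levels. The stripped-down sketch below uses a deliberately crude model: the state is on-hand stock, at most one unit of demand arrives or one rented-out item returns per period, and the actions are to rent one extra unit from the outside source, give one back, or do nothing. All probabilities, costs, and the state-space cap are invented for illustration and are not the paper's model.

```python
import numpy as np

S_MAX, GAMMA = 10, 0.95                      # truncated stock levels 0..S_MAX
P_DEM, P_RET = 0.6, 0.3                      # demand / item-return probabilities
H, C_ADD, C_BACK, LOST = 0.2, 1.0, 0.5, 5.0  # holding, expand, return, lost-sale costs
ACTIONS = {-1: C_BACK, 0: 0.0, +1: C_ADD}    # give one back / idle / rent one more

def outcomes(s, a):
    """(probability, next_state, extra_cost) triples after adjusting stock by a."""
    s2 = min(max(s + a, 0), S_MAX)
    lost = LOST if s2 == 0 else 0.0          # demand is lost when stock is empty
    return [(P_DEM, max(s2 - 1, 0), lost),           # a demand arrives
            (P_RET, min(s2 + 1, S_MAX), 0.0),        # a rented item comes back
            (1 - P_DEM - P_RET, s2, 0.0)]            # nothing happens

def q(s, a, V):
    return ACTIONS[a] + H * s + sum(
        p * (c + GAMMA * V[s2]) for p, s2, c in outcomes(s, a))

V = np.zeros(S_MAX + 1)
for _ in range(500):                         # value iteration (cost minimization)
    V = np.array([min(q(s, a, V) for a in ACTIONS) for s in range(S_MAX + 1)])
policy = [min(ACTIONS, key=lambda a: q(s, a, V)) for s in range(S_MAX + 1)]
print(policy)  # expect expand (+1) at low stock and return (-1) at high stock
```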

Demand Variability Impact on the Replenishment Policy in a Two-Echelon Supply Chain Model

  • 김은갑
    • 한국경영과학회지 / Vol. 29, No. 3 / pp.111-127 / 2004
  • We consider a supply chain model with a make-to-order production facility and a single supplier. The model we treat here is a special case of a two-echelon inventory model. Unlike classical two-echelon systems, the demand process at the supplier is affected by the production process at the production facility as well as by the customer order arrival process. In this paper, we address how demand variability impacts the optimal replenishment policy. To this end, we incorporate Erlang and phase-type demand distributions into the model. Formulating the model as a Markov decision problem, we investigate the structure of the optimal replenishment policy. We also implement a sensitivity analysis on the optimal policy and establish its monotonicity with respect to system cost parameters.
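
Erlang and phase-type distributions are used in such models because they let demand variability be tuned while keeping the model Markovian: an Erlang-k random variable is a sum of k exponential phases, so at a fixed mean its squared coefficient of variation is 1/k. The small simulation below checks that fact; the rates are arbitrary.

```python
import random
import statistics

def erlang_sample(k, rate, rng=random):
    """Erlang(k, rate) drawn as a sum of k independent exponential phases."""
    return sum(rng.expovariate(rate) for _ in range(k))

for k in (1, 2, 5):
    # Hold the mean at 1.0 (rate = k) while raising k to reduce variability.
    xs = [erlang_sample(k, rate=k) for _ in range(100_000)]
    cv2 = statistics.variance(xs) / statistics.mean(xs) ** 2
    print(f"k={k}: mean={statistics.mean(xs):.3f}  CV^2={cv2:.3f}  (theory: {1/k:.3f})")
```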

A Study of Adaptive QoS Routing Scheme Using Policy-gradient Reinforcement Learning

  • 한정수
    • 한국컴퓨터정보학회논문지 / Vol. 16, No. 2 / pp.93-99 / 2011
  • In this paper, we propose an adaptive QoS routing scheme that uses a policy-gradient method in a reinforcement learning (RL) setting. Compared with existing RL-based schemes, this scheme reflects the gradient of the expected reward in the policy, learning the network environment quickly and thereby providing a better routing success rate. We verify its superiority by comparing it against existing schemes.
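
The core of such a scheme is the policy-gradient update: the routing policy is parameterized directly, and its parameters are nudged along the gradient of the expected reward. Below is a minimal REINFORCE-style sketch on a softmax choice among candidate routes; the bandit-style setting, success probabilities, and learning rate are illustrative stand-ins for the paper's network environment.

```python
import math
import random

random.seed(0)
SUCCESS = [0.9, 0.6, 0.3]  # per-route delivery success probabilities (made up)
theta = [0.0, 0.0, 0.0]    # one preference parameter per candidate route
ALPHA = 0.1                # learning rate

def softmax(prefs):
    z = [math.exp(p - max(prefs)) for p in prefs]
    return [v / sum(z) for v in z]

for _ in range(5000):
    probs = softmax(theta)
    route = random.choices(range(3), weights=probs)[0]
    reward = 1.0 if random.random() < SUCCESS[route] else 0.0
    # REINFORCE: grad of log pi(route) w.r.t. theta[i] is 1[i == route] - probs[i].
    for i in range(3):
        theta[i] += ALPHA * reward * ((1.0 if i == route else 0.0) - probs[i])

print([round(p, 3) for p in softmax(theta)])  # probability mass shifts to route 0
```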