• Title/Summary/Keyword: Markov decision process (MDP)


Markov Decision Process-based Potential Field Technique for UAV Planning

  • MOON, CHAEHWAN;AHN, JAEMYUNG
    • Journal of the Korean Society for Industrial and Applied Mathematics
    • /
    • v.25 no.4
    • /
    • pp.149-161
    • /
    • 2021
  • This study proposes a methodology for mission/path planning of an unmanned aerial vehicle (UAV) using an artificial potential field with the Markov Decision Process (MDP). The planning problem is formulated as an MDP. A low-resolution solution of the MDP is obtained and used to define an artificial potential field, which provides a continuous UAV mission plan. A numerical case study is conducted to demonstrate the validity of the proposed technique.
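
The following is a minimal sketch of the general idea described above, not the authors' implementation: value iteration is run on a coarse grid MDP, and the resulting value function is reused as an attractive artificial potential. The grid size, obstacle layout, rewards, and discount factor are all illustrative assumptions.

```python
import numpy as np

# Hypothetical coarse grid: 0 = free cell, 1 = obstacle; goal in a corner.
N = 10
grid = np.zeros((N, N)); grid[4, 2:7] = 1           # illustrative obstacle wall
goal = (N - 1, N - 1)
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # up, down, left, right
gamma = 0.95

def step(s, a):
    """Deterministic transition on the coarse grid (a simplifying assumption)."""
    ns = (min(max(s[0] + a[0], 0), N - 1), min(max(s[1] + a[1], 0), N - 1))
    if grid[ns] == 1:                               # blocked -> stay in place
        ns = s
    r = 1.0 if ns == goal else -0.01                # small step cost, goal reward
    return ns, r

# Value iteration on the low-resolution MDP.
V = np.zeros((N, N))
for _ in range(500):
    V_new = np.copy(V)
    for i in range(N):
        for j in range(N):
            if (i, j) == goal or grid[i, j] == 1:
                continue
            V_new[i, j] = max(r + gamma * V[ns] for ns, r in
                              (step((i, j), a) for a in actions))
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new

# The negative value function acts as an attractive artificial potential: a
# continuous planner would descend this field, e.g. via bilinear interpolation.
potential = -V
print(potential.round(2))
```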

The Minimum-cost Network Selection Scheme to Guarantee the Periodic Transmission Opportunity in the Multi-band Maritime Communication System (멀티밴드 해양통신망에서 전송주기를 보장하는 최소 비용의 망 선택 기법)

  • Cho, Ku-Min;Yun, Chang-Ho;Kang, Chung-G
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.2A
    • /
    • pp.139-148
    • /
    • 2011
  • This paper presents a minimum-cost network selection scheme which determines the transmission instances in a multi-band maritime communication system, so that shipment-related real-time information can be transmitted within the maximum allowed period. The transmission instances and the corresponding network selection process are modeled as a Markov Decision Process (MDP), with the channel represented by a 2-state Markov chain, and the model is solved by stochastic dynamic programming. The resulting minimum-cost network selection rule reduces the network cost significantly compared with a straightforward scheme that transmits periodically.
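
A rough sketch of this kind of formulation is given below. It casts the state as (slots since the last transmission, channel state), with a transmit-or-wait decision each slot; the network costs, the 2-state channel parameters, the deadline T, and the use of a discounted criterion (in place of the paper's average-cost stochastic dynamic program) are all assumptions for illustration.

```python
# Illustrative parameters (not from the paper): two candidate networks with
# different costs, a 2-state Markov channel, and a maximum allowed period T
# between transmissions.
T = 5                                  # must transmit within T slots of the last one
COST = {"satellite": 10.0, "cellular": 2.0}
P_GOOD = {"good": 0.8, "bad": 0.3}     # P(channel good in next slot | current state)
GAMMA = 0.95                           # discounted stand-in for the average-cost DP

# State: (slots elapsed since the last transmission, channel state).
# Actions: transmit over a network now, or wait (allowed only before the deadline).
# Assumption: the cheap network is usable only when the channel is good.
states = [(k, c) for k in range(T + 1) for c in ("good", "bad")]

def actions(k, c):
    acts = ["satellite"] + (["cellular"] if c == "good" else [])
    if k < T:
        acts.append("wait")
    return acts

def expected_next(V, k, c):
    p = P_GOOD[c]
    return p * V[(k, "good")] + (1 - p) * V[(k, "bad")]

V = {s: 0.0 for s in states}
for _ in range(300):                   # value iteration (minimizing expected cost)
    policy, V_new = {}, {}
    for (k, c) in states:
        best_cost, best_act = float("inf"), None
        for a in actions(k, c):
            if a == "wait":
                cost = GAMMA * expected_next(V, k + 1, c)
            else:                      # transmitting resets the elapsed-slot counter
                cost = COST[a] + GAMMA * expected_next(V, 0, c)
            if cost < best_cost:
                best_cost, best_act = cost, a
        V_new[(k, c)], policy[(k, c)] = best_cost, best_act
    V = V_new

for s in sorted(policy):
    print(s, "->", policy[s])
```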

An Energy-Efficient Transmission Strategy for Wireless Sensor Networks (무선 센서 네트워크에서 에너지 효율적인 전송 방안에 관한 연구)

  • Phan, Van Ca;Kim, Jeong-Geun
    • Journal of Internet Computing and Services
    • /
    • v.10 no.3
    • /
    • pp.85-94
    • /
    • 2009
  • In this work we propose an energy-efficient transmission strategy for wireless sensor networks that operate in a strictly energy-constrained environment. Our transmission algorithm consists of two components: a binary-decision-based transmission and a channel-aware backoff adjustment. In the binary-decision-based transmission, we obtain the optimum threshold for successful transmission via a Markov decision process (MDP) formulation. The channel-aware backoff adjustment, the second component of our proposal, gives transmission priority to sensor nodes with better channel conditions. Extensive simulations are performed to verify the performance of our proposal over fading wireless channels.
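
As a hedged illustration of how such a threshold can emerge from an MDP, the toy model below lets an energy-limited node decide transmit-or-defer given a quantized channel gain; the energy budget, gain levels, success probabilities, and i.i.d. channel assumption are invented and not taken from the paper.

```python
import numpy as np

# Toy model (illustrative, not the paper's exact formulation): a node with a finite
# energy budget observes a quantized channel gain each slot and decides to transmit
# or defer. Transmission succeeds with a probability that grows with the gain.
E_MAX = 5
GAINS = np.linspace(0.1, 1.0, 10)              # quantized channel states
P_SUCC = GAINS                                 # success probability ~ channel quality
P_GAIN = np.full(len(GAINS), 1 / len(GAINS))   # i.i.d. channel draws (assumption)
R_SUCC, E_TX, gamma = 1.0, 1, 0.9

# State: (remaining energy e, channel index g). Actions: defer or transmit.
V = np.zeros((E_MAX + 1, len(GAINS)))
for _ in range(300):
    EV = V @ P_GAIN                            # expected value over the next channel draw
    V_new = np.zeros_like(V)
    for e in range(E_MAX + 1):
        for g in range(len(GAINS)):
            defer = gamma * EV[e]
            if e >= E_TX:
                transmit = P_SUCC[g] * R_SUCC + gamma * EV[e - E_TX]
                V_new[e, g] = max(defer, transmit)
            else:
                V_new[e, g] = defer
    V = V_new

# The optimal rule is threshold-like: transmit only if the channel is good enough.
EV = V @ P_GAIN
for e in range(1, E_MAX + 1):
    tx = [g for g in range(len(GAINS))
          if P_SUCC[g] * R_SUCC + gamma * EV[e - E_TX] >= gamma * EV[e]]
    print(f"energy={e}: transmit when gain index >= {min(tx) if tx else 'never'}")
```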

Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-Agent Markov Decision Processes

  • Chang, Hyeong-Soo
    • International Journal of Control, Automation, and Systems
    • /
    • v.1 no.3
    • /
    • pp.358-367
    • /
    • 2003
  • We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple-decision-maker environment under the infinite-horizon average-reward criterion, with a general joint reward structure but a factorial joint state transition structure. We introduce the "localization" concept, in which the global MDP is localized for each agent so that each agent needs to consider only a local MDP defined on its own state and action spaces. Based on this, we present a gradient-ascent-like iterative distributed algorithm that converges to a local optimal solution of the global MDP. The solution is an autonomous joint policy in that each agent's decision is based only on its local state.
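
The sketch below illustrates only the localization idea on an invented two-agent example: each agent has its own state and action space, transitions factor per agent, and the reward couples the agents. Instead of the paper's gradient-ascent-like average-reward algorithm, it uses simple alternating local policy improvement under a discounted criterion, which likewise converges to a locally optimal joint policy.

```python
import numpy as np
from itertools import product

# Tiny invented instance: two agents, each with nS local states and nA local actions.
nS, nA, gamma = 2, 2, 0.9
rng = np.random.default_rng(0)
# P[i][s, a, s'] : agent i's local transition kernel (the factored structure).
P = [rng.dirichlet(np.ones(nS), size=(nS, nA)) for _ in range(2)]
# R[s1, s2, a1, a2] : a general joint reward coupling the agents.
R = rng.uniform(0, 1, size=(nS, nS, nA, nA))

def evaluate(pi):
    """Exact policy evaluation of the joint policy on the (small) global MDP."""
    states = list(product(range(nS), repeat=2))
    idx = {s: k for k, s in enumerate(states)}
    Pj, r = np.zeros((len(states), len(states))), np.zeros(len(states))
    for s1, s2 in states:
        a1, a2 = pi[0][s1], pi[1][s2]
        r[idx[(s1, s2)]] = R[s1, s2, a1, a2]
        for t1 in range(nS):
            for t2 in range(nS):
                Pj[idx[(s1, s2)], idx[(t1, t2)]] = P[0][s1, a1, t1] * P[1][s2, a2, t2]
    V = np.linalg.solve(np.eye(len(states)) - gamma * Pj, r)
    return {s: V[idx[s]] for s in states}

# Alternating local improvement: each agent improves its own local policy while the
# other agent's policy is held fixed; the joint policy settles at a local optimum.
pi = [np.zeros(nS, dtype=int), np.zeros(nS, dtype=int)]
for sweep in range(20):
    changed = False
    for i in range(2):
        for s in range(nS):
            best_a, best_v = pi[i][s], -np.inf
            for a in range(nA):
                trial = [p.copy() for p in pi]
                trial[i][s] = a
                v = np.mean(list(evaluate(trial).values()))   # uniform start distribution
                if v > best_v + 1e-12:
                    best_a, best_v = a, v
            if best_a != pi[i][s]:
                pi[i][s], changed = best_a, True
    if not changed:
        break

print("locally optimal joint policy:", pi)
```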

A MARKOV DECISION PROCESSES FORMULATION FOR THE LINEAR SEARCH PROBLEM

  • Balkhi, Z.T.;Benkherouf, L.
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.19 no.1
    • /
    • pp.201-206
    • /
    • 1994
  • The linear search problem is concerned with finding a hidden target on the real line R. The position of the target is governed by some probability distribution. It is desired to find the target in the least expected search time. This problem has been formulated as an optimization problem by a number of authors without making use of Markov Decision Process (MDP) theory. It is the aim of this paper to give an MDP formulation of the search problem which we feel is both natural and easy to follow.
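
A discretized version of the problem admits a compact dynamic-programming formulation in the MDP spirit: the state is the already-searched interval [-i, j] together with the searcher's current end, and an action chooses how far to extend the search in either direction. The uniform target distribution and the support [-N, N] below are illustrative assumptions.

```python
from functools import lru_cache

# Discretized linear search (illustrative): the target sits at an integer point in
# [-N, N] with a known pmf; the searcher starts at 0, moves at unit speed, and
# extends the searched interval [-i, j]. We minimize the expected time to find it.
N = 6
pmf = {x: 1 / (2 * N + 1) for x in range(-N, N + 1)}   # uniform target (assumption)

def mass_outside(i, j):
    return sum(p for x, p in pmf.items() if x < -i or x > j)

@lru_cache(maxsize=None)
def F(i, j, at_right):
    """Expected remaining time, weighted by P(target not yet found in [-i, j])."""
    if mass_outside(i, j) == 0:
        return 0.0
    best = float("inf")
    if at_right:    # searcher at +j
        for m in range(j + 1, N + 1):      # continue right to +m
            found = sum(pmf[x] * (x - j) for x in range(j + 1, m + 1))
            best = min(best, found + mass_outside(i, m) * (m - j) + F(i, m, True))
        for k in range(i + 1, N + 1):      # turn around and sweep left to -k
            found = sum(pmf[-x] * (j + x) for x in range(i + 1, k + 1))
            best = min(best, found + mass_outside(k, j) * (j + k) + F(k, j, False))
    else:           # searcher at -i (mirror image)
        for m in range(i + 1, N + 1):      # continue left to -m
            found = sum(pmf[-x] * (x - i) for x in range(i + 1, m + 1))
            best = min(best, found + mass_outside(m, j) * (m - i) + F(m, j, False))
        for k in range(j + 1, N + 1):      # turn around and sweep right to +k
            found = sum(pmf[x] * (i + x) for x in range(j + 1, k + 1))
            best = min(best, found + mass_outside(i, k) * (i + k) + F(i, k, True))
    return best

# Expected search time starting at the origin (a target at 0 is found immediately).
print("minimum expected search time:", F(0, 0, True))
```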

Partially Observable Markov Decision Processes (POMDPs) and Wireless Body Area Networks (WBAN): A Survey

  • Mohammed, Yahaya Onimisi;Baroudi, Uthman A.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.5
    • /
    • pp.1036-1057
    • /
    • 2013
  • Wireless body area network (WBAN) is a promising candidate for future health monitoring systems. Nevertheless, the path to mature solutions still faces many challenges that need to be overcome. Energy-efficient scheduling is one of these challenges, given the scarcity of available energy of biosensors and the lack of portability. Therefore, researchers from academia, industry and health sectors are working together to realize practical solutions for these challenges. The main difficulty in WBAN is the uncertainty in the state of the monitored system. Intelligent learning approaches such as the Markov Decision Process (MDP) have been proposed to tackle this issue. A Markov Decision Process (MDP) is a form of Markov chain in which the transition matrix depends on the action taken by the decision maker (agent) at each time step. The agent receives a reward, which depends on the action and the state. The goal is to find a function, called a policy, which specifies which action to take in each state so as to maximize some utility function (e.g., the mean or expected discounted sum) of the sequence of rewards. A Partially Observable Markov Decision Process (POMDP) is a generalization of the MDP that allows for incomplete information regarding the state of the system; in this case, the state is not visible to the agent. This has many applications in operations research and artificial intelligence. Because of this incomplete knowledge of the system, formulating and solving POMDP models is mathematically complex and computationally expensive, and limited progress has been made in applying POMDPs to real applications. In this paper, we survey the existing methods and algorithms for solving POMDPs in the general domain and, in particular, in wireless body area networks (WBAN). In addition, the paper discusses recent real implementations of POMDPs on practical WBAN problems. We believe that this work will provide valuable insights for newcomers who would like to pursue related research in the WBAN domain.
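
To make the MDP/POMDP distinction above concrete, the sketch below shows the Bayes belief update that converts a POMDP into an MDP over belief states, using an invented two-state "patient condition" example; the transition and observation matrices are illustrative, not drawn from any surveyed work.

```python
import numpy as np

# Two hidden states (illustrative): 0 = "stable", 1 = "critical". The agent never
# observes the state directly; it only receives a noisy reading after each action.
# T[a][s, s'] : transition probabilities, Z[a][s', o] : observation probabilities.
T = {"monitor": np.array([[0.9, 0.1],
                          [0.2, 0.8]]),
     "alert":   np.array([[0.95, 0.05],
                          [0.6,  0.4]])}
Z = {"monitor": np.array([[0.8, 0.2],     # P(o | s'): cheap, noisy sensor reading
                          [0.3, 0.7]]),
     "alert":   np.array([[0.9, 0.1],     # P(o | s'): costly, more reliable reading
                          [0.1, 0.9]])}

def belief_update(b, action, obs):
    """Bayes filter: the POMDP is handled as an MDP over belief states b."""
    predicted = b @ T[action]                 # P(s' | b, a)
    unnormalized = predicted * Z[action][:, obs]
    return unnormalized / unnormalized.sum()

b = np.array([0.95, 0.05])                    # prior belief: probably stable
for action, obs in [("monitor", 1), ("monitor", 1), ("alert", 0)]:
    b = belief_update(b, action, obs)
    print(f"after {action}, observed {obs}: P(critical) = {b[1]:.3f}")
```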

A Markov Decision Process (MDP) based Load Balancing Algorithm for Multi-cell Networks with Multi-carriers

  • Yang, Janghoon
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.10
    • /
    • pp.3394-3408
    • /
    • 2014
  • Conventional mobile station (MS) and base station (BS) association based on average signal strength often results in an imbalance of cell load, which may require more powerful processors at BSs and degrades the perceived transmission rate of MSs. To deal with this problem, a Markov decision process (MDP) for load balancing in a multi-cell system with multi-carriers is formulated. To solve the problem, an α-controllable load balancing algorithm is proposed, exploiting the on-line-learning Sarsa algorithm [12]. It is designed to control the tradeoff between the cell load deviation of BSs and the perceived transmission rates of MSs. We also propose an ε-differential soft greedy policy for on-line learning, which is proven to converge asymptotically to the optimal greedy policy under certain conditions. Simulation results verify that the α-controllable load balancing algorithm controls its behavior depending on the choice of α, and that it is very efficient in balancing the cell loads of BSs for low α.
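
For reference, a generic tabular Sarsa loop with a plain ε-greedy policy is sketched below on an invented two-carrier admission toy problem; the environment, reward weighting, and the substitution of ε-greedy for the paper's ε-differential soft greedy policy are all assumptions made for brevity.

```python
import random
from collections import defaultdict

# Toy stand-in environment (invented numbers): an arriving MS sees the current load
# of two candidate carriers and picks one; the reward trades the MS's rate against
# the resulting load imbalance, loosely mimicking an alpha-controlled objective.
ALPHA_TRADEOFF, EPS, LR, GAMMA = 0.5, 0.1, 0.1, 0.9
random.seed(0)

def env_step(load, action):
    load = list(load)
    load[action] = min(load[action] + 1, 9)            # admit MS on chosen carrier
    rate = 1.0 / (1 + load[action])                    # crowded carrier -> low rate
    imbalance = abs(load[0] - load[1])
    reward = (1 - ALPHA_TRADEOFF) * rate - ALPHA_TRADEOFF * imbalance / 10
    if random.random() < 0.3:                          # an MS departs occasionally
        d = random.randrange(2)
        load[d] = max(load[d] - 1, 0)
    return tuple(load), reward

Q = defaultdict(float)

def eps_greedy(state):
    if random.random() < EPS:
        return random.randrange(2)
    return max((0, 1), key=lambda a: Q[(state, a)])

state = (0, 0)
action = eps_greedy(state)
for t in range(20000):                                 # on-line (on-policy) Sarsa
    next_state, reward = env_step(state, action)
    next_action = eps_greedy(next_state)
    td_target = reward + GAMMA * Q[(next_state, next_action)]
    Q[(state, action)] += LR * (td_target - Q[(state, action)])
    state, action = next_state, next_action

print("greedy choice when carrier loads are (3, 1):",
      max((0, 1), key=lambda a: Q[((3, 1), a)]))
```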

A Simulation Sample Accumulation Method for Efficient Simulation-based Policy Improvement in Markov Decision Process (마르코프 결정 과정에서 시뮬레이션 기반 정책 개선의 효율성 향상을 위한 시뮬레이션 샘플 누적 방법 연구)

  • Huang, Xi-Lang;Choi, Seon Han
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.7
    • /
    • pp.830-839
    • /
    • 2020
  • As a popular mathematical framework for modeling decision making, the Markov decision process (MDP) has been widely used to solve problems in many engineering fields. An MDP consists of a set of discrete states, a finite set of actions, and rewards received after reaching a new state by taking an action from the previous state. The objective is to find an optimal policy, that is, the best action to take in each state so as to maximize the expected discounted reward (EDR) of the policy. In practice, the MDP is typically unknown, so simulation-based policy improvement (SBPI), which improves a given base policy sequentially by selecting the best action in each state based on rewards observed via simulation, can be a practical way to find the optimal policy. However, the efficiency of SBPI is still a concern, since many simulation samples are required to precisely estimate the EDR for each action in each state. In this paper, we propose a method to select the best action accurately in each state using a small number of simulation samples, thereby improving the efficiency of SBPI. The proposed method accumulates the simulation samples observed in the previous states, so it is possible to precisely estimate the EDR even with a small number of samples in the current state. Comparative experiments against the existing method demonstrate that the proposed method improves the efficiency of SBPI.
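
A minimal sketch of plain SBPI is shown below: the EDR of each action in each state is estimated by Monte Carlo rollouts of the base policy through a black-box simulator, and the improved policy takes the argmax. The chain MDP is invented, and the paper's sample-accumulation idea is only indicated in a comment rather than implemented.

```python
import random
random.seed(1)

# Toy chain MDP (invented): states 0..4, actions 0 = "retreat-ish", 1 = "advance-ish".
# The simulator is treated as a black box, as in simulation-based policy improvement.
N_STATES, GAMMA, HORIZON, N_ROLLOUTS = 5, 0.9, 30, 200

def simulate(state, action):
    """Black-box simulator: returns (next_state, reward)."""
    if action == 1 and random.random() < 0.8:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def rollout_return(state, first_action, base_policy):
    """One simulated trajectory: take first_action, then follow the base policy."""
    total, discount, s, a = 0.0, 1.0, state, first_action
    for _ in range(HORIZON):
        s, r = simulate(s, a)
        total += discount * r
        discount *= GAMMA
        a = base_policy[s]
    return total

base_policy = [0] * N_STATES                 # a deliberately poor base policy
improved = []
for s in range(N_STATES):
    # Estimate the EDR of each action by Monte Carlo; the paper's method would also
    # reuse (accumulate) the returns observed while passing through earlier states,
    # so that fewer fresh rollouts are needed at the current state.
    edr = [sum(rollout_return(s, a, base_policy) for _ in range(N_ROLLOUTS)) / N_ROLLOUTS
           for a in (0, 1)]
    improved.append(max((0, 1), key=lambda a: edr[a]))

print("base    :", base_policy)
print("improved:", improved)
```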

Markov Decision Process for Curling Strategies (MDP에 의한 컬링 전략 선정)

  • Bae, Kiwook;Park, Dong Hyun;Kim, Dong Hyun;Shin, Hayong
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.42 no.1
    • /
    • pp.65-72
    • /
    • 2016
  • Curling is often compared to chess because of the variety and importance of its strategies. To win a curling game, selecting optimal strategies at decision-making points is important. However, there is a lack of research on optimal strategies for curling. 'Aggressive' and 'Conservative' play are common curling strategies; nevertheless, even these two strategies have never been studied before. In this study, a Markov Decision Process is applied to curling strategy analysis, with the two strategies defined as the actions of the MDP. By solving the model, the optimal strategy can be found for any in-game state.
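
As a toy illustration of casting the two strategies as MDP actions, the backward induction below computes a win probability over (end, score difference) states; the per-end score-swing distributions for "aggressive" and "conservative" play are entirely invented.

```python
from functools import lru_cache

# Invented per-end score-swing distributions (positive = points for "us"):
# aggressive play has higher variance, conservative play is safer.
SWING = {
    "aggressive":   {-2: 0.20, -1: 0.15, 0: 0.10, 1: 0.25, 2: 0.30},
    "conservative": {-1: 0.25, 0: 0.40, 1: 0.30, 2: 0.05},
}
N_ENDS = 8

@lru_cache(maxsize=None)
def win_prob(end, diff):
    """Probability of winning from this end onward with the given score difference."""
    if end == N_ENDS:
        return 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)  # coin-flip extra end
    return max(sum(p * win_prob(end + 1, diff + s) for s, p in SWING[a].items())
               for a in SWING)

def best_action(end, diff):
    return max(SWING, key=lambda a: sum(p * win_prob(end + 1, diff + s)
                                        for s, p in SWING[a].items()))

for diff in (-2, -1, 0, 1, 2):         # optimal strategy entering the last end
    print(f"last end, score diff {diff:+d}: {best_action(N_ENDS - 1, diff)}")
```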

Earthwork Planning via Reinforcement Learning with Heterogeneous Construction Equipment (강화학습을 이용한 이종 장비 토목 공정 계획)

  • Ji, Min-Gi;Park, Jun-Keon;Kim, Do-Hyeong;Jung, Yo-Han;Park, Jin-Kyoo;Moon, Il-Chul
    • Journal of the Korea Society for Simulation
    • /
    • v.27 no.1
    • /
    • pp.1-13
    • /
    • 2018
  • Earthwork planning is one of the critical issues in construction process management. Existing approaches to construction process management optimize the construction using either mathematical methodologies or simulation-based heuristics. This paper proposes a simulated earthwork scenario and derives an optimal plan for the simulation using reinforcement learning. For reinforcement learning, we use two different Markov decision process (MDP) formulations with interacting excavator and truck agents: sequenced learning and independent learning. The simulation results show that both formulations reach the optimal plan for the simulated earthwork scenario. This planning could serve as a basis for automated construction management.
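
A minimal sketch of the independent-learning flavor of such a formulation is given below: an excavator agent and a truck agent each run tabular Q-learning on a shared toy earth-moving state with a common step cost. The scenario, state encoding, and rewards are invented stand-ins, not the paper's simulation.

```python
import random
from collections import defaultdict
random.seed(0)

# Toy earthwork scenario (invented): DIRT units must be excavated into a stockpile
# (by the excavator) and then hauled away (by the truck). Both agents learn
# independently with tabular Q-learning on the shared state, receiving a common
# step cost until all dirt is removed.
DIRT, STOCK_CAP = 6, 2
EPS, LR, GAMMA = 0.2, 0.2, 0.95
ACTIONS = (0, 1)                                    # 0 = idle, 1 = work

def env_step(state, a_exc, a_truck):
    ground, stock = state
    if a_exc == 1 and ground > 0 and stock < STOCK_CAP:
        ground, stock = ground - 1, stock + 1       # excavate one unit to stockpile
    if a_truck == 1 and stock > 0:
        stock -= 1                                  # haul one unit away
    next_state = (ground, stock)
    done = (ground == 0 and stock == 0)
    reward = 0.0 if done else -1.0                  # shared step cost until finished
    return next_state, reward, done

Q_exc, Q_truck = defaultdict(float), defaultdict(float)

def choose(Q, state):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(3000):
    state, done = (DIRT, 0), False
    while not done:
        a_exc, a_truck = choose(Q_exc, state), choose(Q_truck, state)
        nxt, r, done = env_step(state, a_exc, a_truck)
        for Q, a in ((Q_exc, a_exc), (Q_truck, a_truck)):
            best_next = 0.0 if done else max(Q[(nxt, b)] for b in ACTIONS)
            Q[(state, a)] += LR * (r + GAMMA * best_next - Q[(state, a)])
        state = nxt

state = (DIRT, 0)
print("learned first moves:",
      "excavator ->", max(ACTIONS, key=lambda a: Q_exc[(state, a)]),
      "| truck ->", max(ACTIONS, key=lambda a: Q_truck[(state, a)]))
```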