• Title/Abstract/Keyword: Markov Decision Process (MDP)


Markov Decision Process-based Potential Field Technique for UAV Planning

  • MOON, CHAEHWAN;AHN, JAEMYUNG
    • Journal of the Korean Society for Industrial and Applied Mathematics
    • /
    • Vol. 25, No. 4
    • /
    • pp.149-161
    • /
    • 2021
  • This study proposes a methodology for mission/path planning of an unmanned aerial vehicle (UAV) using an artificial potential field with the Markov Decision Process (MDP). The planning problem is formulated as an MDP. A low-resolution solution of the MDP is obtained and used to define an artificial potential field, which provides a continuous UAV mission plan. A numerical case study is conducted to demonstrate the validity of the proposed technique.
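
A minimal Python sketch of the general idea described in the abstract, not the authors' actual formulation: a coarse grid MDP is solved by value iteration and the resulting value function is treated as an artificial potential field that the vehicle climbs greedily. The grid size, rewards, discount factor, and obstacle layout are illustrative assumptions.

```python
import numpy as np

N = 10                                    # coarse grid resolution (assumed)
GOAL = (9, 9)
OBSTACLE = {(4, 4), (4, 5), (5, 4)}       # assumed no-fly cells
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
GAMMA = 0.95

def step(s, a):
    ns = (min(max(s[0] + a[0], 0), N - 1), min(max(s[1] + a[1], 0), N - 1))
    if ns in OBSTACLE:
        ns = s                            # blocked moves keep the UAV in place
    return ns, (1.0 if ns == GOAL else -0.01)

# Value iteration on the low-resolution MDP.
V = np.zeros((N, N))
for _ in range(200):
    V_new = V.copy()
    for i in range(N):
        for j in range(N):
            if (i, j) == GOAL:
                continue                  # goal is absorbing
            V_new[i, j] = max(r + GAMMA * V[ns]
                              for ns, r in (step((i, j), a) for a in ACTIONS))
    V = V_new

# The value function plays the role of a potential field: the UAV repeatedly
# takes the one-step-lookahead best move until it reaches the goal.
pos, path = (0, 0), [(0, 0)]
while pos != GOAL and len(path) < 100:
    pos, _ = max((step(pos, a) for a in ACTIONS),
                 key=lambda t: t[1] + GAMMA * V[t[0]])
    path.append(pos)
print(path)
```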

The Minimum-cost Network Selection Scheme to Guarantee the Periodic Transmission Opportunity in the Multi-band Maritime Communication System

  • 조구민;윤창호;강충구
    • 한국통신학회논문지
    • /
    • Vol. 36, No. 2A
    • /
    • pp.139-148
    • /
    • 2011
  • To minimize the cost of periodically transmitting shipping information over a multi-band maritime communication network, this paper presents a scheme that decides the transmission instant by comparing the transmission cost of the currently available networks with the minimum expected average transmission cost achievable within the maximum allowable delay. The choice of the transmission instant and the corresponding network is modeled as a Markov Decision Process (MDP): the channel state of each band is modeled as a 2-state Markov chain, and the average transmission cost is computed via Stochastic Dynamic Programming. This yields a minimum-cost network selection policy, and computer simulations show that the proposed scheme substantially reduces the network usage cost compared with transmitting at a fixed period.
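
A hedged sketch of the kind of decision rule the abstract describes: the channel state of each band follows a 2-state Markov chain, backward (stochastic) dynamic programming computes the minimum expected transmission cost achievable before the deadline, and the sender transmits now only if the cheapest currently usable band is no more expensive than waiting. The transition matrices, per-band costs, and deadline below are invented for illustration.

```python
import itertools
import numpy as np

# 2-state Markov channel per band: P[k][s, s'], state 0 = Good, 1 = Bad.
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.9, 0.1], [0.4, 0.6]])]
COST = [np.array([1.0, 5.0]),             # band 0 transmission cost per state
        np.array([2.0, 3.0])]             # band 1 transmission cost per state
D = 5                                     # maximum allowable delay in slots

states = list(itertools.product([0, 1], repeat=len(P)))    # joint channel states

def now_cost(s):
    """Cost of transmitting immediately on the cheapest band."""
    return min(COST[k][s[k]] for k in range(len(P)))

def trans_prob(s, ns):
    """Joint probability of moving from joint state s to ns in one slot."""
    return float(np.prod([P[k][s[k], ns[k]] for k in range(len(P))]))

# Backward induction: J[t][s] = minimum expected cost from slot t in state s.
J = {D: {s: now_cost(s) for s in states}}  # the deadline forces a transmission
for t in range(D - 1, -1, -1):
    J[t] = {s: min(now_cost(s),
                   sum(trans_prob(s, ns) * J[t + 1][ns] for ns in states))
            for s in states}

# Decision rule at slot t: transmit now iff it is no worse than waiting.
s, t = (0, 1), 0                           # band 0 Good, band 1 Bad, first slot
wait = sum(trans_prob(s, ns) * J[t + 1][ns] for ns in states)
print("transmit now" if now_cost(s) <= wait else "wait")
```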

An Energy-Efficient Transmission Strategy for Wireless Sensor Networks

  • 판반카;김정근
    • 인터넷정보학회논문지
    • /
    • Vol. 10, No. 3
    • /
    • pp.85-94
    • /
    • 2009
  • This paper proposes an energy-efficient transmission strategy for wireless sensor networks and presents a theoretical analysis of it. The proposed scheme is a binary-decision transmission based on opportunistic transmission, in which a node attempts to transmit only when the channel condition is relatively good. For the binary-decision transmission, the optimal channel threshold for successful transmission is derived using a Markov decision process (MDP). Extensive simulations analyze the performance of the proposed scheme in terms of energy efficiency and throughput.

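An illustrative sketch of the binary-decision idea, not the paper's MDP derivation: candidate thresholds are swept for a Rayleigh-fading channel, and the threshold with the best simulated trade-off between energy spent and packets delivered is kept. The channel model, energy figures, and reward weight are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
SLOTS = 100_000
gain = rng.exponential(scale=1.0, size=SLOTS)    # Rayleigh-fading power gains
E_TX, E_IDLE = 1.0, 0.05                         # energy per transmission / idle slot
REWARD_PER_PACKET = 2.0                          # assumed value of a delivered packet

def p_success(g):
    """Toy success probability as a function of channel gain."""
    return 1.0 - np.exp(-2.0 * g)

best = None
for thr in np.linspace(0.0, 3.0, 61):
    tx = gain >= thr                             # binary decision: transmit or stay idle
    energy = np.where(tx, E_TX, E_IDLE).sum()
    delivered = (rng.random(SLOTS) < p_success(gain))[tx].sum()
    net = REWARD_PER_PACKET * delivered - energy
    if best is None or net > best[1]:
        best = (thr, net)
print(f"best threshold ~ {best[0]:.2f}")
```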

Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-Agent Markov Decision Processes

  • Chang, Hyeong-Soo
    • International Journal of Control, Automation, and Systems
    • /
    • Vol. 1, No. 3
    • /
    • pp.358-367
    • /
    • 2003
  • We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple-decision-maker environment under the infinite-horizon average reward criterion, with a general joint reward structure but a factorial joint state transition structure. We introduce the "localization" concept, in which the global MDP is localized for each agent so that each agent only needs to consider a local MDP defined on its own state and action spaces. Based on this, we present a gradient-ascent-like iterative distributed algorithm that converges to a local optimal solution of the global MDP. The solution is an autonomous joint policy in that each agent's decision is based only on its local state.
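
A hedged, coordinate-ascent caricature of the localization idea rather than the paper's gradient algorithm: because each agent's transitions depend only on its own state and action, freezing the other agent's policy lets us average the joint reward over that agent's stationary behaviour and obtain a purely local MDP, which the agent then solves; alternating this per-agent step drives the joint policy toward a local optimum. The problem sizes, random transition kernels, and the use of a discounted (rather than average-reward) criterion are simplifications.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, GAMMA = 3, 2, 0.9
# P[i][s, a, s'] : agent i's own transition kernel (the factorial structure).
P = [rng.dirichlet(np.ones(S), size=(S, A)) for _ in range(2)]
# Joint reward R[s1, s2, a1, a2] couples the agents.
R = rng.random((S, S, A, A))
pi = [np.zeros(S, dtype=int), np.zeros(S, dtype=int)]    # deterministic policies

def stationary(i):
    """Stationary distribution of agent i's own chain under its policy."""
    M = np.array([P[i][s, pi[i][s]] for s in range(S)])   # S x S transition matrix
    d = np.ones(S) / S
    for _ in range(500):
        d = d @ M
    return d

def local_reward(i, d_other):
    """Localize the joint reward for agent i by averaging out the other agent."""
    j = 1 - i
    lr = np.zeros((S, A))
    for s in range(S):
        for a in range(A):
            for so in range(S):
                ao = pi[j][so]
                lr[s, a] += d_other[so] * (R[s, so, a, ao] if i == 0
                                           else R[so, s, ao, a])
    return lr

def solve_local(i, lr):
    """Value iteration on agent i's local MDP; returns its greedy policy."""
    V = np.zeros(S)
    for _ in range(300):
        Q = lr + GAMMA * np.einsum('sat,t->sa', P[i], V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

for sweep in range(10):                       # alternate local improvements
    for i in (0, 1):
        pi[i] = solve_local(i, local_reward(i, stationary(1 - i)))
print("locally optimal joint policy:", [p.tolist() for p in pi])
```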

A MARKOV DECISION PROCESSES FORMULATION FOR THE LINEAR SEARCH PROBLEM

  • Balkhi, Z.T.;Benkherouf, L.
    • 한국경영과학회지
    • /
    • Vol. 19, No. 1
    • /
    • pp.201-206
    • /
    • 1994
  • The linear search problem is concerned with finding a hidden target on the real line R. The position of the target is governed by some probability distribution, and it is desired to find the target in the least expected search time. This problem has been formulated as an optimization problem by a number of authors without making use of Markov Decision Process (MDP) theory. The aim of this paper is to give an MDP formulation of the search problem which we feel is both natural and easy to follow.

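A small sketch of one way to cast the linear search problem as an MDP/dynamic program, using a discretization of our own rather than the paper's formulation: the target sits at an integer point with a known prior, the state is (leftmost searched point, rightmost searched point, current end), an action extends the searched interval one cell to the left or right, and the cost is the distance walked.

```python
from functools import lru_cache

N = 4
points = [x for x in range(-N, N + 1) if x != 0]
prior = {x: 1.0 / len(points) for x in points}    # assumed uniform prior

@lru_cache(maxsize=None)
def expected_cost(l, r, at_right):
    """Min expected remaining travel, given [l, r] already searched without
    finding the target, and the searcher standing at r (at_right) or at l."""
    if l == -N and r == N:
        return 0.0
    p_left = sum(prior[x] for x in points if x < l)      # mass still hidden left
    p_right = sum(prior[x] for x in points if x > r)     # mass still hidden right
    remaining = p_left + p_right
    best = float("inf")
    if r < N:                                            # action: extend right
        move = 1 if at_right else (r + 1 - l)
        p_hit = prior.get(r + 1, 0.0) / remaining
        best = min(best, move + (1 - p_hit) * expected_cost(l, r + 1, True))
    if l > -N:                                           # action: extend left
        move = 1 if not at_right else (r - (l - 1))
        p_hit = prior.get(l - 1, 0.0) / remaining
        best = min(best, move + (1 - p_hit) * expected_cost(l - 1, r, False))
    return best

print("optimal expected search distance:", expected_cost(0, 0, True))
```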

Partially Observable Markov Decision Processes (POMDPs) and Wireless Body Area Networks (WBAN): A Survey

  • Mohammed, Yahaya Onimisi;Baroudi, Uthman A.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 7, No. 5
    • /
    • pp.1036-1057
    • /
    • 2013
  • Wireless body area network (WBAN) is a promising candidate for future health monitoring systems. Nevertheless, the path to mature solutions still faces many challenges that need to be overcome. Energy-efficient scheduling is one of these challenges, given the scarcity of available energy in biosensors and the lack of portability. Therefore, researchers from academia, industry and the health sector are working together to realize practical solutions to these challenges. The main difficulty in WBAN is the uncertainty in the state of the monitored system. Intelligent learning approaches such as the Markov Decision Process (MDP) have been proposed to tackle this issue. A Markov Decision Process (MDP) is a form of Markov chain in which the transition matrix depends on the action taken by the decision maker (agent) at each time step. The agent receives a reward, which depends on the action and the state. The goal is to find a function, called a policy, which specifies which action to take in each state so as to maximize some utility function (e.g., the mean or expected discounted sum) of the sequence of rewards. A partially observable Markov decision process (POMDP) is a generalization of the Markov decision process that allows for incomplete information regarding the state of the system: the state is not visible to the agent. This has many applications in operations research and artificial intelligence. Due to this incomplete knowledge of the system, formulating and solving POMDP models is mathematically complex and computationally expensive, and limited progress has been made in applying POMDPs to real applications. In this paper, we survey the existing methods and algorithms for solving POMDPs in the general domain and, in particular, in wireless body area networks (WBAN). In addition, the paper discusses recent real implementations of POMDP on practical WBAN problems. We believe that this work will provide valuable insights for newcomers who would like to pursue related research in the domain of WBAN.
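
A minimal sketch of the belief update that makes POMDPs expensive to solve: since the state is hidden, the agent maintains a Bayesian belief b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The two-state "patient condition" example, the transition and observation matrices, and the action set are illustrative assumptions, not drawn from any surveyed WBAN system.

```python
import numpy as np

# States: 0 = stable, 1 = critical. Actions: 0 = sleep sensor, 1 = sample.
T = np.array([[[0.95, 0.05],     # T[a, s, s'] : state transition model
               [0.10, 0.90]],
              [[0.95, 0.05],
               [0.10, 0.90]]])
# Observations: 0 = normal reading, 1 = alarming reading. The reading is only
# informative when the sensor actually samples (action 1).
O = np.array([[[0.5, 0.5],       # O[a, s', o] : observation model
               [0.5, 0.5]],
              [[0.9, 0.1],
               [0.2, 0.8]]])

def belief_update(b, a, o):
    predicted = b @ T[a]                 # sum_s T[a, s, s'] * b(s)
    unnorm = O[a, :, o] * predicted      # weight by observation likelihood
    return unnorm / unnorm.sum()

b = np.array([0.9, 0.1])                 # initial belief: probably stable
for a, o in [(0, 0), (1, 1), (1, 1)]:    # sleep, then two alarming samples
    b = belief_update(b, a, o)
    print(f"action={a} obs={o} belief(critical)={b[1]:.3f}")
```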

A Markov Decision Process (MDP) based Load Balancing Algorithm for Multi-cell Networks with Multi-carriers

  • Yang, Janghoon
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 8, No. 10
    • /
    • pp.3394-3408
    • /
    • 2014
  • Conventional mobile station (MS) and base station (BS) association based on average signal strength often results in an imbalance of cell load, which may require more powerful processors at BSs and degrades the perceived transmission rates of MSs. To deal with this problem, a Markov decision process (MDP) for load balancing in a multi-cell system with multi-carriers is formulated. To solve the problem, an α-controllable load balancing algorithm is proposed, exploiting the Sarsa algorithm, an on-line learning method [12]. It is designed to control the tradeoff between the cell load deviation of BSs and the perceived transmission rates of MSs. We also propose an ε-differential soft greedy policy for on-line learning, which is proven to be asymptotically convergent to the optimal greedy policy under some conditions. Simulation results verify that the α-controllable load balancing algorithm controls the behavior of the algorithm depending on the choice of α, and show that it is very efficient in balancing the cell loads of BSs with low α.
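
A hedged sketch of on-line Sarsa for MS-to-BS assignment in the spirit of the abstract: the reward mixes the admitted user's rate with the spread of cell loads through a weight α, and a plain ε-greedy behaviour policy stands in for the paper's ε-differential soft greedy policy. The toy traffic model and all constants are assumptions.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
K, ALPHA, EPS, GAMMA, LR = 3, 0.5, 0.1, 0.9, 0.1
Q = defaultdict(lambda: np.zeros(K))              # Q[state][chosen BS]

def state(loads):
    return tuple(int(x) for x in np.minimum(loads, 5))   # clip loads into a small key

def policy(s):
    return int(rng.integers(K)) if rng.random() < EPS else int(np.argmax(Q[s]))

loads = np.zeros(K, dtype=int)
s = state(loads)
a = policy(s)
for t in range(20_000):
    rate = 1.0 / (1 + loads[a])                   # toy rate: shrinks as cell a fills
    loads[a] += 1                                 # admit the arriving MS to cell a
    loads -= rng.binomial(loads, 0.2)             # each user departs w.p. 0.2 per slot
    r = (1 - ALPHA) * rate - ALPHA * np.std(loads)   # rate vs. load-balance tradeoff
    s2 = state(loads)
    a2 = policy(s2)
    Q[s][a] += LR * (r + GAMMA * Q[s2][a2] - Q[s][a])  # Sarsa (on-policy) update
    s, a = s2, a2
print("Q for an empty system:", np.round(Q[state(np.zeros(K, dtype=int))], 3))
```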

A Simulation Sample Accumulation Method for Efficient Simulation-based Policy Improvement in Markov Decision Process

  • 황시랑;최선한
    • 한국멀티미디어학회논문지
    • /
    • Vol. 23, No. 7
    • /
    • pp.830-839
    • /
    • 2020
  • As a popular mathematical framework for modeling decision making, the Markov decision process (MDP) has been widely used to solve problems in many engineering fields. An MDP consists of a set of discrete states, a finite set of actions, and rewards received after reaching a new state by taking an action from the previous state. The objective of MDP is to find an optimal policy, that is, the best action to take in each state so as to maximize the expected discounted reward (EDR) of the policy. In practice, the MDP is typically unknown, so simulation-based policy improvement (SBPI), which improves a given base policy sequentially by selecting the best action in each state depending on rewards observed via simulation, can be a practical way to find the optimal policy. However, the efficiency of SBPI is still a concern, since many simulation samples are required to precisely estimate the EDR for each action in each state. In this paper, we propose a method to select the best action accurately in each state using a small number of simulation samples, thereby improving the efficiency of SBPI. The proposed method accumulates the simulation samples observed in the previous states, so the EDR can be precisely estimated even with a small number of samples in the current state. The results of comparative experiments with the existing method demonstrate that the proposed method improves the efficiency of SBPI.
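
A hedged sketch of the baseline procedure the paper accelerates, plain simulation-based policy improvement: in each state, every action is rolled out several times, the base policy is followed afterwards, and the action with the best estimated discounted return is adopted. The toy MDP, horizon, and rollout counts are assumptions, and the paper's sample-accumulation step (reusing returns collected while processing earlier states) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
S, A, GAMMA, H, N_ROLL = 6, 2, 0.95, 40, 50
P = rng.dirichlet(np.ones(S), size=(S, A))        # P[s, a, :] = next-state distribution
R = rng.random((S, A))                            # immediate rewards
base = np.zeros(S, dtype=int)                     # base policy: always action 0

def rollout_return(s, a):
    """Discounted return of taking a in s, then following `base` for H steps."""
    ret, g = 0.0, 1.0
    for _ in range(H):
        ret += g * R[s, a]
        g *= GAMMA
        s = rng.choice(S, p=P[s, a])
        a = base[s]
    return ret

improved = base.copy()
for s in range(S):
    q_est = [np.mean([rollout_return(s, a) for _ in range(N_ROLL)])
             for a in range(A)]
    improved[s] = int(np.argmax(q_est))           # greedy one-step improvement
print("base    :", base)
print("improved:", improved)
```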

Markov Decision Process for Curling Strategies

  • 배기욱;박동현;김동현;신하용
    • 대한산업공학회지
    • /
    • Vol. 42, No. 1
    • /
    • pp.65-72
    • /
    • 2016
  • Curling is often compared to chess because of the variety and importance of its strategies. To win a curling game, selecting optimal strategies at decision-making points is important; however, there is a lack of research on optimal curling strategies. 'Aggressive' and 'Conservative' play are common curling strategies, yet even these two strategies have not been studied before. In this study, a Markov Decision Process is applied to curling strategy analysis, with the two strategies defined as the actions of the Markov Decision Process. By solving the model, the optimal strategy can be found for any in-game state.
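
A hedged sketch of the setup the abstract suggests: the state is (ends remaining, score difference), the two actions are "aggressive" and "conservative" with assumed distributions over the points swing in an end, and backward induction tells which strategy maximizes win probability in each state. The scoring distributions are invented for illustration, not taken from the paper.

```python
from functools import lru_cache

# P(points swing in one end | strategy): positive = we score, negative = they do.
SWING = {
    "aggressive":   {-2: 0.20, -1: 0.15, 0: 0.10, 1: 0.25, 2: 0.30},
    "conservative": {-2: 0.05, -1: 0.25, 0: 0.30, 1: 0.30, 2: 0.10},
}

@lru_cache(maxsize=None)
def win_prob(ends_left, diff):
    if ends_left == 0:
        return 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)   # tie -> coin flip
    return max(sum(p * win_prob(ends_left - 1, diff + d)
                   for d, p in dist.items())
               for dist in SWING.values())

def best_action(ends_left, diff):
    return max(SWING, key=lambda a: sum(p * win_prob(ends_left - 1, diff + d)
                                        for d, p in SWING[a].items()))

for diff in (-2, 0, 2):
    print(f"3 ends left, score diff {diff:+d}: {best_action(3, diff)}")
```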

Earthwork Planning via Reinforcement Learning with Heterogeneous Construction Equipment

  • 지민기;박준건;김도형;정요한;박진규;문일철
    • 한국시뮬레이션학회논문지
    • /
    • Vol. 27, No. 1
    • /
    • pp.1-13
    • /
    • 2018
  • Earthwork planning is one of the important tasks in construction process management. Optimization techniques based on mathematical formulations, heuristic optimization, and agent-based simulation have been applied to construction process management. This study builds a virtual earthwork environment and proposes a method that finds an optimal earthwork path through reinforcement-learning simulation in that environment. For the reinforcement learning, two Markov decision process (MDP) formulations, one based on sequential learning and one based on independent learning, are used for the excavator and truck agents, which interact with each other and take different actions. Simulation results show that both approaches can produce near-optimal earthwork plans in the virtual environment, and such plans can serve as a basis for construction automation.
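
A hedged sketch of the independent-learning variant mentioned in the abstract: an excavator agent and a truck agent each run their own tabular Q-learning on their own view of a tiny earthwork toy (dig soil into a stockpile, haul it to the dump, shared time penalty). The environment, state encodings, rewards, and constants are our assumptions, not the paper's simulation environment.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(4)
GAMMA, LR, EPS, EPISODES = 0.95, 0.2, 0.1, 3000
SOIL, PILE_MAX, TRIP = 4, 2, 2                   # buckets to move, pile cap, haul time

Q_exc = defaultdict(lambda: np.zeros(2))         # excavator actions: 0 wait, 1 dig
Q_trk = defaultdict(lambda: np.zeros(2))         # truck actions:     0 wait, 1 haul

def choose(Q, s):
    return int(rng.integers(2)) if rng.random() < EPS else int(np.argmax(Q[s]))

for ep in range(EPISODES):
    soil, pile, away = SOIL, 0, 0                # away = slots until the truck is back
    for t in range(40):
        s_exc, s_trk = (soil, pile), (pile, away)
        a_exc, a_trk = choose(Q_exc, s_exc), choose(Q_trk, s_trk)
        reward = -0.1                            # shared time penalty per slot
        if a_exc == 1 and soil > 0 and pile < PILE_MAX:
            soil, pile = soil - 1, pile + 1      # excavator digs one bucket
        if a_trk == 1 and away == 0 and pile > 0:
            reward += pile                       # truck delivers the pile to the dump
            pile, away = 0, TRIP
        away = max(away - 1, 0)
        s2_exc, s2_trk = (soil, pile), (pile, away)
        done = (soil == 0 and pile == 0)
        # Independent (decentralized) Q-learning: each agent updates its own table.
        for Q, s, a, s2 in ((Q_exc, s_exc, a_exc, s2_exc),
                            (Q_trk, s_trk, a_trk, s2_trk)):
            target = reward if done else reward + GAMMA * Q[s2].max()
            Q[s][a] += LR * (target - Q[s][a])
        if done:
            break
print("excavator's learned action while soil remains:",
      {s: int(np.argmax(q)) for s, q in Q_exc.items() if s[0] > 0})
```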