• Title/Summary/Keyword: Markov decision process


The Decision Making Strategy for Determining the Optimal Production Time : A Stochastic Process and NPV Approach (최적생산시기 결정을 위한 의사결정전략 : 추계적 과정과 순현재가치 접근)

  • Choi, Jong-Du
    • Journal of the Korean Operations Research and Management Science Society / v.32 no.1 / pp.147-160 / 2007
  • In this paper, the optimal decision-making strategy for resource management is viewed in terms of a combined strategy of planting and production time. A model which can be used to determine the optimal management strategy is developed, focusing on how to design the operation of a Markov chain so as to optimize its performance. This study estimated a dynamic stochastic model to compare alternative production styles and used the net present value of returns to evaluate the scenarios. Managers may be able to increase economic returns by delaying production in order to market larger, more valuable commodities.
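
As a rough illustration of the kind of calculation involved, the sketch below evaluates the expected net present value of a "produce now versus wait" rule over a small Markov chain by backward induction; the states, transition probabilities, prices, and discount rate are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical growth states for the commodity (small -> large) and
# transition probabilities per period; all values are illustrative only.
states = ["small", "medium", "large"]
P = np.array([[0.6, 0.4, 0.0],    # small  -> small/medium/large
              [0.0, 0.5, 0.5],    # medium -> medium/large
              [0.0, 0.0, 1.0]])   # large stays large
price = np.array([10.0, 18.0, 30.0])   # revenue if produced in each state
discount = 1.0 / 1.08                  # per-period discount factor (8% rate)
horizon = 10

# Backward induction: in each period choose "produce now" vs. "wait",
# maximizing discounted expected revenue (the NPV of the decision rule).
V = price.copy()                       # at the horizon we must produce
for _ in range(horizon):
    wait = discount * P @ V
    V = np.maximum(price, wait)

for s, v in zip(states, V):
    print(f"{s:>6}: optimal expected NPV = {v:.2f}")
```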

A Localized Adaptive QoS Routing Scheme Using POMDP and Exploration Bonus Techniques (POMDP와 Exploration Bonus를 이용한 지역적이고 적응적인 QoS 라우팅 기법)

  • Han, Jeong-Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.3B / pp.175-182 / 2006
  • In this paper, we propose a localized adaptive QoS routing scheme using POMDP and exploration bonus techniques. Because performing dynamic programming to solve a POMDP exactly is computationally expensive, this paper shows that the CEA technique, which uses expectation values, can simplify the POMDP problem. We also use an exploration bonus to search for detour paths that are better than the current path, and for this we propose an algorithm (SEMA) for searching multiple paths. In particular, we evaluate the service success rate and average hop count with respect to the performance parameters $\phi$ and k, which are defined as the exploration count and interval. The results show that the larger $\phi$ is, the better the detour path search, and that increasing n increases the amount of exploration.
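
To illustrate the exploration-bonus idea in the abstract above, here is a minimal sketch in which a node scores its current path and candidate detour paths by an estimated success rate plus a bonus that grows with the time since a path was last tried; the bonus form, weight, and numbers are illustrative assumptions and do not reproduce the paper's SEMA algorithm.

```python
import math

# Hypothetical per-path statistics kept at a node: estimated service success
# rate and the time step at which the path was last explored.
paths = {
    "current": {"est_success": 0.80, "last_tried": 95},
    "detour1": {"est_success": 0.70, "last_tried": 40},
    "detour2": {"est_success": 0.65, "last_tried": 10},
}

def score(stats, now, weight=0.05):
    # Exploration bonus grows with the time since the path was last tried,
    # so seldom-used detours are periodically re-examined (form is illustrative).
    bonus = weight * math.sqrt(now - stats["last_tried"])
    return stats["est_success"] + bonus

now = 100
best = max(paths, key=lambda p: score(paths[p], now))
print("selected path:", best)
```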

An Improved DSA Strategy based on Triple-States Reward Function (Triple-state 보상 함수를 기반으로 한 개선된 DSA 기법)

  • Ahmed, Tasmia; Gu, Jun-Rong; Jang, Sung-Jeen; Kim, Jae-Moung
    • Journal of the Institute of Electronics Engineers of Korea TC / v.47 no.11 / pp.59-68 / 2010
  • In this paper, we present a new method for Dynamic Spectrum Access that modifies the reward function. The Partially Observable Markov Decision Process (POMDP) is a suitable algorithm for predicting upcoming spectrum opportunities, and the reward function is its final and most important component for prediction. However, the conventional reward function has only two states (busy and idle): when a collision happens on the channel, the reward function indicates the busy state, which is responsible for the decrease in secondary-user throughput. In this paper, we focus on the difference between the busy and collision states and propose a new reward function that adds a collision state, which brings better communication opportunities for secondary users. With the help of the new reward function, secondary users properly utilize opportunities to access primary-user channels for efficient data transmission. We also derive the belief vector of the new algorithm mathematically. Simulation results corroborate the superior performance of the improved reward function: the new algorithm increases throughput for the secondary user in the cognitive radio network.
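
A minimal sketch of a triple-state reward of this kind is shown below: idle, busy, and collision outcomes receive distinct rewards, and a channel is selected by its belief-weighted expected reward. The reward values and probabilities are illustrative assumptions, not the paper's.

```python
# Hypothetical reward table with three outcomes instead of the usual two:
# transmitting on an idle channel earns throughput, a busy channel earns
# nothing, and a collision is penalized separately (values illustrative).
REWARD = {"idle": 1.0, "busy": 0.0, "collision": -0.5}

def expected_reward(belief_idle, p_collision_given_idle=0.1):
    """Belief-weighted reward for accessing a channel believed to be idle."""
    p_success = belief_idle * (1.0 - p_collision_given_idle)
    p_collide = belief_idle * p_collision_given_idle
    p_busy = 1.0 - belief_idle
    return (p_success * REWARD["idle"]
            + p_collide * REWARD["collision"]
            + p_busy * REWARD["busy"])

# The secondary user accesses the channel whose belief gives the highest value.
beliefs = {"ch0": 0.9, "ch1": 0.6, "ch2": 0.3}
best = max(beliefs, key=lambda ch: expected_reward(beliefs[ch]))
print("access", best, "expected reward", round(expected_reward(beliefs[best]), 3))
```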

Robust Scheduling based on Daily Activity Learning by using Markov Decision Process and Inverse Reinforcement Learning (강건한 스케줄링을 위한 마코프 의사결정 프로세스 추론 및 역강화 학습 기반 일상 행동 학습)

  • Lee, Sang-Woo; Kwak, Dong-Hyun; On, Kyoung-Woon; Heo, Yujung; Kang, Wooyoung; Cinarel, Ceyda; Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.23 no.10 / pp.599-604 / 2017
  • A useful application of smart assistants is to predict and suggest users' daily behaviors the way real assistants do. Conventional methods to predict behavior have mainly used explicit schedule information logged by a user or extracted from e-mail or SNS data. However, gathering explicit information for smart assistants has limitations, and much of a user's routine behavior is not logged in the first place. In this paper, we suggest a novel approach that combines explicit schedule information with patterns of routine behavior. We propose using inference based on a Markov decision process and learning with a reward function based on inverse reinforcement learning. The results of our experiment show that the proposed method outperforms comparable models on a life-log dataset collected over six weeks.
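
As a toy illustration of the combination described above, the sketch below runs value iteration on a small activity MDP whose reward is a weighted sum of state features; in the paper those weights would come from inverse reinforcement learning on life-log data, while here they are placeholders, and the states and transitions are invented for the example.

```python
import numpy as np

# Toy daily-activity MDP: states are activities, transitions are illustrative.
activities = ["sleep", "work", "exercise", "leisure"]
P = {  # P[action] maps the current state index to a next-state distribution
    "continue": np.array([[0.7, 0.3, 0.0, 0.0],
                          [0.0, 0.8, 0.1, 0.1],
                          [0.0, 0.2, 0.6, 0.2],
                          [0.3, 0.2, 0.1, 0.4]]),
    "switch":   np.full((4, 4), 0.25),
}

# Reward expressed as a weighted sum of state features; in the paper the
# weights would be learned by inverse reinforcement learning, here they are
# placeholder numbers.
features = np.eye(4)
w = np.array([0.2, 0.5, 0.4, 0.1])
R = features @ w

gamma, V = 0.9, np.zeros(4)
for _ in range(200):                       # value iteration
    V = R + gamma * np.max([P[a] @ V for a in P], axis=0)

policy = {s: max(P, key=lambda a: (P[a] @ V)[i]) for i, s in enumerate(activities)}
print(policy)
```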

Opportunistic Spectrum Access Based on a Constrained Multi-Armed Bandit Formulation

  • Ai, Jing; Abouzeid, Alhussein A.
    • Journal of Communications and Networks / v.11 no.2 / pp.134-147 / 2009
  • Tracking and exploiting instantaneous spectrum opportunities are fundamental challenges in opportunistic spectrum access (OSA) in the presence of the bursty traffic of primary users and the limited spectrum-sensing capability of secondary users. In order to take advantage of the history of spectrum sensing and access decisions, a sequential decision framework is widely used to design optimal policies. However, many existing schemes, based on a partially observed Markov decision process (POMDP) framework, reveal that optimal policies are non-stationary in nature, which renders them difficult to calculate and implement. Therefore, this work pursues stationary OSA policies, which are efficient yet low-complexity, while still incorporating many practical factors, such as spectrum-sensing errors and a priori unknown statistical spectrum knowledge. First, with an approximation of channel evolution, OSA is formulated in a multi-armed bandit (MAB) framework. As a result, the optimal policy is specified by the well-known Gittins index rule, where the channel with the largest Gittins index is always selected. Then, closed-form formulas are derived for the Gittins indices with tunable approximation, and the design of a reinforcement learning algorithm is presented for calculating the Gittins indices, depending on whether the Markovian channel parameters are available a priori or not. Finally, the superiority of the scheme is demonstrated via extensive experiments against other existing schemes in terms of the quality of policies and optimality.
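
The stationary index-rule structure described above can be sketched as follows; since the paper's closed-form Gittins indices are not reproduced here, a standard UCB1 index is used as a simple stand-in, so the snippet only illustrates the "always sense the channel with the largest index" policy under invented channel statistics.

```python
import math, random

# Index-rule channel selection: always sense the channel with the largest index.
class Channel:
    def __init__(self, p_idle):
        self.p_idle = p_idle       # true (unknown to the user) idle probability
        self.successes = 0
        self.trials = 0

    def sense(self):
        idle = random.random() < self.p_idle
        self.trials += 1
        self.successes += idle
        return idle

def index(ch, t):
    # UCB1 index as a stand-in for the paper's Gittins index.
    if ch.trials == 0:
        return float("inf")        # try every channel at least once
    mean = ch.successes / ch.trials
    return mean + math.sqrt(2 * math.log(t) / ch.trials)

channels = [Channel(p) for p in (0.2, 0.5, 0.8)]
for t in range(1, 2001):
    best = max(channels, key=lambda ch: index(ch, t))
    best.sense()

print([ch.trials for ch in channels])   # sensing concentrates on the best channel
```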

Throughput Maximization for a Primary User with Cognitive Radio and Energy Harvesting Functions

  • Nguyen, Thanh-Tung; Koo, Insoo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.9 / pp.3075-3093 / 2014
  • In this paper, we consider an advanced wireless user, called a primary-secondary user (PSU), who is capable of harvesting renewable energy and connecting to both the primary network and cognitive radio networks simultaneously. Recently, energy harvesting has received a great deal of attention from the research community and is a promising approach for maintaining a long lifetime for users. On the other hand, the cognitive radio function allows the wireless user to access other primary networks in an opportunistic manner as a secondary user in order to receive more throughput in the current time slot. Accordingly, in this paper we propose a channel access policy for a PSU that takes energy harvesting into consideration, based on a Partially Observable Markov Decision Process (POMDP), in which the optimal action from the action set is selected to maximize expected long-term throughput. The simulation results show that the proposed POMDP-based channel access scheme improves the throughput of the PSU, but it requires more computation to make an action decision regarding channel access.
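
As a simplified illustration of the trade-off the PSU faces each slot, the sketch below makes a one-step (myopic) choice among staying on the primary channel, accessing a secondary channel, or harvesting energy; the paper solves the full POMDP over a long horizon, and all rates, costs, and beliefs here are illustrative.

```python
# Myopic sketch of the per-slot action choice of a PSU: stay on the primary
# channel, opportunistically access a secondary channel, or harvest energy.
# All numbers are illustrative placeholders.
def choose_action(belief_secondary_idle, battery, tx_cost=1.0,
                  primary_rate=1.0, secondary_rate=2.0):
    actions = {"harvest": 0.0}                     # no throughput, saves energy
    if battery >= tx_cost:                         # transmitting needs energy
        actions["primary"] = primary_rate
        actions["secondary"] = belief_secondary_idle * secondary_rate
    return max(actions, key=actions.get)

print(choose_action(belief_secondary_idle=0.7, battery=2.0))   # -> 'secondary'
print(choose_action(belief_secondary_idle=0.3, battery=0.5))   # -> 'harvest'
```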

A Study of Adaptive QoS Routing scheme using Policy-gradient Reinforcement Learning (정책 기울기 값 강화학습을 이용한 적응적인 QoS 라우팅 기법 연구)

  • Han, Jeong-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.93-99 / 2011
  • In this paper, we propose a policy-gradient routing scheme under reinforcement learning that can be used for adaptive QoS routing. Policy-gradient RL routing can learn the network environment quickly because the policy is adapted using gradient values of the estimated average reward, and this fast learning of the network environment results in a high routing success rate. To demonstrate this, we simulate the scheme and compare it with three different schemes.
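
A minimal sketch of such a policy-gradient update is given below: a softmax policy over candidate next hops is adjusted by a REINFORCE-style rule with a running average-reward baseline. The network model, success rates, and learning rates are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# REINFORCE-style sketch for next-hop selection at a node: a softmax policy
# over candidate neighbors is nudged toward hops whose routing attempts beat
# the running average reward (the baseline).
rng = np.random.default_rng(0)
theta = np.zeros(3)               # preference for each of 3 candidate next hops
baseline, alpha, beta = 0.0, 0.1, 0.05

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

true_success = np.array([0.5, 0.7, 0.9])   # unknown per-hop success rates

for _ in range(5000):
    probs = softmax(theta)
    hop = rng.choice(3, p=probs)
    reward = float(rng.random() < true_success[hop])   # 1 if routing succeeded
    grad = -probs
    grad[hop] += 1.0                                   # d log pi(hop) / d theta
    theta += alpha * (reward - baseline) * grad
    baseline += beta * (reward - baseline)             # average-reward estimate

print(np.round(softmax(theta), 2))   # most probability mass on the best hop
```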

On Exponential Utility Maximization

  • Chung, Kun-Jen
    • Journal of the Korean Operations Research and Management Science Society / v.13 no.2 / pp.66-71 / 1988
  • Let B be the present value of some sequence. This paper concerns the maximization of the expected utility of the present value B when the utility function is exponential.
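
For reference, the exponential-utility criterion mentioned in this abstract is conventionally written as maximizing $E[u(B)]$ with $u(B) = -e^{-\gamma B}$ for a risk-aversion parameter $\gamma > 0$, where $B = \sum_{t \ge 0} \beta^{t} r_{t}$ is the discounted present value of a reward sequence $\{r_t\}$ with discount factor $\beta \in (0,1)$; this is equivalent to maximizing the certainty equivalent $-\tfrac{1}{\gamma}\log E[e^{-\gamma B}]$. The symbols $\gamma$, $\beta$, and $r_t$ are introduced here for illustration and are not taken from the paper.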


Efficient context dependent process modeling using state tying and decision tree-based method (상태 공유와 결정트리 방법을 이용한 효율적인 문맥 종속 프로세스 모델링)

  • Ahn, Chan-Shik; Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.13 no.3 / pp.369-377 / 2010
  • In vocabulary recognition systems based on HMMs (Hidden Markov Models), models that are unseen during training lead to a low recognition rate. Moreover, whenever the recognition vocabulary is modified or extended, the models must be recreated from a newly collected database and retrained, which incurs additional cost and time. This study proposes an efficient context-dependent process modeling method that uses decision-tree-based state tying. The proposed method reduces the need to recreate models and offers robustness and accuracy in context-dependent acoustic modeling. It also reduces the number of models and resolves the unseen-model problem by assigning a phonetically similar context-dependent model to models unseen in training. As a result, the system achieved a vocabulary-dependent recognition rate of 98.01% and a vocabulary-independent recognition rate of 97.38%.
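
To illustrate decision-tree-based state tying, the toy sketch below pushes triphone contexts down a few yes/no phonetic questions and ties every state that reaches the same leaf; because unseen triphones still reach some leaf, they share parameters with seen ones, which is the mechanism for handling unseen models. The questions and phone classes are illustrative and much smaller than any real system's.

```python
# Minimal sketch of decision-tree state tying: triphone contexts are pushed
# down a tree of yes/no phonetic questions, and every state reaching the same
# leaf shares (ties) one set of HMM parameters.
VOWELS = {"a", "e", "i", "o", "u"}
NASALS = {"m", "n"}

def tie_state(triphone):
    left, center, right = triphone.split("-")   # e.g. "k-a-n"
    if center in VOWELS:
        if right in NASALS:
            return "leaf_vowel_before_nasal"
        return "leaf_vowel_other"
    if left in VOWELS:
        return "leaf_consonant_after_vowel"
    return "leaf_consonant_other"

# An unseen triphone ("t-k-s") still lands in a leaf and so still gets a model.
for tri in ["k-a-n", "t-a-m", "s-a-t", "a-k-i", "t-k-s"]:
    print(tri, "->", tie_state(tri))
```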