• Title/Summary/Keyword: Markov decision-making


A Simulation Sample Accumulation Method for Efficient Simulation-based Policy Improvement in Markov Decision Process (마르코프 결정 과정에서 시뮬레이션 기반 정책 개선의 효율성 향상을 위한 시뮬레이션 샘플 누적 방법 연구)

  • Huang, Xi-Lang;Choi, Seon Han
    • Journal of Korea Multimedia Society / v.23 no.7 / pp.830-839 / 2020
  • As a popular mathematical framework for modeling decision making, the Markov decision process (MDP) has been widely used to solve problems in many engineering fields. An MDP consists of a set of discrete states, a finite set of actions, and rewards received after reaching a new state by taking an action from the previous state. The objective of an MDP is to find an optimal policy, that is, the best action to take in each state to maximize the expected discounted reward (EDR) of the policy. In practice, the MDP is typically unknown, so simulation-based policy improvement (SBPI), which improves a given base policy sequentially by selecting the best action in each state based on rewards observed via simulation, can be a practical way to find the optimal policy. However, the efficiency of SBPI is still a concern because many simulation samples are required to precisely estimate the EDR for each action in each state. In this paper, we propose a method that selects the best action accurately in each state using a small number of simulation samples, thereby improving the efficiency of SBPI. The proposed method accumulates the simulation samples observed in previous states, so the EDR can be estimated precisely even with a small number of samples in the current state. Comparative experiments against the existing method demonstrate that the proposed method can improve the efficiency of SBPI.
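
To make the SBPI idea in the abstract above concrete, here is a minimal Python sketch of plain simulation-based policy improvement. It estimates the EDR of each action in each state by averaging simulated rollouts that follow the base policy, then picks the action with the highest estimate. The two-state MDP, the names `TRANSITIONS`, `rollout_edr`, and `improve_policy`, and all numbers are assumptions made for illustration. The sketch does not implement the sample accumulation method proposed in the paper.

```python
import random

# Hypothetical 2-state, 2-action MDP used only for illustration.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    0: {0: [(0.9, 0, 0.0), (0.1, 1, 1.0)],
        1: [(0.2, 0, 0.0), (0.8, 1, 1.0)]},
    1: {0: [(0.5, 0, 0.0), (0.5, 1, 2.0)],
        1: [(0.1, 0, 0.0), (0.9, 1, 2.0)]},
}
GAMMA = 0.9  # discount factor (assumed)


def step(state, action, rng):
    """Sample (next_state, reward) from the simulator."""
    u = rng.random()
    cumulative = 0.0
    for prob, next_state, reward in TRANSITIONS[state][action]:
        cumulative += prob
        if u < cumulative:
            return next_state, reward
    # Guard against floating-point round-off: fall back to the last outcome.
    _, next_state, reward = TRANSITIONS[state][action][-1]
    return next_state, reward


def rollout_edr(state, action, base_policy, rng, horizon=50):
    """One simulation sample: take `action` first, then follow base_policy."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward = step(state, action, rng)
        total += discount * reward
        discount *= GAMMA
        action = base_policy[state]
    return total


def improve_policy(base_policy, n_samples, rng):
    """Pick, in each state, the action with the highest estimated EDR."""
    improved = {}
    for state in TRANSITIONS:
        estimates = {}
        for action in TRANSITIONS[state]:
            samples = [rollout_edr(state, action, base_policy, rng)
                       for _ in range(n_samples)]
            estimates[action] = sum(samples) / n_samples
        improved[state] = max(estimates, key=estimates.get)
    return improved


rng = random.Random(0)
base_policy = {0: 0, 1: 0}
improved_policy = improve_policy(base_policy, n_samples=2000, rng=rng)
print(improved_policy)
```

The sketch spends `n_samples` rollouts on every state-action pair independently, which is the sampling cost the abstract identifies as the efficiency concern.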

Real Time Endpoint Detection in Plasma Etching Using Decision Making Algorithm (플라즈마 식각 공정에서 의사결정 알고리즘을 이용한 실시간 식각 종료점 검출)

  • Noh, Ho-Taek;Park, Young-Kook;Han, Seung-Soo
    • Journal of IKEEE / v.20 no.1 / pp.9-15 / 2016
  • Endpoint detection (EPD) is the most important technique in the plasma etching process. In plasma etching, Optical Emission Spectroscopy (OES) is usually used to analyze the plasma reaction, and a Plasma Impedance Monitoring (PIM) system is used to measure the voltage, current, power, and load impedance of the supplied RF power during the plasma process. In this paper, a new decision-making algorithm is proposed to improve the performance of EPD in SiOx single-layer plasma etching. To enhance the accuracy of endpoint detection, the algorithm uses both OES data and PIM data. The proposed method successfully detected the endpoint of silicon oxide plasma etching.

Energy-Saving Oriented On/Off Strategies in Heterogeneous Networks : an Asynchronous Approach with Dynamic Traffic Variations

  • Tang, Lun;Wang, Weili;Chen, Qianbin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5449-5464 / 2018
  • Recent works have validated the possibility of reducing energy consumption in wireless heterogeneous networks by dynamically switching some base stations (BSs) on and off. In this paper, to realize energy conservation, a discrete-time Markov decision process (DTMDP) is developed to match the BS switching operations to the traffic load variations. Then, an asynchronous decision-making algorithm based on the Bellman equation and the on/off priorities of the BSs is put forward for the first time and proved to be optimal. By reducing the state and action space considered in a single decision, the proposed asynchronous algorithm avoids the "curse of dimensionality" that frequently occurs in DTMDPs. Finally, numerical simulations are conducted to validate the effectiveness and advantages of the proposed asynchronous on/off strategies.
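
As a rough illustration of how a Bellman equation drives an on/off decision, the following Python sketch runs standard (synchronous) value iteration on a toy single-BS problem. The state is the pair (traffic level, current BS status) and the action is the BS status for the next step. The traffic transition probabilities, the cost constants, and every name in the code are hypothetical. This is not the asynchronous algorithm from the paper, which additionally uses BS on/off priorities to shrink the state and action space.

```python
import itertools

GAMMA = 0.9  # discount factor (assumed)
TRAFFIC = ("low", "high")
# Hypothetical traffic transition probabilities: P_TRAFFIC[current][next]
P_TRAFFIC = {"low": {"low": 0.8, "high": 0.2},
             "high": {"low": 0.3, "high": 0.7}}
ENERGY_COST = 1.0   # cost per step while the BS is on (assumed)
OUTAGE_COST = 3.0   # cost per step when traffic is high and the BS is off (assumed)
SWITCH_COST = 0.5   # cost of toggling the BS (assumed)

STATES = list(itertools.product(TRAFFIC, (False, True)))  # (traffic, bs_on)
ACTIONS = (False, True)  # BS status chosen for this step


def step_cost(state, action):
    """Immediate cost of choosing BS status `action` in `state`."""
    traffic, bs_on = state
    cost = SWITCH_COST if action != bs_on else 0.0
    if action:
        cost += ENERGY_COST
    elif traffic == "high":
        cost += OUTAGE_COST
    return cost


def q_value(state, action, values):
    """Right-hand side of the Bellman equation for one (state, action) pair."""
    traffic, _ = state
    expected_future = sum(prob * values[(next_traffic, action)]
                          for next_traffic, prob in P_TRAFFIC[traffic].items())
    return step_cost(state, action) + GAMMA * expected_future


def value_iteration(tol=1e-9):
    """Iterate the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {s: min(q_value(s, a, values) for a in ACTIONS)
                      for s in STATES}
        if max(abs(new_values[s] - values[s]) for s in STATES) < tol:
            return new_values
        values = new_values


values = value_iteration()
# Greedy policy: the cost-minimizing BS status in each state.
policy = {s: min(ACTIONS, key=lambda a: q_value(s, a, values)) for s in STATES}
for state in STATES:
    print(state, "->", "on" if policy[state] else "off")
```

With several BSs the joint state and action spaces grow exponentially in the number of BSs, which is the dimensionality problem the abstract refers to.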

A Joint Allocation Algorithm of Computing and Communication Resources Based on Reinforcement Learning in MEC System

  • Liu, Qinghua;Li, Qingping
    • Journal of Information Processing Systems / v.17 no.4 / pp.721-736 / 2021
  • For a mobile edge computing (MEC) system supporting a dense network, a joint allocation algorithm of computing and communication resources based on reinforcement learning is proposed. The energy consumption of task execution is defined as the maximum energy consumption of each user's task execution in the system. Considering the constraints of task unloading, power allocation, transmission rate, and calculation resource allocation, the problem of joint task unloading and resource allocation is modeled as a problem of minimizing the maximum task execution energy consumption. As a mixed-integer nonlinear programming problem, it is difficult to solve directly with traditional optimization methods, so this paper uses a reinforcement learning algorithm to solve it. Then, the Markov decision-making process and the theoretical basis of reinforcement learning are introduced to provide a theoretical basis for the algorithm simulation experiment. Based on the algorithm of reinforcement learning and joint allocation of communication resources, the joint optimization of data task unloading and power control strategy is carried out for each terminal device, and the local computing model and task unloading model are built. The simulation results show that the total task computation cost of the proposed algorithm is 5%-10% less than that of the two comparison algorithms under the same task input. At the same time, the total task computation cost of the proposed algorithm is more than 5% less than that of the two new comparison algorithms.

Performance-based remaining life assessment of reinforced concrete bridge girders

  • Anoop, M.B.;Rao, K. Balaji;Raghuprasad, B.K.
    • Computers and Concrete / v.18 no.1 / pp.69-97 / 2016
  • Performance-based remaining life assessment of reinforced concrete bridge girders subject to chloride-induced corrosion of reinforcement is addressed in this paper. To this end, a methodology that takes into consideration the human judgmental aspects of expert decision making in condition state assessment is proposed. The condition of the bridge girder is specified by assigning a condition state from a set of predefined condition states, considering both serviceability and ultimate limit states, and the performance of the bridge girder is described using a performability measure. A non-homogeneous Markov chain is used to model the stochastic evolution of the condition state of the bridge girder with time. The thinking process of the expert in condition state assessment is modelled within a probabilistic framework using Brunswikian theory and probabilistic mental models. The remaining life is determined as the time over which the performance of the girder is above the required performance level. The usefulness of the methodology is illustrated through the remaining life assessment of a reinforced concrete T-beam bridge girder.

Optimization of Radiation Protection Using Markov Model (마코프 모델을 이용한 방사선 방어의 최적화)

  • Chung, Jin-Yop;Lee, Kun-Jai
    • Journal of Radiation Protection and Research / v.14 no.2 / pp.1-9 / 1989
  • An analytic method for quantitative comparisons between the alternatives for radiation protection optimization is required to aid the decision-making process. This paper introduces a dynamic Markov model to evaluate the effect of in-service inspection, testing, and repair activities of the plant on radiation protection. In an example that puts the Markov model into practice, the steam generator inspection intervals that minimize expected cost and total exposure dose were determined using data for the Kori-2 unit and foreign plants. The results show that the steam generator inspection interval is determined by the cost rather than by the radiation exposure. The Markov model used in the example can be applied easily to domestic NPPs by supplementing the data, and it can also be used to evaluate the comparative priority of various alternatives for radiation protection optimization.


A Study on M/M(a, b; $\mu_k$)/1 Batch Service Queueing Model (M/M(a, b; $\mu_k$)/1 배치 서비스 대기모델에 대한 연구)

  • Lee, Hwa-Ki;Chung, Kyung-Il
    • Journal of Korean Institute of Industrial Engineers / v.21 no.3 / pp.345-356 / 1995
  • The aim of this paper is to analyze the batch service queueing model M/M(a, b; $\mu_k$)/1 under the general bulk service rule with mean service rate $\mu_k$ for a batch of $k$ units, where $a \leq k \leq b$. This queueing model has a two-dimensional state space, so it is characterized by a two-dimensional-state Markov process. The steady-state solution and performance measures of this process are derived using the matrix-geometric method. In addition, a new approach is suggested for calculating the two-dimensional traffic density R, which is used to obtain the steady-state solution. To determine the optimal service initiation threshold a, a decision model of this queueing system is developed that evaluates the cost of service per batch and the cost of waiting per customer. In a job order production system, the decision-making procedure presented in this paper can be applied to determine when production should be started.


The Primary Process and Key Concepts of Economic Evaluation in Healthcare

  • Kim, Younhee;Kim, Yunjung;Lee, Hyeon-Jeong;Lee, Seulki;Park, Sun-Young;Oh, Sung-Hee;Jang, Suhyun;Lee, Taejin;Ahn, Jeonghoon;Shin, Sangjin
    • Journal of Preventive Medicine and Public Health / v.55 no.5 / pp.415-423 / 2022
  • Economic evaluations in healthcare are used to assess the economic efficiency of pharmaceuticals and medical interventions such as diagnoses and medical procedures. This study introduces the main concepts of economic evaluation across its key steps: planning, outcome and cost calculation, modeling, cost-effectiveness results, uncertainty analysis, and decision-making. When planning an economic evaluation, we determine the study population, intervention, comparators, perspectives, time horizon, discount rates, and type of economic evaluation. In healthcare economic evaluations, outcomes include changes in mortality, the survival rate, life years, and quality-adjusted life years, while costs include medical, non-medical, and productivity costs. Model-based economic evaluations, including decision tree and Markov models, are mainly used to calculate the total costs and total effects. In cost-effectiveness or cost-utility analyses, cost-effectiveness is evaluated using the incremental cost-effectiveness ratio, which is the additional cost per one additional unit of effectiveness gained by an intervention compared with a comparator. All outcomes have uncertainties owing to limited evidence, diverse methodologies, and unexplained variation. Thus, researchers should review these uncertainties and confirm the robustness of the results. We hope to contribute to the establishment and dissemination of economic evaluation methodologies that reflect the Korean clinical and research environment and ultimately improve the rationality of healthcare policies.
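
The incremental cost-effectiveness ratio described in the abstract above is the difference in total cost divided by the difference in total effect between an intervention and its comparator. A minimal Python sketch follows. The function name `icer` and the cost and effect figures are hypothetical and are not taken from the article.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when both options have the same effect")
    return (cost_new - cost_old) / delta_effect


# Hypothetical totals, e.g. taken from a decision tree or Markov model:
# costs in currency units, effects in quality-adjusted life years (QALYs).
ratio = icer(cost_new=12_000.0, effect_new=6.5, cost_old=9_000.0, effect_old=6.0)
print(ratio)  # cost per additional QALY gained
```

In practice the resulting ratio is compared with a willingness-to-pay threshold, and the uncertainty analysis mentioned in the abstract checks how sensitive it is to the inputs.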

A Study on the Hydrologic Decision-Making for Drought Management : 1. An Analysis on the Stochastic Behavior of PDSI using markov chain (가뭄관리를 위한 수문학적 의사결정에 관한 연구 : 1. 마코프연쇄를 이용한 PDSI의 추계학적 거동분석)

  • Kang, In-Joo;Yoon, Yong-Nam
    • Journal of Korea Water Resources Association / v.35 no.5 / pp.583-595 / 2002
  • The purpose of this study is to manage and monitor droughts in the Mokpo area using the monthly Palmer index (PDSI). The data were obtained from the Mokpo meteorological station and cover the period from 1906 to 1999. The monthly Palmer index is classified into 7 stochastic classes, and the dynamic change of its monthly transition probabilities, estimated with a Markov chain, is investigated. We also estimate the steady-state probability of the classified PDSI. The 4th class shows the highest frequency, 49.6%, among the 7 classes. For the 7th class, which represents the most extreme drought, the stochastic transition probability is somewhat larger than the empirical one. We also found that the monthly steady-state probability could be used to forecast the changing pattern of drought magnitude in the study area.
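
The two Markov chain quantities named in the abstract above, the transition probabilities between index classes and the steady-state probability, can be sketched in Python as follows. The three-class sequence is invented for illustration (the study uses 7 PDSI classes and station data that are not reproduced here), and the function names are assumptions.

```python
def estimate_transition_matrix(sequence, n_classes):
    """Estimate P[i][j] = P(next class = j | current class = i) from counts."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a class with no observed departures gets a uniform row.
        matrix.append([c / total for c in row] if total
                      else [1.0 / n_classes] * n_classes)
    return matrix


def steady_state(matrix, tol=1e-12, max_iter=10_000):
    """Power iteration: repeatedly apply dist <- dist * P until it stops changing."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        new_dist = [sum(dist[i] * matrix[i][j] for i in range(n))
                    for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_dist, dist)) < tol:
            return new_dist
        dist = new_dist
    return dist


# Hypothetical monthly class sequence (0 = wet, 1 = normal, 2 = dry).
monthly_classes = [0, 0, 1, 1, 2, 1, 0, 1, 1, 2, 2, 1]
P = estimate_transition_matrix(monthly_classes, n_classes=3)
pi = steady_state(P)
print(P)
print(pi)
```

Power iteration converges here because the estimated chain is irreducible and aperiodic. A chain without those properties would need a different approach, such as solving the linear system for the stationary distribution directly.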

AN INVESTIGATION OF THE KOREAN GENERAL INSURANCE INDUSTRY: EVIDENCE OF STRUCTURAL CHANGES AND IMPACT OF MACRO-ECONOMIC FACTORS ON LOSS RATIOS

  • Thompson, Ephraim Kwashie;Kim, So-Yeun
    • East Asian mathematical journal / v.38 no.5 / pp.617-641 / 2022
  • In this study, we first present a brief overview of the Korean general insurance market. We then explore the characteristics of the loss ratios of the Korean general insurance industry and apply Markov regime-switching methodology to model the loss ratios of these insurance companies by line of business based on changes in economic regimes. This study applies a number of confirmatory tests, such as the Zivot-Andrews test (2002), the Chow (1960) test, and the Bai and Perron (1998) test, to confirm the presence of structural breaks in the time series of the loss ratios by line of business. We find empirical evidence that the loss ratios reported by insurance companies in Korea are characterized by two distinct regimes, one with high volatility and one with low volatility, except for vehicle insurance. Our analyses suggest that macro-economic conditions have a significant explanatory effect on loss ratios, but the direction of the effect differs by line of business and regime. Unlike previous studies that applied linear regressions, or divided the samples into different periods and then applied linear regressions, to model loss ratios, we argue for the application of Markov regime-switching methodology, which is able to automatically distinguish the different regimes that may be associated with movements of loss ratios under differing economic conditions and regulatory upheavals. This study provides a more in-depth understanding of loss ratios in the general insurance industry and will be of value to insurance practitioners in modelling the loss ratios associated with their businesses to aid their decision making. The results may also provide a basis for further studies in markets other than Korea, as well as for shaping policy decisions related to loss ratios.