• Title/Summary/Keyword: Policy Iteration


Markovian Approach of Inspection Policy in a Serial Manufacturing System (Markovian 접근방법을 이용한 직렬생산시스템의 검사정책)

  • 정영배;황의철
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.11 no.17
    • /
    • pp.81-85
    • /
    • 1988
  • This paper presents a model that considers combinations of rework, repair, replacement, and scrapping. A policy-iteration method for inspection is proposed for a serial manufacturing system whose repair cost, scrap cost, and inspection cost upon failure can be formulated by a Markovian approach. Policy iteration stops when the new inspection policy is the same as the previous inspection policy. A numerical example is presented.

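The stopping rule described in the abstract above (iterate until the new policy equals the previous one) can be sketched for a small finite Markov decision process. This is a minimal illustration, not the paper's model: the two states, the two actions, the discount factor, and every probability and cost below are hypothetical values chosen only to make the loop runnable.

```python
# Hypothetical two-state inspection MDP; every number below is illustrative.
STATES = [0, 1]    # 0 = process in control, 1 = process out of control
ACTIONS = [0, 1]   # 0 = skip inspection, 1 = inspect and repair
GAMMA = 0.9        # discount factor (assumed)

# P[a][s][s2]: transition probability; C[a][s]: expected one-step cost.
P = {
    0: [[0.8, 0.2], [0.0, 1.0]],  # skipping: an out-of-control process stays so
    1: [[0.9, 0.1], [0.9, 0.1]],  # inspecting restores the process
}
C = {
    0: [0.0, 10.0],  # scrap cost accrues while out of control
    1: [2.0, 5.0],   # inspection cost, plus repair cost when out of control
}


def q_value(s, a, v):
    """One-step cost of action a in state s plus discounted future cost."""
    return C[a][s] + GAMMA * sum(P[a][s][s2] * v[s2] for s2 in STATES)


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation: expected discounted cost of `policy`."""
    v = [0.0 for _ in STATES]
    while True:
        new_v = [q_value(s, policy[s], v) for s in STATES]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def improve(v):
    """Greedy improvement: pick the cost-minimizing action in each state."""
    return [min(ACTIONS, key=lambda a: q_value(s, a, v)) for s in STATES]


def policy_iteration(initial_policy):
    policy = list(initial_policy)
    while True:
        v = evaluate(policy)
        new_policy = improve(v)
        if new_policy == policy:  # stopping rule: policy unchanged
            return policy, v
        policy = new_policy


if __name__ == "__main__":
    print(policy_iteration([0, 0]))
```

With these assumed numbers the loop settles on skipping inspection while the process is in control and inspecting once it is out of control. Iterative evaluation is used here only to avoid a linear-algebra dependency; solving the linear system for the policy's cost directly is the more common textbook choice.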

Design of an Adaptive Robust Controller Based on Explorized Policy Iteration for the Stabilization of Multimachine Power Systems (다기 전력 시스템의 안정화를 위한 탐색화된 정책 반복법 기반 적응형 강인 제어기 설계)

  • Chun, Tae Yoon;Park, Jin Bae
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.20 no.11
    • /
    • pp.1118-1124
    • /
    • 2014
  • This paper proposes a novel controller design scheme for multimachine power systems based on explorized policy iteration. Power systems have several uncertainties in their system dynamics due to the various effects of interconnections between generators. To address this problem, the proposed method solves the LQR (Linear Quadratic Regulation) problem of isolated subsystems without knowledge of the system matrix or the interconnection parameters of the multimachine power system. By selecting proper performance indices, it guarantees stability and convergence to the LQ optimal control. To implement the proposed scheme, a least-squares-based online method is also investigated in terms of PE (Persistency of Excitation), interconnection parameters, and exploration signals. Finally, the performance and effectiveness of the proposed algorithm are demonstrated by numerical simulations of three-machine power systems with governor controllers.

Dynamic Task Scheduling Via Policy Iteration Scheduling Approach for Cloud Computing

  • Hu, Bin;Xie, Ning;Zhao, Tingting;Zhang, Xiaotong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.3
    • /
    • pp.1265-1278
    • /
    • 2017
  • Dynamic task scheduling is one of the most popular research topics in the cloud computing field. The cloud scheduler dynamically provides VM resources to variable cloud tasks with different scheduling strategies in cloud computing. In this study, we utilized a valid model to describe the dynamic changes of both computing facilities (such as hardware updating) and request task queuing. We built a novel approach called Policy Iteration Scheduling (PIS) to globally optimize the independent task scheduling scheme and minimize the total execution time of priority tasks. We performed experiments with randomly generated cloud task sets and varied the performance of VM resources using Poisson distributions. The results show that PIS outperforms other popular schedulers in a typical cloud computing environment.

Explorized Policy Iteration For Continuous-Time Linear Systems (연속시간 선형시스템에 대한 탐색화된 정책반복법)

  • Lee, Jae-Young;Chun, Tae-Yoon;Choi, Yoon-Ho;Park, Jin-Bae
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.61 no.3
    • /
    • pp.451-458
    • /
    • 2012
  • This paper addresses the problem that policy iteration (PI) for continuous-time (CT) systems requires exploration of the state space, known as persistency of excitation in the adaptive control community, and proposes a PI scheme explorized by an additional probing signal to solve it. The proposed PI method efficiently finds the related CT linear quadratic (LQ) optimal control online without knowing the system matrix A, and guarantees stability and convergence to the LQ optimal control; both properties are proven in this paper in the presence of the probing signal. A design method for the probing signal is also presented to balance exploration of the state space against control performance. Finally, several simulation results are provided to verify the effectiveness of the proposed explorized PI method.
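For orientation, the evaluate-then-improve structure of policy iteration for a CT LQ problem can be sketched in the scalar case. Note the swap: this is the classical model-based iteration (often attributed to Kleinman), which uses the system coefficient `a` directly, whereas the abstract above describes an online scheme that avoids knowing A and adds a probing signal. The function names and all numeric values are assumptions for illustration.

```python
import math


def kleinman_scalar(a, b, q, r, k0, max_iterations=50, tol=1e-12):
    """Model-based policy iteration for dx/dt = a*x + b*u with running cost
    q*x^2 + r*u^2 and feedback u = -k*x. Requires a stabilizing initial gain."""
    if a - b * k0 >= 0:
        raise ValueError("initial gain k0 must stabilize the closed loop")
    k = k0
    p = None
    for _ in range(max_iterations):
        # Policy evaluation: scalar Lyapunov equation
        #   2*(a - b*k)*p + q + r*k^2 = 0
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: k = b*p/r
        new_k = b * p / r
        converged = abs(new_k - k) < tol
        k = new_k
        if converged:
            break
    return k, p


def riccati_scalar(a, b, q, r):
    """Closed-form stabilizing solution of 2*a*p - (b*p)^2/r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    k, p = kleinman_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
    print(k, p, riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

Comparing the iterate `p` with the closed-form Riccati solution is a quick sanity check that the iteration converges to the LQ optimal cost coefficient in this toy setting.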

Policy Iteration Algorithm Based Fault Tolerant Tracking Control: An Implementation on Reconfigurable Manipulators

  • Li, Yuanchun;Xia, Hongbing;Zhao, Bo
    • Journal of Electrical Engineering and Technology
    • /
    • v.13 no.4
    • /
    • pp.1740-1751
    • /
    • 2018
  • This paper proposes a novel fault tolerant tracking control (FTTC) scheme for a class of nonlinear systems with actuator failures based on the policy iteration (PI) algorithm and an adaptive fault observer. The actuator failure estimated by the adaptive fault observer is used to construct an improved performance index function that reflects the failure, regulation, and control simultaneously. With a proper performance index function, the FTTC problem can be transformed into an optimal control problem. The fault tolerant tracking controller is composed of the desired controller and the approximated optimal feedback controller. The desired controller is developed to maintain the desired tracking performance at steady state, and the approximated optimal feedback controller is designed to stabilize the tracking error dynamics in an optimal manner. By establishing a critic neural network, the PI algorithm is used to solve the Hamilton-Jacobi-Bellman equation, from which the approximated optimal feedback controller is derived. Based on the Lyapunov technique, the uniform ultimate boundedness of the closed-loop system is proven. The proposed FTTC scheme is applied to reconfigurable manipulators with two degrees of freedom in order to test its effectiveness via numerical simulation.

Seamless Mobility of Heterogeneous Networks Based on Markov Decision Process

  • Preethi, G.A.;Chandrasekar, C.
    • Journal of Information Processing Systems
    • /
    • v.11 no.4
    • /
    • pp.616-629
    • /
    • 2015
  • A mobile terminal can expect a number of handoffs within its call duration. During a mobile call, when a mobile node moves from one cell to another, it should connect to another access point within its range. If its own network cannot support it, it must change over to another base station. When moving to another network, quality of service parameters need to be considered. In our study we have used the Markov decision process approach for a seamless handoff, as it gives the optimum results for selecting a network when compared to other multiple attribute decision making processes. We have used the network cost function for selecting the network for handoff and the connection reward function, which is based on the values of the quality of service parameters. We have also examined the constant bit rate and transmission control protocol packet delivery ratio. We used the policy iteration algorithm for determining the optimal policy. Our enhanced handoff algorithm outperforms previous multiple attribute decision making methods.

Operating Room Reservation Problem Considering Patient Priority : Modified Value Iteration Method with Binary Search (환자 우선순위를 고려한 수술실 예약 : 이진검색을 활용한 수정 평가치반복법)

  • Min, Dai-Ki
    • IE interfaces
    • /
    • v.24 no.4
    • /
    • pp.274-280
    • /
    • 2011
  • Delayed access to surgery may lead to deterioration in the patient's condition, poor clinical outcomes, an increased probability of emergency admission, or even death. The purpose of this work is to decide the number of patients selected from a waiting list and to schedule them in accordance with the operating room capacity in the next period. We formulate the problem as an infinite horizon Markov Decision Process (MDP), which attempts to strike a balance between patient waiting times and overtime work. Structural properties of the proposed model are investigated to facilitate the solution procedure. The proposed procedure modifies the conventional value iteration method along with the binary search technique. An example of the optimal policy is provided, and computational results are given to show that the proposed procedure improves computational efficiency.

Machine Maintenance Policy Using Partially Observable Markov Decision Process

  • Pak, Pyoung Ki;Kim, Dong Won;Jeong, Byung Ho
    • Journal of Korean Society for Quality Management
    • /
    • v.16 no.2
    • /
    • pp.1-9
    • /
    • 1988
  • This paper considers a machine maintenance problem. The machine's condition is partially known by observing the machine's output products. This problem is formulated as an infinite horizon partially observable Markov decision process to find an optimal maintenance policy. However, even though the optimal policy of the model exists, finding it is very time consuming. Thus, the aim of this study is to find an $\varepsilon$-optimal stationary policy minimizing the expected discounted total cost of the system. The $\varepsilon$-optimal policy is found by using a modified version of the well-known policy iteration algorithm. A numerical example is also shown.

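The idea that a machine's condition is only partially known from its output can be illustrated with the standard Bayesian belief update used in partially observable Markov decision processes. This is a generic sketch, not the paper's algorithm: the two-state machine, the transition matrix, and the defect probabilities are hypothetical.

```python
def belief_update(belief, transition, observation_likelihood):
    """Bayesian belief update for one step of a POMDP.

    belief[s]                  : current probability of hidden state s
    transition[s][s2]          : P(next state s2 | state s) under the chosen action
    observation_likelihood[s2] : P(observed output | next state s2)
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight by how likely the observation is in each state.
    unnormalized = [observation_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Hypothetical machine: state 0 = good, state 1 = deteriorated.
    transition = [[0.9, 0.1], [0.0, 1.0]]  # no maintenance: deterioration is absorbing
    p_defective = [0.05, 0.5]              # P(defective product | machine state)
    print(belief_update([1.0, 0.0], transition, p_defective))
```

A policy for such a model maps the belief vector, rather than the unobserved machine state, to a maintenance action.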

Partially Observable Markov Decision Process with Lagged Information over Infinite Horizon

  • Jeong, Byong-Ho;Kim, Soung-Hie
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.16 no.1
    • /
    • pp.135-146
    • /
    • 1991
  • This paper presents an infinite horizon model of a Partially Observable Markov Decision Process with lagged information. The lagged information is an uncertain, delayed observation of the process under control. Even though the optimal policy of the model exists, finding it is very time consuming. Thus, the aim of this study is to find an $\varepsilon$-optimal stationary policy minimizing the expected discounted total cost of the model. The $\varepsilon$-optimal policy is found by using a modified version of the well-known policy iteration algorithm. The modification focuses on the value determination routine of the algorithm. Some properties of the approximation functions for the expected discounted cost of a stationary policy are presented. The expected discounted cost of a stationary policy is approximated based on these properties. A numerical example is also shown.


Digital signal change through artificial intelligence machine learning method comparison and learning (인공지능 기계학습 방법 비교와 학습을 통한 디지털 신호변화)

  • Yi, Dokkyun;Park, Jieun
    • Journal of Digital Convergence
    • /
    • v.17 no.10
    • /
    • pp.251-258
    • /
    • 2019
  • In the future, various products will be created in various fields using artificial intelligence. In this age, it is very important to understand the operating principles of artificial intelligence learning methods and to use them correctly. This paper introduces the artificial intelligence learning methods known so far. Learning in artificial intelligence is based on the fixed-point iteration method of mathematics. The methods covered are the GD (Gradient Descent) method, which adjusts the convergence speed based on the fixed-point iteration method; the Momentum method, which accumulates gradients; and finally the Adam method, which combines these methods. This paper describes the advantages and disadvantages of each method. In particular, the Adam method, which is adaptive, controls the learning ability of machine learning. We also analyze how these methods affect digital signals. The changes in the learning process of digital signals are the basis of accurate application and accurate judgment in future work and research using artificial intelligence.
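The three update rules named in the abstract above can be compared on a toy problem. The sketch below uses the commonly published forms of gradient descent, momentum, and Adam; the objective f(x) = (x - 3)^2, the step sizes, and the step counts are assumptions for illustration and are not taken from the paper.

```python
import math


def grad(x):
    """Gradient of the assumed toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def gradient_descent(x, lr=0.1, steps=500):
    # Fixed-point iteration x <- x - lr * grad(x); lr sets the convergence speed.
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def momentum(x, lr=0.1, beta=0.9, steps=500):
    # The velocity v accumulates past gradients with decay factor beta.
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x)
        x -= lr * v
    return x


def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    # Combines a momentum-style first moment with a per-step adaptive scale.
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the first moment
        v_hat = v / (1.0 - beta2 ** t)  # bias correction for the second moment
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    for method in (gradient_descent, momentum, adam):
        print(method.__name__, method(0.0))
```

All three runs start from x = 0 and should approach the minimizer x = 3; how quickly each gets there depends on the assumed hyperparameters.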