• Title/Summary/Keyword: Approximate Dynamic Programming

Approximate Dynamic Programming Strategies and Their Applicability for Process Control: A Review and Future Directions

  • Lee, Jong-Min;Lee, Jay H.
    • International Journal of Control, Automation, and Systems
    • /
    • v.2 no.3
    • /
    • pp.263-278
    • /
    • 2004
  • This paper reviews dynamic programming (DP), surveys approximate solution methods for it, and considers their applicability to process control problems. Reinforcement Learning (RL) and Neuro-Dynamic Programming (NDP), which can be viewed as approximate DP techniques, are already established techniques for solving difficult multi-stage decision problems in the fields of operations research, computer science, and robotics. Owing to the significant disparity in problem formulations and objectives, however, the algorithms and techniques available from these fields are not directly applicable to process control problems, and reformulations based on an accurate understanding of these techniques are needed. We categorize the currently available approximate solution techniques for dynamic programming and identify those most suitable for process control problems. Several open issues are also identified and discussed.

Approximate Dynamic Programming-Based Dynamic Portfolio Optimization for Constrained Index Tracking

  • Park, Jooyoung;Yang, Dongsu;Park, Kyungwook
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.13 no.1
    • /
    • pp.19-30
    • /
    • 2013
  • Recently, the constrained index tracking problem, in which the task of trading a set of stocks is performed so as to closely follow an index value under some constraints, has often been considered as an important application domain for control theory. Because this problem can be conveniently viewed and formulated as an optimal decision-making problem in a highly uncertain and stochastic environment, approaches based on stochastic optimal control methods are particularly pertinent. Since stochastic optimal control problems cannot be solved exactly except in very simple cases, approximations are required in most practical problems to obtain good suboptimal policies. In this paper, we present a procedure for finding a suboptimal solution to the constrained index tracking problem based on approximate dynamic programming. Illustrative simulation results show that this procedure works well when applied to a set of real financial market data.

Application of Recent Approximate Dynamic Programming Methods for Navigation Problems (주행문제를 위한 최신 근사적 동적계획법의 적용)

  • Min, Dae-Hong;Jung, Keun-Woo;Kwon, Ki-Young;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.21 no.6
    • /
    • pp.737-742
    • /
    • 2011
  • Navigation problems include the task of determining the control input under various constraints for systems, such as mobile robots, that are subject to uncertain disturbances. Such tasks can be modeled as constrained stochastic control problems. To solve these control problems, one may try to use dynamic programming (DP) methods, which rely on the concept of the optimal value function. In most real-world problems, however, this attempt runs into many difficulties: for example, the exact system model may not be known, the computation of the optimal control policy may be impossible, and/or a huge amount of computing resources may be needed. As a strategy to overcome the difficulties of DP, one can use approximate dynamic programming (ADP) methods, which find suboptimal control policies by resorting to approximate value functions. In this paper, we apply recently proposed ADP methods to a class of navigation problems with complex constraints and observe the resulting performance characteristics.

SOLVING A SYSTEM OF THE NONLINEAR EQUATIONS BY ITERATIVE DYNAMIC PROGRAMMING

  • Effati, S.;Roohparvar, H.
    • Journal of applied mathematics & informatics
    • /
    • v.24 no.1_2
    • /
    • pp.399-409
    • /
    • 2007
  • In this paper we use iterative dynamic programming in the discrete case to solve a wide range of systems of nonlinear equations. First, by defining an error function, we transform the problem into a discrete optimal control problem. In previous uses of iterative dynamic programming to solve optimal control problems, the problem was broken up into a number of stages and the performance index was assumed to be expressible explicitly in terms of the state variables at the last stage. This provided a scheme in which we could proceed backwards in a systematic way, carrying out optimization at each stage. Suppose instead that the performance index cannot be expressed in terms of the variables at the last stage only; in other words, suppose it is also a function of the controls and variables at the other stages. Then we have a nonseparable optimal control problem. Furthermore, we obtain the path from the initial point to the approximate solution.

Control of pH Neutralization Process using Simulation Based Dynamic Programming in Simulation and Experiment (ICCAS 2004)

  • Kim, Dong-Kyu;Lee, Kwang-Soon;Yang, Dae-Ryook
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2004.08a
    • /
    • pp.620-626
    • /
    • 2004
  • General nonlinear processes are difficult to control with linear model-based methods, so nonlinear control is considered. Among the numerous approaches suggested, the most rigorous is to use dynamic optimization. Many general engineering problems, such as control, scheduling, and planning, are expressed as functional optimization problems, and most of them can be converted into dynamic programming (DP) problems. However, DP is used in only a few cases because, as the size of the problem grows, the approach suffers from a computational burden known as the 'curse of dimensionality'. To avoid this problem, the Neuro-Dynamic Programming (NDP) approach was proposed by Bertsekas and Tsitsiklis (1996). Interest in NDP as a way to solve highly nonlinear process control problems has grown; the NDP algorithm has been applied to diverse areas such as retailing, finance, inventory management, and communication networks, and it has been extended to chemical engineering. In the NDP approach, we select the control input policy that minimizes the cost, calculated as the sum of the current stage cost and the cost of future stages starting from the next state. The cost is a weighted sum of squares of the error and the input movement. If a cost function approximated from simulation data is used with Bellman iteration when calculating the optimal input policy, the computational burden can be relieved and the curse of dimensionality of DP can be overcome. How to construct a cost-to-go function with good approximation performance is a very important issue. The neural network is an eager learning method, and it works as a global approximator of the cost-to-go function. In this algorithm, training the neural network is an important and difficult part, and it has a significant effect on control performance. To avoid the difficulty of neural network training, a lazy learning method such as the k-nearest neighbor method can be used. This method requires no training but needs more computation time and greater data storage. The pH neutralization process has long been taken as a representative benchmark problem in nonlinear chemical process control because of its nonlinearity and time-varying nature. In this study, the NDP algorithm was applied to the pH neutralization process. First, NDP control of the pH neutralization process was carried out in simulations with various approximators; both global and local approximators were used for the NDP calculation. After that, NDP was verified on a real system in a pH neutralization experiment. The control results of the NDP algorithm were compared with those of the traditionally used PI controller, in both simulations and experiments. In this comparison, the NDP algorithm showed faster and better control performance than the PI controller. In addition, the NDP algorithm showed good results when applied to cases with disturbances and multiple set point changes.
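
The Bellman iteration with a lazy (k-nearest neighbor) cost-to-go approximator described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar plant in `step`, the cost weights, the discount factor `gamma`, and the state and input grids are all hypothetical stand-ins for the pH process and its simulation data.

```python
import numpy as np

def knn_value(x, xs, J, k=3):
    """Lazy-learning estimate: mean cost-to-go of the k sampled states nearest to x."""
    idx = np.argsort(np.abs(xs - x))[:k]
    return J[idx].mean()

def step(x, u):
    # Hypothetical scalar plant, chosen only for illustration.
    return 0.9 * x + u

def stage_cost(x, u):
    # Weighted squares of the error and the input movement (weights are assumptions).
    return x**2 + 0.1 * u**2

def bellman_iteration(xs, us, gamma=0.95, sweeps=50, k=3):
    """Repeatedly apply J(x) <- min_u [stage_cost(x, u) + gamma * J~(step(x, u))]."""
    J = np.zeros_like(xs)
    for _ in range(sweeps):
        J_new = np.empty_like(J)
        for i, x in enumerate(xs):
            J_new[i] = min(
                stage_cost(x, u) + gamma * knn_value(step(x, u), xs, J, k)
                for u in us
            )
        J = J_new
    return J

def greedy_input(x, xs, us, J, gamma=0.95, k=3):
    """Pick the input minimizing current stage cost plus approximate cost-to-go."""
    costs = [stage_cost(x, u) + gamma * knn_value(step(x, u), xs, J, k) for u in us]
    return us[int(np.argmin(costs))]

xs = np.linspace(-2.0, 2.0, 41)   # sampled states (stand-in for simulation data)
us = np.linspace(-1.0, 1.0, 21)   # candidate inputs
J = bellman_iteration(xs, us)
```

Calling `greedy_input(1.5, xs, us, J)` then returns the input that the approximate cost-to-go prefers at state 1.5. As the abstract notes, no training step is needed, but every query searches the stored samples.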

Approximate Dynamic Programming Based Interceptor Fire Control and Effectiveness Analysis for M-To-M Engagement (근사적 동적계획을 활용한 요격통제 및 동시교전 효과분석)

  • Lee, Changseok;Kim, Ju-Hyun;Choi, Bong Wan;Kim, Kyeongtaek
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.50 no.4
    • /
    • pp.287-295
    • /
    • 2022
  • As the threat of low-altitude long-range artillery has grown, development of an anti-artillery interception system to protect assets against such attacks is about to begin. We view the defense against long-range artillery attacks as a typical dynamic weapon target assignment (DWTA) problem. DWTA is a sequential decision process in which decisions made under uncertainty about future attacks affect the subsequent decision processes and their results. These are typical characteristics of a Markov decision process (MDP) model. We formulate the problem as an MDP model to examine the assignment policy for the defender. The proximity of the capital of South Korea to the North Korean border limits the computation time for a solution to a few seconds. Within the allowed time interval, it is impossible to compute the exact optimal solution. We apply an approximate dynamic programming (ADP) approach to check whether it can solve the MDP model within the processing time limit. We employ a Shoot-Shoot-Look policy as a baseline strategy and compare it with the ADP approach for three scenarios. Simulation results show that the ADP approach provides better solutions than the baseline strategy.

Investigations on data-driven stochastic optimal control and approximate-inference-based reinforcement learning methods (데이터 기반 확률론적 최적제어와 근사적 추론 기반 강화 학습 방법론에 관한 고찰)

  • Park, Jooyoung;Ji, Seunghyun;Sung, Keehoon;Heo, Seongman;Park, Kyungwook
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.25 no.4
    • /
    • pp.319-326
    • /
    • 2015
  • Recently, in the fields of stochastic optimal control (SOC) and reinforcement learning (RL), a great deal of research effort has been devoted to the problem of finding data-based sub-optimal control policies. The conventional theory for finding optimal controllers via value-function-based dynamic programming was established for solving stochastic optimal control problems on a solid theoretical background. However, it can be applied successfully only to extremely simple cases. Hence, the modern data-based approach, which tries to find sub-optimal solutions using relevant data such as state-transition and reward signals instead of rigorous mathematical analysis, is particularly attractive for practical applications. In this paper, we consider a couple of methods that combine modern SOC strategies and approximate inference with machine-learning-based data treatment methods. We also apply the resulting methods to a variety of application domains, including financial engineering, and observe their performance.

Dynamic Manipulability for Cooperating Multiple Robot Systems with Frictional Contacts (접촉 마찰을 고려한 다중 로봇 시스템의 조작도 해석)

  • Byun Jae-Min;Lee Ji-Hong
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.43 no.5 s.311
    • /
    • pp.10-18
    • /
    • 2006
  • We propose a new approach to computing the boundary of achievable accelerations, the so-called dynamic manipulability, for multiple robotic systems with frictional contacts between the robot end-effectors and the object. Because the frictional contact condition, which requires each contact force to lie within a friction cone, is a nonlinear inequality, the constraint is not easy to handle in manipulability analysis. To include the frictional contact condition in conventional manipulability analysis, we approximate the friction cone by a pyramid, which is described by linear inequality constraints. The achievable acceleration boundaries of the manipulated object are then calculated by a conventional linear programming technique under constraints on the torque capability of each robot and the approximated contact condition. With the proposed method we find some solutions that conventional approaches did not reach. Case studies are also presented to illustrate the correctness of the proposed approach for two robot systems, one of simple planar robots and one of PUMA560 robots.
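
The pyramid approximation of the friction cone mentioned in this abstract can be illustrated with a short sketch. It assumes a contact normal along +z, a hypothetical friction coefficient `mu`, and an inscribed (conservative) pyramid with `m` faces; the abstract does not say whether the authors inscribe or circumscribe the pyramid, so that choice is an assumption made here.

```python
import numpy as np

def friction_pyramid(mu, m=4):
    """Rows a_j such that a_j @ f <= 0 describes an m-sided pyramid inscribed in
    the friction cone sqrt(fx^2 + fy^2) <= mu * fz (contact normal along +z)."""
    phi = 2.0 * np.pi * (np.arange(m) + 0.5) / m   # directions of the face normals
    return np.column_stack([
        np.cos(phi),
        np.sin(phi),
        -mu * np.cos(np.pi / m) * np.ones(m),
    ])

def in_pyramid(A, f, tol=1e-12):
    """Linear test: all face inequalities hold."""
    return bool(np.all(A @ f <= tol))

def in_cone(mu, f):
    """Original nonlinear friction-cone test."""
    return bool(np.hypot(f[0], f[1]) <= mu * f[2])

A = friction_pyramid(mu=0.5, m=4)
```

Each row of `A` is a linear inequality, so the contact condition can be passed to a linear programming solver alongside the torque limits. Because the pyramid is inscribed, some forces inside the true cone are rejected; increasing `m` tightens the approximation at the cost of more constraints.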

Approximate Dynamic Programming for Linear Quadratic Optimal Control with Degree of Stability (안정도 단계가 고려된 LQ 최적 제어에 대한 근사 다이나믹 프로그래밍)

  • Lee, Jae-Young;Park, Jin-Bae;Choi, Yoon-Ho
    • Proceedings of the KIEE Conference
    • /
    • 2009.07a
    • /
    • pp.1794-1795
    • /
    • 2009
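  • In this paper, we propose an approximate dynamic programming technique for LQ optimal control with a degree of stability. The proposed technique can be implemented even when the system matrix is unknown, and its convergence under certain conditions is proved mathematically. We also introduce a least-squares-based real-time implementation method built on the proposed algorithm, and demonstrate the performance of the proposed approximate dynamic programming through computer simulations.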
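
For readers unfamiliar with the term, the textbook formulation of LQ optimal control with a prescribed degree of stability is sketched below. The symbols A, B, Q, R, P and α are assumptions taken from the standard continuous-time treatment, not from the paper itself, whose exact formulation may differ.

```latex
% Assumed plant: \dot{x} = A x + B u, with Q \succeq 0, R \succ 0, and degree of stability \alpha > 0.
J = \int_0^\infty e^{2\alpha t}\left( x^\top Q x + u^\top R u \right) dt

% Substituting \tilde{x} = e^{\alpha t} x and \tilde{u} = e^{\alpha t} u gives a standard LQ problem for (A + \alpha I, B):
(A + \alpha I)^\top P + P (A + \alpha I) - P B R^{-1} B^\top P + Q = 0,
\qquad u = -R^{-1} B^\top P x
```

Under the usual stabilizability and detectability conditions on the shifted problem, the closed-loop eigenvalues of A - B R^{-1} B^\top P have real parts less than -α. Solving the Riccati equation directly requires the system matrix A, which is the requirement the abstract says the proposed method removes.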

A Polynomial Time Approximation Scheme for Enormous Euclidean Minimum Spanning Tree Problem (대형 유클리드 최소신장트리 문제해결을 위한 다항시간 근사 법)

  • Kim, In-Bum
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.5
    • /
    • pp.64-73
    • /
    • 2011
  • The Euclidean minimum spanning tree (EMST) problem is to connect given nodes in a plane with minimum cost. Because EMST is a polynomial-time problem, many algorithms exist for it. For very large numbers of nodes, however, these algorithms take an enormous amount of time to find an optimal solution. In this paper, we suggest an approximation scheme for the problem that uses a polynomial time approximation scheme (PTAS) algorithm with partitioning and parallel processing. The scheme makes it possible to construct a large, approximate EMST within a short time. Although PTAS was initially devised for non-polynomial problems, we employ a naive PTAS with dynamic programming to construct a vast EMST. In an experiment with 15,000 input terminal nodes and 16 partition cells, the approximate EMST constructed by the proposed scheme shows 89% and 99% savings in execution time for the serial processing and parallel processing methods, respectively. Our scheme can therefore be applied to obtain an approximate EMST quickly for numerous input terminal nodes.