http://dx.doi.org/10.5370/KIEE.2012.61.3.451

Explorized Policy Iteration For Continuous-Time Linear Systems  

Lee, Jae-Young (Department of Electrical and Electronic Engineering, Yonsei University)
Chun, Tae-Yoon (Department of Electrical and Electronic Engineering, Yonsei University)
Choi, Yoon-Ho (Department of Electronic Engineering, Kyonggi University)
Park, Jin-Bae (Department of Electrical and Electronic Engineering, Yonsei University)
Publication Information
The Transactions of The Korean Institute of Electrical Engineers, vol. 61, no. 3, pp. 451-458, 2012
Abstract
This paper addresses the problem that policy iteration (PI) for continuous-time (CT) systems requires exploration of the state space, a requirement known as persistency of excitation in the adaptive control community, and proposes a PI scheme explorized by an additional probing signal to solve this problem. The proposed PI method efficiently finds, in an online fashion, the associated CT linear quadratic (LQ) optimal control without knowledge of the system matrix A; its stability and convergence to the LQ optimal control in the presence of the probing signal are proven in this paper. A design method for the probing signal is also presented to balance exploration of the state space against control performance. Finally, several simulation results verify the effectiveness of the proposed explorized PI method.
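For orientation, the iteration underlying such PI schemes is Kleinman's model-based policy iteration for the CT LQR: alternately evaluate the current gain by solving a Lyapunov equation and improve it from the resulting value matrix. The sketch below is this offline, model-based form (it uses A explicitly), not the paper's online explorized variant; the plant, weights, and initial gain are illustrative assumptions.

```python
# Hedged sketch: Kleinman policy iteration for the CT LQR
# (offline, model-based counterpart of online explorized PI).
import numpy as np

def lyap(Ac, Qc):
    """Solve Ac^T P + P Ac + Qc = 0 via Kronecker vectorization."""
    n = Ac.shape[0]
    M = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
    P = np.linalg.solve(M, -Qc.flatten(order="F")).reshape((n, n), order="F")
    return (P + P.T) / 2  # enforce symmetry against round-off

def kleinman_pi(A, B, Q, R, K0, iters=20):
    K = K0  # must be stabilizing (admissible) initially
    for _ in range(iters):
        Ac = A - B @ K                    # closed-loop matrix under current policy
        P = lyap(Ac, Q + K.T @ R @ K)     # policy evaluation: Lyapunov equation
        K = np.linalg.solve(R, B.T @ P)   # policy improvement: K = R^{-1} B^T P
    return P, K

# Illustrative stable plant, so K0 = 0 is admissible (assumed example).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P, K = kleinman_pi(A, B, Q, R, np.zeros((1, 2)))

# At convergence P satisfies the algebraic Riccati equation:
# A^T P + P A - P B R^{-1} B^T P + Q = 0
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
```

The paper's contribution is to perform the evaluation step from measured trajectory data (without A), with the added probing signal supplying the excitation the least-squares evaluation needs.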
Keywords
Policy iteration; LQR; Adaptive optimal control; Exploration; Persistency of excitation;