http://dx.doi.org/10.5370/KIEE.2012.61.3.451

Explorized Policy Iteration For Continuous-Time Linear Systems  

Lee, Jae-Young (Department of Electrical and Electronic Engineering, Yonsei University)
Chun, Tae-Yoon (Department of Electrical and Electronic Engineering, Yonsei University)
Choi, Yoon-Ho (Department of Electronic Engineering, Kyonggi University)
Park, Jin-Bae (Department of Electrical and Electronic Engineering, Yonsei University)
Publication Information
The Transactions of The Korean Institute of Electrical Engineers, vol. 61, no. 3, pp. 451-458, 2012
Abstract
This paper addresses the problem that policy iteration (PI) for continuous-time (CT) systems requires exploration of the state space, a requirement known as persistency of excitation in the adaptive control community, and proposes a PI scheme explorized by an additional probing signal to solve this problem. The proposed PI method efficiently finds, in an online fashion, the associated CT linear quadratic (LQ) optimal control without knowledge of the system matrix A; its stability and convergence to the LQ optimal control in the presence of the probing signal are proven in this paper. A design method for the probing signal is also presented to balance exploration of the state space against control performance. Finally, several simulation results verify the effectiveness of the proposed explorized PI method.
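For orientation, the iteration underlying such PI schemes is Kleinman's model-based policy iteration for the CT LQR: alternately evaluate the current gain by solving a Lyapunov equation and improve it from the resulting value matrix. The sketch below is this offline, model-based form (it uses A explicitly), not the paper's online explorized variant; the plant, weights, and initial gain are illustrative assumptions.

```python
# Hedged sketch: Kleinman policy iteration for the CT LQR
# (offline, model-based counterpart of online explorized PI).
import numpy as np

def lyap(Ac, Qc):
    """Solve Ac^T P + P Ac + Qc = 0 via Kronecker vectorization."""
    n = Ac.shape[0]
    M = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
    P = np.linalg.solve(M, -Qc.flatten(order="F")).reshape((n, n), order="F")
    return (P + P.T) / 2  # enforce symmetry against round-off

def kleinman_pi(A, B, Q, R, K0, iters=20):
    K = K0  # must be stabilizing (admissible) initially
    for _ in range(iters):
        Ac = A - B @ K                    # closed-loop matrix under current policy
        P = lyap(Ac, Q + K.T @ R @ K)     # policy evaluation: Lyapunov equation
        K = np.linalg.solve(R, B.T @ P)   # policy improvement: K = R^{-1} B^T P
    return P, K

# Illustrative stable plant, so K0 = 0 is admissible (assumed example).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P, K = kleinman_pi(A, B, Q, R, np.zeros((1, 2)))

# At convergence P satisfies the algebraic Riccati equation:
# A^T P + P A - P B R^{-1} B^T P + Q = 0
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
```

The paper's contribution is to perform the evaluation step from measured trajectory data (without A), with the added probing signal supplying the excitation the least-squares evaluation needs.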
Keywords
Policy iteration; LQR; Adaptive optimal control; Exploration; Persistency of excitation;