Explorized Policy Iteration for Continuous-Time Linear Systems

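  • Jae Young Lee (이재영), Department of Electrical and Electronic Engineering, Yonsei University;
  • Tae Yoon Chun (전태윤), Department of Electrical and Electronic Engineering, Yonsei University;
  • Yoon Ho Choi (최윤호), School of Electronic Engineering, Kyonggi University;
  • Jin Bae Park (박진배), Department of Electrical and Electronic Engineering, Yonsei University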
  • Received: 2011.11.29
  • Reviewed: 2012.02.20
  • Published: 2012.03.01

Abstract

This paper addresses the problem that policy iteration (PI) for continuous-time (CT) systems requires exploration of the state space, a condition known as persistency of excitation in the adaptive control community, and proposes a PI scheme explorized by an additional probing signal to solve this problem. The proposed PI method efficiently finds the corresponding CT linear quadratic (LQ) optimal control online without knowledge of the system matrix A, and it guarantees stability and convergence to the LQ optimal control; both properties are proven in this paper in the presence of the probing signal. A design method for the probing signal is also presented to balance exploration of the state space against control performance. Finally, several simulation results are provided to verify the effectiveness of the proposed explorized PI method.
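The idea of a PI loop that learns the LQ gain from measured data while a probing signal keeps the state excited can be illustrated with a minimal Python/NumPy sketch. It is not the authors' algorithm. It assumes an integral (data-based) form of Kleinman's policy iteration in which the input matrix B is known and A is used only to simulate the plant, never by the learning update. With u = -Kx + e, the value V(x) = x^T P x of the current policy satisfies x(t)^T P x(t) - x(t+T)^T P x(t+T) + 2∫ x^T P B e dτ = ∫ x^T (Q + K^T R K) x dτ, so e excites the state while its effect on the value is compensated through the known B. The plant, the function `probing_signal`, and the parameters `T`, `dt`, `intervals`, and `iters` are hypothetical choices for illustration.

```python
import numpy as np

def probing_signal(t):
    # Hypothetical probing signal (an assumption, not the paper's design):
    # a sum of sinusoids with distinct frequencies to keep the state excited.
    return np.array([0.5 * (np.sin(1.0 * t) + np.sin(2.3 * t) + np.sin(5.1 * t))])

def rollout(A, B, K, QK, x, t, T, dt):
    """Simulate x' = A x + B(-K x + e(t)) on [t, t+T] with RK4 and accumulate,
    by the trapezoid rule, cost = ∫ x^T QK x dτ and M = ∫ B e x^T dτ."""
    def f(s, z):
        return A @ z + B @ (-K @ z + probing_signal(s))
    def integrands(s, z):
        return z @ QK @ z, np.outer(B @ probing_signal(s), z)
    c_prev, m_prev = integrands(t, x)
    cost, M = 0.0, np.zeros((len(x), len(x)))
    for _ in range(int(round(T / dt))):
        k1 = f(t, x)
        k2 = f(t + dt / 2, x + dt / 2 * k1)
        k3 = f(t + dt / 2, x + dt / 2 * k2)
        k4 = f(t + dt, x + dt * k3)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + dt
        c_next, m_next = integrands(t, x)
        cost += dt / 2 * (c_prev + c_next)
        M += dt / 2 * (m_prev + m_next)
        c_prev, m_prev = c_next, m_next
    return x, t, cost, M

def features(S):
    # Coefficients of the unknowns p_ij (i <= j) in trace(P S) for symmetric P.
    iu = np.triu_indices(S.shape[0])
    S_sym = (S + S.T) / 2
    return S_sym[iu] * np.where(iu[0] == iu[1], 1.0, 2.0)

def unpack(theta, n):
    # Rebuild the symmetric matrix P from its upper-triangular entries.
    P = np.zeros((n, n))
    P[np.triu_indices(n)] = theta
    return P + P.T - np.diag(np.diag(P))

def explorized_pi(A, B, Q, R, K0, x0, iters=6, intervals=10, T=0.5, dt=1e-3):
    # A is used only inside rollout() to play the role of the real plant.
    n = A.shape[0]
    K, x, t = K0.copy(), x0.astype(float), 0.0
    for _ in range(iters):
        QK = Q + K.T @ R @ K
        rows, targets = [], []
        for _ in range(intervals):
            x_next, t, cost, M = rollout(A, B, K, QK, x, t, T, dt)
            # Policy evaluation: one linear equation in P per interval.
            rows.append(features(np.outer(x, x) - np.outer(x_next, x_next) + 2 * M))
            targets.append(cost)
            x = x_next
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        P = unpack(theta, n)
        K = np.linalg.solve(R, B.T @ P)  # policy improvement: K = R^{-1} B^T P
    return P, K

# Example plant (hypothetical); A is Hurwitz, so K0 = 0 is stabilizing.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P, K = explorized_pi(A, B, Q, R, K0=np.zeros((1, 2)), x0=np.array([1.0, -1.0]))
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print(P, K, np.abs(residual).max())
```

For this example plant the Riccati solution works out to P* = [[√5 − 1, √5 − 2], [√5 − 2, √5 − 2]] with K* = [√5 − 2, √5 − 2], so the learned P and K can be checked directly against it and against the Riccati residual. Compensating the probing term through the known B is what lets the excitation be added without biasing the policy evaluation in this sketch.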
