Gain Tuning for SMCSPO of Robot Arm with Q-Learning

Q-Learning을 사용한 로봇팔의 SMCSPO 게인 튜닝

  • Lee, JinHyeok (School of Mechanical Engineering, Pusan National University) ;
  • Kim, JaeHyung (School of Mechanical Engineering, Pusan National University) ;
  • Lee, MinCheol (School of Mechanical Engineering, Pusan National University)
  • Received : 2021.11.29
  • Accepted : 2022.03.09
  • Published : 2022.05.31

Abstract

Sliding mode control (SMC) is a robust control method for a robot arm with nonlinear properties. By using a high switching gain, SMC achieves adequate control performance even without an exact robot model containing the nonlinear and uncertainty terms, but the high switching gain causes a chattering problem. To solve this problem, SMC with a sliding perturbation observer (SMCSPO) has been studied; it reduces chattering by compensating the perturbation estimated by the observer, which allows a lower switching gain to be chosen. However, optimal gain tuning is still necessary to obtain better tracking performance and less chattering. This paper proposes a method in which Q-learning automatically tunes the control gains of SMCSPO through iterative operation. In this tuning method, the reward of reinforcement learning (RL) is set to the negative tracking error of the states, and the RL action is a change of the control gain that maximizes the reward as the number of motion iterations increases. A simple motion of a 7-DOF robot arm was simulated in MATLAB to verify this RL tuning algorithm. The simulation showed that the method can automatically tune the control gains of SMCSPO.
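As an illustration of the tuning loop described above, the sketch below applies tabular Q-learning to a discretized switching gain, with the reward defined as the negative tracking error of an episode. It is a minimal Python sketch, not the authors' MATLAB implementation: the gain range, hyperparameters, and the toy run_episode() error model standing in for the SMCSPO robot-arm simulation are assumptions made only for illustration.

```python
# Illustrative tabular Q-learning loop for tuning an SMCSPO switching gain.
# NOTE: run_episode() is a toy stand-in for the actual SMCSPO simulation; it only
# mimics "gain too low -> poor tracking, gain too high -> chattering".
import numpy as np

GAINS = np.linspace(5.0, 50.0, 10)     # assumed candidate switching gains
ACTIONS = (-1, 0, +1)                  # decrease / keep / increase the gain index
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2      # learning rate, discount, epsilon-greedy rate

def run_episode(gain):
    """Toy surrogate for one simulated motion: returns an accumulated tracking
    error that grows when the gain is too low (poor tracking) or too high
    (chattering). Replace with the SMCSPO simulation to reproduce the paper."""
    return (25.0 - gain) ** 2 / 100.0 + 0.05 * gain + 0.1 * np.random.rand()

Q = np.zeros((len(GAINS), len(ACTIONS)))
state = len(GAINS) // 2                # start from a mid-range gain

for episode in range(300):
    # epsilon-greedy selection of a gain change
    if np.random.rand() < EPS:
        action = np.random.randint(len(ACTIONS))
    else:
        action = int(np.argmax(Q[state]))

    next_state = int(np.clip(state + ACTIONS[action], 0, len(GAINS) - 1))
    reward = -run_episode(GAINS[next_state])   # reward = minus the tracking error

    # standard Q-learning update
    Q[state, action] += ALPHA * (reward + GAMMA * np.max(Q[next_state]) - Q[state, action])
    state = next_state

best_gain = GAINS[int(np.argmax(np.max(Q, axis=1)))]
print(f"tuned switching gain: {best_gain:.1f}")
```

In the paper the reward is built from the joint-state tracking errors of the 7-DOF arm; the single gain index used as the state here is only to keep the sketch short.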

Acknowledgement

This paper was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean Government (MOTIE) under the Competency Development Program for Industry Specialists (HRD Program for Industrial Innovation) (No. P0008473, The development of high skilled and innovative manpower to lead the Innovation based on Robot).
