Cooperative Robot for Table Balancing Using Q-learning

Development of a Q-learning-Based Cooperative Robot Capable of Table-Balancing Tasks

  • Kim, Yewon (Mechanical Engineering, Kyungpook National University);
  • Kang, Bo-Yeong (Mechanical Engineering, Kyungpook National University)
  • Received : 2020.09.07
  • Accepted : 2020.10.15
  • Published : 2020.11.30

Abstract

Everyday tasks such as moving tables and beds typically involve at least two people, and the balance of the object changes with each person's action. However, many previous studies performed such tasks with robots alone, without accounting for human cooperation. In this paper, we therefore propose a Q-learning-based cooperative robot for table balancing that enables joint work between a human and a robot. The proposed robot recognizes the human's action from camera images of the table's state and performs the corresponding table-balancing action without high-performance equipment. Human actions are classified with a deep learning model, AlexNet, achieving 96.9% accuracy under 10-fold cross-validation. The Q-learning experiment was carried out over 2,000 episodes with 200 trials, and the results show that the Q function converged stably within this number of episodes. This stable convergence determined the Q-learning policies for the robot's actions. A video of the robot cooperating with a human on the table-balancing task using the proposed Q-learning can be found at http://ibot.knu.ac.kr/videocooperation.html.
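For readers unfamiliar with the learning setup, the sketch below illustrates how a tabular Q-learning loop of the kind described in the abstract could be organized. The state names, action names, reward values, transition function, and hyperparameters are illustrative assumptions for the sketch, not the authors' exact formulation; only the episode and trial counts follow the abstract.

```python
import numpy as np

# Minimal tabular Q-learning sketch for a table-balancing setting.
# States/actions/rewards below are assumed for illustration only.
STATES = ["balanced", "tilted_left", "tilted_right", "tilted_up", "tilted_down"]
ACTIONS = ["hold", "raise_arm", "lower_arm", "step_forward", "step_backward"]

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
Q = np.zeros((len(STATES), len(ACTIONS)))

def choose_action(s):
    """Epsilon-greedy action selection over the Q table."""
    if np.random.rand() < EPSILON:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q[s]))

def step(s, a):
    """Placeholder environment returning (next_state, reward, done).
    In the paper this feedback comes from the real robot/camera loop;
    here a random stand-in keeps the sketch self-contained and runnable."""
    if ACTIONS[a] == "hold" and STATES[s] == "balanced":
        s_next = s
    else:
        s_next = np.random.randint(len(STATES))
    reward = 1.0 if STATES[s_next] == "balanced" else -0.1
    return s_next, reward, STATES[s_next] == "balanced"

for episode in range(2000):              # 2,000 episodes, as in the abstract
    s = np.random.randint(len(STATES))
    for _ in range(200):                  # up to 200 trials per episode
        a = choose_action(s)
        s_next, r, done = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if done:
            break
```

Under this kind of update, the Q table stabilizes once further episodes no longer change the greedy action in each state, which is the convergence behavior the abstract reports.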

Keywords


Cited by

  1. Development of Humanoid Robot HUMIC and Research on Robot Behavior Intelligence Based on Reinforcement Learning Using the Gazebo Simulator, vol.16, pp.3, 2020, https://doi.org/10.7746/jkros.2021.16.3.260