Deep Reinforcement Learning of Ball Throwing Robot's Policy Prediction

  • Received : 2020.09.02
  • Accepted : 2020.09.22
  • Published : 2020.11.30

Abstract

A robot's throwing control is difficult to compute accurately because of factors such as air resistance and rotational inertia. This complexity can be addressed with machine learning. However, reinforcement learning that relies on a hand-designed reward function limits a robot's ability to adapt to new environments. This paper therefore applies deep reinforcement learning with a neural network and without a reward function. Each throw is evaluated simply as a success or a failure. The network learns by taking the target position and the control policy as input and producing this evaluation as output. The task is then carried out by predicting the success probability for a given target location and control policy, and searching for the policy with the highest predicted probability. Repeating this process improves performance as data accumulate. The model can also predict the outcome of throws that were never attempted, which makes it a learning model that generalizes to new environments. Across 520 experiments, this learning model achieved a 75% success rate.
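The approach described in the abstract can be summarized concretely: a network maps a (target position, control policy) pair to a success probability, is trained on binary throw outcomes, and the executed policy is the one with the highest predicted probability. The Python sketch below illustrates this loop under stated assumptions; the network sizes, the 2-D target, the 3-parameter policy, and the random policy search are illustrative choices, not the authors' exact implementation.

    # Minimal sketch of the paper's idea: predict P(success) from
    # (target, policy), then pick the policy with the highest prediction.
    # Dimensions and the random-search strategy are assumptions.
    import torch
    import torch.nn as nn

    class SuccessPredictor(nn.Module):
        """Maps a target position and policy parameters to P(success)."""
        def __init__(self, target_dim=2, policy_dim=3, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(target_dim + policy_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1), nn.Sigmoid(),
            )

        def forward(self, target, policy):
            return self.net(torch.cat([target, policy], dim=-1)).squeeze(-1)

    def select_policy(model, target, n_candidates=1024, policy_dim=3,
                      low=-1.0, high=1.0):
        """Randomly sample candidate policies and return the one with the
        highest predicted success probability for the given target."""
        policies = torch.empty(n_candidates, policy_dim).uniform_(low, high)
        targets = target.expand(n_candidates, -1)
        with torch.no_grad():
            probs = model(targets, policies)
        return policies[probs.argmax()], probs.max().item()

    def update(model, optimizer, targets, policies, outcomes):
        """One supervised step on accumulated throws; outcomes are 0/1 labels."""
        loss = nn.functional.binary_cross_entropy(
            model(targets, policies), outcomes)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In use, each real throw appends a (target, policy, outcome) triple to the dataset, update is run on the accumulated data, and select_policy chooses the next throw, so predictions sharpen as experiments accumulate, consistent with the incremental improvement over 520 trials reported above.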
