Effective Policy Search Method for Robot Reinforcement Learning with Noisy Reward

  • Received : 2020.07.20
  • Accepted : 2022.02.10
  • Published : 2022.02.28

Abstract

Robots are widely used in industry and services. Traditional robots have been used to perform repetitive tasks in fixed environments, and it is very difficult, with existing control methods, to solve problems that involve complicated physical interactions with the surrounding environment or other objects. Reinforcement learning has been actively studied as a machine learning approach to such problems, and it provides answers to problems that robots could not solve in the conventional way. Learning on any physical robot is commonly affected by noise. Complex noise sources, such as control errors of the robot, performance limitations of the measurement equipment, and the complexity of physical interactions with the surrounding environment and objects, can degrade learning. A learning method that works well in a virtual environment may not be very effective on a real robot. Therefore, this paper proposes a weighted-sum method and a linear-regression method as effective and accurate learning methods in a noisy environment. In addition, a bottle-flipping task was trained on a robot, and the validity of the proposed methods was verified by comparison with the existing learning method.
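The abstract only sketches the proposed ideas, so the short Python example below is a minimal illustration rather than the authors' implementation. It assumes a parameter-exploring episodic policy search in which each noisy reward is estimated either as a weighted sum of repeated rollouts or by a linear-regression fit over nearby parameter samples; the function rollout_reward, the uniform weights, and the softmax-weighted parameter update are illustrative assumptions, not details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
DIM = 5  # dimension of the policy parameter vector (illustrative)

def rollout_reward(theta):
    # Stand-in for one noisy rollout on the robot: true reward plus reward noise.
    true_reward = -np.sum((theta - 1.0) ** 2)   # hypothetical objective
    return true_reward + rng.normal(scale=0.5)  # additive measurement/control noise

def weighted_sum_reward(theta, n_rollouts=5):
    # Estimate the reward of theta as a weighted sum of repeated rollouts.
    rewards = np.array([rollout_reward(theta) for _ in range(n_rollouts)])
    weights = np.ones(n_rollouts) / n_rollouts   # uniform weights: simplest choice
    return float(weights @ rewards)

def regression_rewards(thetas, rewards):
    # Fit reward ~ w.theta + b by least squares and return smoothed predictions.
    X = np.hstack([thetas, np.ones((len(thetas), 1))])
    coef, *_ = np.linalg.lstsq(X, rewards, rcond=None)
    return X @ coef

theta = np.zeros(DIM)   # current policy parameters
sigma = 0.3             # exploration noise
for it in range(50):
    samples = theta + rng.normal(scale=sigma, size=(10, DIM))
    noisy = np.array([weighted_sum_reward(s) for s in samples])
    smooth = regression_rewards(samples, noisy)    # optional second denoising step
    w = np.exp(smooth - smooth.max())              # softmax-style reward weights
    theta = (w[:, None] * samples).sum(axis=0) / w.sum()
print("learned parameters:", theta)

In this sketch the two denoising steps can be used separately: averaging repeated rollouts reduces the noise of each reward estimate, while the regression pools information across samples before the reward-weighted update of the policy parameters.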

Keywords

Acknowledgment

This project was funded by the Sogang University Research & Business Development Foundation.
