Acknowledgement
This project was funded by the Sogang University Research & Business Development Foundation.
References
- N. Kohl and P. Stone, "Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion," 2004 IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, pp. 2619-2624, 2004, DOI: 10.1109/ROBOT.2004.1307456.
- M. T. Rosenstein and A. G. Barto, "Robot Weightlifting By Direct Policy Search," 2001 International Joint Conference on Artificial Intelligence, Seattle, USA, pp. 839-844, 2001, [Online], https://dl.acm.org/doi/abs/10.5555/1642194.1642206.
- J. Kober and J. Peters, "Policy Search for Motor Primitives in Robotics," Machine Learning, vol. 84, pp. 171-203, 2011, DOI: 10.1007/s10994-010-5223-6.
- P. Kormushev, S. Calinon, R. Saegusa, and G. Metta, "Learning the skill of archery by a humanoid robot iCub," 2010 10th IEEE-RAS International Conference on Humanoid Robots, Nashville, TN, USA, pp. 417-423, 2010, DOI: 10.1109/ICHR.2010.5686841.
- D. H. Kang, J. H. Bong, J. Park, and S. Park, "Reinforcement Learning Strategy for Automatic Control of Real-time Obstacle Avoidance based on Vehicle Dynamics," Journal of Korea Robotics Society, vol. 12, no. 3, pp. 297-305, Sept., 2017, DOI: 10.7746/jkros.2017.12.3.297.
- R. S. Sutton and A. G. Barto, "Introduction," Reinforcement Learning: An Introduction, 2nd ed. The MIT Press, 2014, ch. 1, sec. 1-7, pp.1-18, [Online], https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf.
- M. P. Deisenroth, G. Neumann, and J. Peters, "A Survey on Policy Search for Robotics," Foundations and Trends® in Robotics, vol. 2, no. 1-2, pp. 1-142, 2013, DOI: 10.1561/2300000021.
- Y. H. Yang, S. H. Lee, and C. S. Lee, "Designing an Efficient Reward Function for Robot Reinforcement Learning of The Water Bottle Flipping Task," Journal of Korea Robotics Society, vol. 14, no. 2, pp. 81-86, Jun., 2019, DOI: 10.7746/jkros.2019.14.2.081.
- P. Abbeel, M. Quigley, and A. Y. Ng, "Using Inaccurate Models in Reinforcement Learning," 23rd International Conference on Machine Learning, Pittsburgh, Pennsylvania, USA, pp. 1-8, 2006, DOI: 10.1145/1143844.1143845.
- M. J. Mataric, "Reward Functions for Accelerated Learning," Eleventh International Conference on Machine Learning, New Brunswick, NJ, USA, pp. 181-189, 1994, DOI: 10.1016/B978-1-55860-335-6.50030-1.
- H. Hachiya, J. Peters, and M. Sugiyama, "Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning," Neural Computation, vol. 23, no. 11, pp. 2798-2832, 2011, DOI: 10.1162/NECO_a_00199.
- B. C. da Silva, G. Baldassarre, G. Konidaris, and A. Barto, "Learning parameterized motor skills on a humanoid robot," 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, pp. 5239-5244, 2014, DOI: 10.1109/ICRA.2014.6907629.
- S. H. Lee, "Designing an efficient reward function for robot reinforcement learning of the water bottle flipping task," M.S. thesis, Sogang University, Seoul, Korea, 2018, [Online], https://library.sogang.ac.kr/search/detail/CAT000000843771.
- J. Kober and J. Peters, "Learning Motor Primitives for Robotics," 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, pp. 2112-2118, 2009, DOI: 10.1109/ROBOT.2009.5152577.
- J. Wang, Y. Liu, and B. Li, "Reinforcement Learning with Perturbed Rewards," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, pp. 6202-6209, 2020, DOI: 10.1609/aaai.v34i04.6086.
- K. Främling, "Reinforcement Learning in a Noisy Environment: Light-Seeking Robot," WSEAS Transactions on Systems, vol. 3, no. 2, pp. 714-719, 2004, [Online], https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.484.6001&rep=rep1&type=pdf.