http://dx.doi.org/10.7746/jkros.2019.14.1.040

Implementation of End-to-End Training of Deep Visuomotor Policies for Manipulation of a Robotic Arm of Baxter Research Robot  

Kim, Seongun (UNIST)
Kim, Sol A (UNIST)
de Lima, Rafael (UNIST)
Choi, Jaesik (Computer Engineering, UNIST)
Publication Information
The Journal of Korea Robotics Society, vol. 14, no. 1, 2019, pp. 40-49
Abstract
Reinforcement learning has been applied to a variety of problems in robotics. However, training complex robotic manipulation tasks has remained difficult, because few models are applicable to general tasks, and such general models require a large number of training episodes. For these reasons, deep neural networks, despite being good function approximators, have not been widely used for robot manipulation tasks. Recently, some of these challenges have been addressed by a family of methods, such as Guided Policy Search, that guide or limit the search directions while training a deep neural network policy. These frameworks have already been applied to a humanoid robot, the PR2. In robotics, however, it is not trivial to adapt an algorithm implemented for one robot to another. In this paper, we present our implementation of Guided Policy Search for the robotic arms of the Baxter Research Robot. To meet the goals and needs of the project, we build a Baxter agent class for an existing implementation of the Guided Policy Search algorithm, using the robot's built-in Python interface. This work is expected to play an important role in popularizing reinforcement learning methods for robot manipulation on cost-effective robot platforms.
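For context, Guided Policy Search trains the neural network policy with supervised learning while a set of local trajectory optimizers "guides" it. A schematic form of the constrained objective, following the general formulation in the GPS literature (the notation here is assumed, not taken from this paper), is

$$
\min_{\theta,\,p}\ \sum_{t=1}^{T} \mathbb{E}_{p(\mathbf{x}_t,\mathbf{u}_t)}\!\left[\ell(\mathbf{x}_t,\mathbf{u}_t)\right]
\quad \text{s.t.}\quad p(\mathbf{u}_t \mid \mathbf{x}_t) = \pi_\theta(\mathbf{u}_t \mid \mathbf{x}_t)\ \ \forall\,\mathbf{x}_t,\mathbf{u}_t,t,
$$

where $p$ is a local trajectory distribution (e.g., a time-varying linear-Gaussian controller fitted by trajectory optimization), $\pi_\theta$ is the deep visuomotor policy, and $\ell$ is the task cost. In practice the constraint is relaxed, e.g., with BADMM-style Lagrangian penalty terms, and the two distributions are optimized alternately.

To make the "Baxter agent class" concrete, the following is a minimal sketch of the robot-side plumbing such a class needs, written against the real baxter_interface Python SDK. The class and method names (BaxterAgent, reset, get_state, step) are hypothetical, not the authors' implementation, which would additionally have to implement the sampling interface expected by the GPS codebase.

import rospy
import baxter_interface

class BaxterAgent(object):
    """Wraps one Baxter arm so a GPS-style trainer can reset episodes
    and apply per-timestep controls from a local controller or policy."""

    def __init__(self, limb='right', control_rate=20.0):
        rospy.init_node('gps_baxter_agent')
        baxter_interface.RobotEnable().enable()      # enable the motors
        self._limb = baxter_interface.Limb(limb)
        self._rate = rospy.Rate(control_rate)        # control loop frequency
        self._joints = self._limb.joint_names()      # 7 joints per arm

    def reset(self, initial_angles):
        # Blocking move to the initial configuration of this condition.
        self._limb.move_to_joint_positions(
            dict(zip(self._joints, initial_angles)))

    def get_state(self):
        # Joint angles and velocities form the proprioceptive part of the
        # state; camera images would be appended for visuomotor tasks.
        angles = [self._limb.joint_angles()[j] for j in self._joints]
        velocities = [self._limb.joint_velocities()[j] for j in self._joints]
        return angles + velocities

    def step(self, command):
        # Apply one velocity command, wait one control cycle, return state.
        self._limb.set_joint_velocities(dict(zip(self._joints, command)))
        self._rate.sleep()
        return self.get_state()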
Keywords
Robotics; Visuomotor Policy; Reinforcement Learning; Guided Policy Search; Baxter Research Robot