http://dx.doi.org/10.7746/jkros.2022.17.1.040

Visual Object Manipulation Based on Exploration Guided by Demonstration  

Kim, Doo-Jun (Mechanical Engineering, Korea University)
Jo, HyunJun (Mechanical Engineering, Korea University)
Song, Jae-Bok (Mechanical Engineering, Korea University)
Publication Information
The Journal of Korea Robotics Society, v.17, no.1, 2022, pp. 40-47
Abstract
A reward function suited to the task is required to manipulate objects through reinforcement learning. However, such a reward function is difficult to design when sufficient information about the objects cannot be obtained. In this study, a demonstration-based object manipulation algorithm called stochastic exploration guided by demonstration (SEGD) is proposed to address this reward design problem. SEGD is a reinforcement learning algorithm in which a sparse reward explorer (SRE) and an interpolated policy using demonstration (IPD) are added to soft actor-critic (SAC). SRE enables the SAC critic to be trained by collecting prior data, and IPD limits the exploration space by keeping SEGD's actions close to the expert's actions. With these two components, SEGD can learn from only the sparse reward of the task, without a hand-designed reward function. To verify SEGD, experiments were conducted on three tasks, in which SEGD demonstrated its effectiveness with success rates above 96.5%.
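The abstract describes SEGD as SAC augmented with a sparse reward explorer (SRE) and an interpolated policy using demonstration (IPD). The Python sketch below is a minimal, illustrative interpretation of those two ideas, not the authors' implementation: transitions are stored with only a binary sparse reward, and the executed action linearly blends a demonstrated action with the policy's action under an assumed decay schedule. All names (sparse_reward, ipd_action, decay_steps) and the schedule itself are hypothetical.

import numpy as np

# Illustrative sketch of the two ideas from the abstract (assumptions, not the paper's code):
#  - SRE: collect transitions labeled only with a sparse, binary task reward,
#    so the SAC critic has prior data to train on.
#  - IPD: interpolate between the expert (demonstrated) action and the policy's
#    action, shrinking the expert weight over training so early exploration
#    stays close to the demonstration.

def sparse_reward(success):
    """Binary task reward: 1.0 on task success, 0.0 otherwise (assumed form)."""
    return 1.0 if success else 0.0

def ipd_action(policy_action, expert_action, step, decay_steps=10_000):
    """Blend policy and expert actions; the expert weight decays linearly (assumed schedule)."""
    w_expert = max(0.0, 1.0 - step / decay_steps)
    return w_expert * expert_action + (1.0 - w_expert) * policy_action

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    replay_buffer = []                               # stand-in for the SAC replay buffer
    for step in range(5):
        policy_a = rng.uniform(-1.0, 1.0, size=4)    # stand-in for the SAC actor's action
        expert_a = np.zeros(4)                       # stand-in for a demonstrated action
        a = ipd_action(policy_a, expert_a, step, decay_steps=4)
        success = step == 4                          # pretend the final step succeeds
        replay_buffer.append((a, sparse_reward(success)))
        print(step, np.round(a, 2), sparse_reward(success))

Early in training the executed action is dominated by the demonstration, and later by the learned policy, which matches the abstract's description of limiting the exploration space while still learning from only the sparse reward.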
Keywords
Imitation Learning; Manipulation; Variational Autoencoder; Reinforcement Learning