http://dx.doi.org/10.7746/jkros.2022.17.1.040

Visual Object Manipulation Based on Exploration Guided by Demonstration  

Kim, Doo-Jun (Mechanical Engineering, Korea University)
Jo, HyunJun (Mechanical Engineering, Korea University)
Song, Jae-Bok (Mechanical Engineering, Korea University)
Publication Information
The Journal of Korea Robotics Society, v.17, no.1, 2022, pp. 40-47
Abstract
A reward function suited to the task is required to manipulate objects through reinforcement learning. However, such a reward function is difficult to design when sufficient information about the objects cannot be obtained. In this study, a demonstration-based object manipulation algorithm called stochastic exploration guided by demonstration (SEGD) is proposed to address this reward design problem. SEGD is a reinforcement learning algorithm in which a sparse reward explorer (SRE) and an interpolated policy using demonstration (IPD) are added to soft actor-critic (SAC). SRE enables the SAC critic to be trained by collecting prior data, and IPD limits the exploration space by keeping SEGD's actions close to the expert's actions. With these two components, SEGD can learn from only the sparse reward of the task, without a hand-designed reward function. To verify SEGD, experiments were conducted on three tasks, in which SEGD demonstrated its effectiveness with success rates above 96.5%.
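The abstract describes SEGD as SAC augmented with a sparse reward explorer (SRE) and an interpolated policy using demonstration (IPD). The Python sketch below is a minimal, illustrative interpretation of those two ideas, not the authors' implementation: transitions are stored with only a binary sparse reward, and the executed action linearly blends a demonstrated action with the policy's action under an assumed decay schedule. All names (sparse_reward, ipd_action, decay_steps) and the schedule itself are hypothetical.

import numpy as np

# Illustrative sketch of the two ideas from the abstract (assumptions, not the paper's code):
#  - SRE: collect transitions labeled only with a sparse, binary task reward,
#    so the SAC critic has prior data to train on.
#  - IPD: interpolate between the expert (demonstrated) action and the policy's
#    action, shrinking the expert weight over training so early exploration
#    stays close to the demonstration.

def sparse_reward(success):
    """Binary task reward: 1.0 on task success, 0.0 otherwise (assumed form)."""
    return 1.0 if success else 0.0

def ipd_action(policy_action, expert_action, step, decay_steps=10_000):
    """Blend policy and expert actions; the expert weight decays linearly (assumed schedule)."""
    w_expert = max(0.0, 1.0 - step / decay_steps)
    return w_expert * expert_action + (1.0 - w_expert) * policy_action

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    replay_buffer = []                               # stand-in for the SAC replay buffer
    for step in range(5):
        policy_a = rng.uniform(-1.0, 1.0, size=4)    # stand-in for the SAC actor's action
        expert_a = np.zeros(4)                       # stand-in for a demonstrated action
        a = ipd_action(policy_a, expert_a, step, decay_steps=4)
        success = step == 4                          # pretend the final step succeeds
        replay_buffer.append((a, sparse_reward(success)))
        print(step, np.round(a, 2), sparse_reward(success))

Early in training the executed action is dominated by the demonstration, and later by the learned policy, which matches the abstract's description of limiting the exploration space while still learning from only the sparse reward.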
Keywords
Imitation Learning; Manipulation; Variational Autoencoder; Reinforcement Learning