
Tidy-up Task Planner based on Q-learning

  • Received: 2020.11.03
  • Accepted: 2020.12.17
  • Published: 2021.02.26

Abstract

As the use of robots in the service sector increases, research has been conducted on replacing human tasks in daily life with robots. Among such tasks, this study focuses on tidying up a desk with a robot arm. The order in which the tidy-up motions are carried out has a great impact on the success rate of the task. Therefore, this study proposes a neural network-based method that determines the priority of the tidy-up motions from an input image. Reinforcement learning, which performs well in sequential decision-making problems, is used to train the task planner. Training is conducted in a virtual tidy-up environment configured identically to the real one. To transfer what is learned in the virtual environment to the real environment, the input image is preprocessed into a segmented image. In addition, using a neural network that excludes unnecessary tidy-up motions from the priority ranking during execution increases the success rate of the task planner. Experiments were conducted in the real world to verify the proposed task planning method.

Keywords
