Effective Utilization of Domain Knowledge for Relational Reinforcement Learning

  • Received : 2021.07.14
  • Accepted : 2021.08.29
  • Published : 2022.03.31

Abstract

Recently, reinforcement learning combined with deep neural network technology has achieved remarkable success in a wide range of domains, including board games such as Go and chess, computer games such as Atari and StarCraft, and robotic object-manipulation tasks. However, these deep reinforcement learning methods represent states, actions, and policies as vectors. As a result, existing deep reinforcement learning is limited in the generality and interpretability of the learned policy, and it is difficult to effectively incorporate domain knowledge into policy learning. In contrast, dNL-RRL, a new relational reinforcement learning framework proposed to address these problems, uses vector representations for sensor input data and low-level motion control, as in existing deep reinforcement learning, but represents states, actions, and learned policies relationally, with logic predicates and rules. In this paper, we present dNL-RRL-based policy learning for transport mobile robots in a manufacturing environment. In particular, this study proposes effective methods for utilizing the prior domain knowledge of human experts to improve the efficiency of relational reinforcement learning. Through various experiments, we demonstrate the performance improvement that the proposed use of domain knowledge brings to relational reinforcement learning.

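To make the relational representation and the use of expert knowledge described above concrete, the following minimal Python sketch shows one way a transport robot's state could be encoded as ground logic predicates and how an expert's prior rule could prune inadmissible actions before policy learning. This is an illustrative assumption only: the predicate names (at, connected, carrying), the locations, and the rule are hypothetical and do not reproduce the dNL-RRL implementation used in the paper.

# Minimal, illustrative sketch (assumed names, not the paper's dNL-RRL code):
# the robot's state is a set of ground logic predicates, and a hand-written
# expert rule prunes inadmissible move actions before any policy learning.

from itertools import product

# Ground facts describing one manufacturing-floor state.
state = {
    ("at", "robot1", "loading_dock"),
    ("connected", "loading_dock", "cell_a"),
    ("connected", "cell_a", "assembly"),
    ("carrying", "robot1", "pallet3"),
}

locations = ["loading_dock", "cell_a", "assembly"]

def holds(pred, *args):
    """True if the ground predicate pred(*args) holds in the current state."""
    return (pred, *args) in state

def candidate_moves(robot):
    """All syntactically possible move(robot, src, dst) actions."""
    return [("move", robot, src, dst)
            for src, dst in product(locations, repeat=2) if src != dst]

def expert_rule_allows(action):
    """Expert prior knowledge as a rule:
    move(R, X, Y) is admissible only if at(R, X) and connected(X, Y)."""
    _, robot, src, dst = action
    return holds("at", robot, src) and holds("connected", src, dst)

# Restrict exploration to actions consistent with the expert rule.
admissible = [a for a in candidate_moves("robot1") if expert_rule_allows(a)]
print(admissible)   # [('move', 'robot1', 'loading_dock', 'cell_a')]

In dNL-RRL-style learning, such background rules can also bias or initialize the learned predicate rules rather than act as a hard action filter; the hard filter above is simply the easiest variant to illustrate.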
