
C-COMA: A Continual Reinforcement Learning Model for Dynamic Multiagent Environments


  • Received : 2020.12.14
  • Accepted : 2020.12.25
  • Published : 2021.04.30

Abstract

In a wide range of real-world applications, it is very important for multiple agents to learn behavioral policies that let them cooperate organically toward common goals. In this multi-agent reinforcement learning (MARL) setting, most existing studies have adopted centralized training with decentralized execution (CTDE) as the de facto standard framework. However, such methods have difficulty coping with dynamic environments, in which environmental changes never experienced during training can constantly arise in real-world situations. To deal with such dynamic environments effectively, this paper proposes a novel multi-agent reinforcement learning system, C-COMA. C-COMA is a continual learning model that assumes a real-world deployment setting from the outset and continuously learns the agents' cooperative behavior policies without separating training time from execution time. We demonstrate the effectiveness and superiority of the proposed model, C-COMA, by implementing a dynamic mini-game based on StarCraft II, a representative real-time strategy game, and by conducting various experiments in this environment.
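To make the distinction from CTDE concrete, the sketch below (plain Python, not the authors' code) shows the kind of continual loop the abstract describes: decentralized agents keep acting in a changing environment, and every completed episode immediately feeds the next policy update, so there is no frozen execution phase. The environment, agent, and update names are illustrative placeholders; in C-COMA the update step would be a COMA-style centralized critic producing per-agent counterfactual advantages.

```python
# Minimal sketch of a continual multi-agent actor-critic loop (no separate
# training and execution phases). All names are illustrative placeholders.
import random


class ToyMultiAgentEnv:
    """Hypothetical stand-in for the dynamic StarCraft II mini-game."""

    def __init__(self, n_agents=3):
        self.n_agents = n_agents
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * self.n_agents             # one observation per agent

    def step(self, joint_action):
        # The environment itself may change over the agents' lifetime
        # (units added or removed, map altered), which is exactly why
        # learning cannot stop after a fixed training phase.
        self.t += 1
        team_reward = float(sum(joint_action))   # shared reward for all agents
        obs = [random.random() for _ in range(self.n_agents)]
        done = self.t >= 20
        return obs, team_reward, done


class Agent:
    """Decentralized actor; here just a random policy over two actions."""

    def act(self, obs):
        return random.randint(0, 1)


def update(agents, trajectory):
    # Placeholder for the learning step. In a COMA-style model a centralized
    # critic would score the joint action and each agent's counterfactual
    # advantage would update its decentralized actor.
    pass


env = ToyMultiAgentEnv()
agents = [Agent() for _ in range(env.n_agents)]

# Continual loop: acting and learning are interleaved for the agents' whole
# lifetime instead of "train first, then execute with frozen policies".
for episode in range(5):
    obs, done, trajectory = env.reset(), False, []
    while not done:
        joint_action = [agent.act(o) for agent, o in zip(agents, obs)]
        next_obs, reward, done = env.step(joint_action)
        trajectory.append((obs, joint_action, reward))
        obs = next_obs
    update(agents, trajectory)   # learning continues during deployment
```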



Acknowledgement

Institute of Information & Communications Technology Planning & Evaluation (IITP) / ICT and Broadcasting Technology Development Program / Development of task planning technology for individual robots and robot groups connected to the cloud / 2020-0-00096.

References

  1. M. Samvelyan, T. Rashid, C. S. Witt, G. Farquhar, N. Nardelli, T. G. J. Rudner, C. M. Hung, P. H. S. Torr, J. N. Foerster, and S. Whiteson, "The StarCraft Multi-Agent Challenge," CoRR, abs/1902.04043, 2019.
  2. J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, "Counterfactual multi-agent policy gradients," in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  3. P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, "Value-decomposition networks for cooperative multi-agent learning based on team reward," in Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2017.
  4. T. Rashid, M. Samvelyan, C. S. Witt, G. Farquhar, J. N. Foerster, and S. Whiteson, "QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning," in Proceedings of the International Conference on Machine Learning (ICML), pp.4292-4301, 2018.
  5. M. Tan, "Multi-agent reinforcement learning: Independent vs. cooperative agents." in Proceedings of the Tenth International Conference on Machine Learning (ICML), pp.330-337, 1993.
  6. C. Watkins, "Learning from delayed rewards," Ph.D. Thesis, University of Cambridge, England, 1989.
  7. V. Mnih, et al., "Human-level control through deep reinforcement learning," Nature, Vol.518, pp.529-533, 2015.
  8. A. Tampuu, et al., "Multiagent cooperation and competition with deep reinforcement learning," PLoS ONE, Vol.12, No.4, 2017.
  9. J. N. Foerster, et al., "Stabilising experience replay for deep multi-agent reinforcement learning," in Proceedings of The 34th International Conference on Machine Learning (ICML), pp.1146-1155, 2017.
  10. C. Guestrin, D. Koller, and R. Parr, "Multiagent planning with factored MDPs," In Advances in Neural Information Processing Systems (NIPS), MIT Press, pp.1523-1530, 2002.
  11. J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, pp.1789-1828, 2006.
  12. S. Sukhbaatar, A. Szlam, and R. Fergus, "Learning multiagent communication with backpropagation," In Advances in Neural Information Processing Systems (NIPS), pp.2244-2252, 2016.
  13. P. Peng, et al., "Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games," arXiv preprint arXiv:1703.10069, 2017.
  14. J. K. Gupta, M. Egorov, and M. Kochenderfer, "Cooperative multi-agent control using deep reinforcement learning," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Springer, pp.66-83, 2017.
  15. R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments," In Advances in Neural Information Processing Systems (NIPS), pp.6382-6393, 2017.
  16. S. Iqbal, C. A. S. de Witt, B. Peng, W. Böhmer, S. Whiteson, and F. Sha, "AI-QMIX: Attention and imagination for dynamic multi-agent reinforcement learning," arXiv preprint arXiv:2006.04222, 2020.