http://dx.doi.org/10.3745/KTSDE.2021.10.4.143

C-COMA: A Continual Reinforcement Learning Model for Dynamic Multiagent Environments  

Jung, Kyueyeol (Dept. of Computer Science, Kyonggi University)
Kim, Incheol (Dept. of Computer Science, Kyonggi University)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol.10, No.4, 2021, pp.143-152
Abstract
Learning behavioral policies that allow multiple agents to cooperate organically toward common goals is essential in many real-world applications. In multi-agent reinforcement learning (MARL), most existing studies adopt centralized training with decentralized execution (CTDE) as the de facto standard framework. However, CTDE methods struggle to cope with dynamic environments, in which environmental changes never experienced during training can continually arise, as they do in real-life situations. To cope effectively with such dynamic environments, this paper proposes C-COMA, a novel multi-agent reinforcement learning system. C-COMA is a continual learning model that assumes a real-world setting from the outset and continuously learns the agents' cooperative behavior policies, without separating training time from execution time. We demonstrate the effectiveness and superiority of the proposed model by implementing a dynamic mini-game based on StarCraft II, a representative real-time strategy game, and conducting various experiments in this environment.
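The continual-learning idea described above — agents that keep updating their policies online, with no separate training phase, so they can absorb environmental changes as they occur — can be illustrated with a toy sketch. This is not the paper's implementation (C-COMA builds on the COMA actor-critic; all names below are hypothetical): two agents receive a shared team reward for jointly matching a hidden target pattern, the pattern flips mid-run to emulate a dynamic environment, and because updates never stop, the agents re-adapt without any retraining phase.

```python
import random

class ContinualAgent:
    """A minimal online learner: a per-action value estimate updated every step."""

    def __init__(self, n_actions=2, lr=0.2, eps=0.1):
        self.q = [0.0] * n_actions   # running value estimate per action
        self.lr, self.eps = lr, eps

    def act(self, rng):
        # epsilon-greedy: keep exploring forever, since the world may change
        if rng.random() < self.eps:
            return rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, action, reward):
        # online update on every interaction -- no train/execute split
        self.q[action] += self.lr * (reward - self.q[action])

def run(steps=4000, seed=0):
    rng = random.Random(seed)
    agents = [ContinualAgent(), ContinualAgent()]
    target = (0, 1)                  # hidden joint pattern the team must match
    rewards = []
    for t in range(steps):
        if t == steps // 2:          # an environmental change never seen before
            target = (1, 0)
        acts = tuple(a.act(rng) for a in agents)
        # shared team reward: fraction of agents matching the current target
        r = sum(a == tgt for a, tgt in zip(acts, target)) / len(agents)
        for agent, act in zip(agents, acts):
            agent.update(act, r)
        rewards.append(r)
    # average team reward just before the change and at the end of the run
    before_change = sum(rewards[steps // 2 - 200: steps // 2]) / 200
    after_change = sum(rewards[-200:]) / 200
    return before_change, after_change
```

Because exploration and updating never stop, performance recovers after the mid-run change; a CTDE-style agent with frozen policies would keep executing the stale behavior instead.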
Keywords
Multiagent Reinforcement Learning; Dynamic Environment; Continual Learning; Starcraft II;
Citations & Related Records
References
1 M. Samvelyan, T. Rashid, C. S. Witt, G. Farquhar, N. Nardelli, T. G. J. Rudner, C. M. Hung, P. H. S. Torr, J. N. Foerster, and S. Whiteson, "The StarCraft Multi-Agent Challenge," CoRR, abs/1902.04043, 2019.
2 J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, "Counterfactual multi-agent policy gradients," in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
3 P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, "Value-decomposition networks for cooperative multi-agent learning based on team reward," in Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2017.
4 M. Tan, "Multi-agent reinforcement learning: Independent vs. cooperative agents," in Proceedings of the Tenth International Conference on Machine Learning (ICML), pp.330-337, 1993.
5 C. Watkins, "Learning from delayed rewards," Ph.D. Thesis, University of Cambridge, England, 1989.
6 V. Mnih, et al., "Human-level control through deep reinforcement learning," Nature, Vol.518, pp.529-533, 2015.
7 A. Tampuu, et al., "Multiagent cooperation and competition with deep reinforcement learning," PLoS ONE, Vol.12, No.4, 2017.
8 J. N. Foerster, et al., "Stabilising experience replay for deep multi-agent reinforcement learning," in Proceedings of the 34th International Conference on Machine Learning (ICML), pp.1146-1155, 2017.
9 C. Guestrin, D. Koller, and R. Parr, "Multiagent planning with factored MDPs," In Advances in Neural Information Processing Systems (NIPS), MIT Press, pp.1523-1530, 2002.
10 S. Sukhbaatar, A. Szlam, and R. Fergus, "Learning multiagent communication with backpropagation," in Advances in Neural Information Processing Systems (NIPS), pp.2244-2252, 2016.
11 P. Peng, et al., "Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games," In Advances in Neural Information Processing Systems (NIPS), 2017.
12 R. Lowe, Y. Wu, A. Tamar, J. Harb, O. P. Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments," In Advances in Neural Information Processing Systems (NIPS), pp.6382-6393, 2017.
13 S. Iqbal, C. A. S. de Witt, B. Peng, W. Böhmer, S. Whiteson, and F. Sha, "AI-QMIX: Attention and imagination for dynamic multi-agent reinforcement learning," arXiv:2006.04222, 2020.
14 J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, pp.1789-1828, 2006.
15 T. Rashid, M. Samvelyan, C. S. Witt, G. Farquhar, J. N. Foerster, and S. Whiteson, "Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning," in Proceedings of the International Conference on Machine Learning (ICML), pp.4292-4301, 2018.
16 J. K. Gupta, M. Egorov, and M. Kochenderfer, "Cooperative multi-agent control using deep reinforcement learning," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Springer, pp.66-83, 2017.