A Survey on Recent Advances in Multi-Agent Reinforcement Learning

멀티 에이전트 강화학습 기술 동향 (Trends in Multi-Agent Reinforcement Learning Technology)

  • Published: 2020.12.01

Abstract

In recent years, several multi-agent reinforcement learning (MARL) algorithms have achieved remarkable results, demonstrating their potential to solve complex problems in real-time strategy games, robotics, and autonomous vehicles. However, these algorithms face many challenges when dealing with massive problem spaces in sparse reward environments. Building on the centralized training and decentralized execution (CTDE) architecture, the MARL algorithms discussed in the literature aim to address these challenges by formulating novel concepts of inter-agent modeling, credit assignment, multi-agent communication, and the exploration-exploitation dilemma. The fundamental objective of this paper is to deliver a comprehensive survey of existing MARL algorithms organized by the problems they address rather than by the technologies they employ. We also discuss several experimental frameworks to provide insight into the use of these algorithms and to motivate promising directions for future research.
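To make the CTDE pattern mentioned above concrete, the following is a minimal schematic sketch, not any specific algorithm from the surveyed literature: each agent's actor conditions only on its local observation (decentralized execution), while a critic used during training may condition on the joint observation of all agents (centralized training). All names, shapes, and the linear parameterization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, N_ACTIONS = 2, 3, 2

# Decentralized actors: one parameter set per agent, each mapping only that
# agent's LOCAL observation to action logits.
actor_weights = [rng.normal(size=(OBS_DIM, N_ACTIONS)) for _ in range(N_AGENTS)]

# Centralized critic: during training it conditions on the JOINT observation
# (in practice, often the joint actions as well).
critic_weights = rng.normal(size=N_AGENTS * OBS_DIM)

def act(agent_id, local_obs):
    """Decentralized execution: agent i sees only its own observation."""
    logits = local_obs @ actor_weights[agent_id]
    return int(np.argmax(logits))

def centralized_value(joint_obs):
    """Centralized training: the critic sees all agents' observations."""
    return float(np.concatenate(joint_obs) @ critic_weights)

joint_obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
actions = [act(i, joint_obs[i]) for i in range(N_AGENTS)]
value = centralized_value(joint_obs)
```

After training, the critic is discarded and each agent acts using only its own policy, which is what allows execution without access to global state.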

Acknowledgement

This research was conducted as part of the Electronics and Telecommunications Research Institute (ETRI) Research Operation Support Program [20ZS1100, Research on Core Source Technology for Self-Improving Integrated Artificial Intelligence; 19YE1400, Preliminary Research on Human-Agent Collaboration Technology in Multi-Agent Environments and Establishment of a Development Environment].

References

  1. V. Mnih et al., "Playing Atari with deep reinforcement learning," arXiv preprint, 2013, arXiv:1312.5602.
  2. J. Schulman et al., "Trust region policy optimization," in Proc. Int. Conf. Mach. Learn. (Lille, France), 2015, pp. 1889-1897.
  3. J. Schulman et al., "Proximal policy optimization algorithms," arXiv preprint, 2017, arXiv:1707.06347.
  4. T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," in Proc. Int. Conf. Learn. Representations, 2016.
  5. K. Zhang, Z. Yang, and T. Basar, "Multi-agent reinforcement learning: A selective overview of theories and algorithms," arXiv preprint, 2019, arXiv:1911.10635v1.
  6. O. Jadid and D. Hajinezhad, "A review of cooperative multi-agent deep reinforcement learning," arXiv preprint, 2019, arXiv:1908.03963v3.
  7. R. Lowe et al., "Multi-agent actor-critic for mixed cooperative-competitive environments," in Adv. Neural Inform. Process. Syst., 2017, pp. 6379-6390.
  8. Y. Yang et al., "Mean field multi-agent reinforcement learning," in Proc. Int. Conf. Mach. Learn. (Stockholm, Sweden), 2018.
  9. S. Iqbal and F. Sha, "Actor-attention-critic for multi-agent reinforcement learning," in Proc. Int. Conf. Mach. Learn. (Long Beach, CA, USA), 2019, pp. 2961-2970.
  10. T. Haarnoja et al., "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in Proc. Int. Conf. Mach. Learn. (Stockholm, Sweden), 2018, pp. 1861-1870.
  11. H. Ryu, H. Shin, and J. Park, "Multi-agent actor-critic with hierarchical graph attention network," in Proc. AAAI Conf. Artif. Intell. (New York, USA), 2020, pp. 7236-7243.
  12. J. Foerster et al., "Counterfactual multi-agent policy gradients," in Proc. AAAI Conf. Artif. Intell., 2018.
  13. P. Sunehag et al., "Value-decomposition networks for cooperative multi-agent learning based on team reward," in Proc. Int. Conf. Auton. Agents Multiagent Syst., 2018, pp. 2085-2087.
  14. T. Rashid et al., "QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning," in Proc. Int. Conf. Mach. Learn., 2018.
  15. K. Son et al., "QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning," in Proc. Int. Conf. Mach. Learn., 2019.
  16. Y. Du et al., "LIIR: Learning individual intrinsic reward in multi-agent reinforcement learning," in Adv. Neural Inform. Process. Syst., 2019, pp. 4403-4414.
  17. C. V. Goldman and S. Zilberstein, "Decentralized control of cooperative systems: Categorization and complexity analysis," J. Artif. Intell. Res., vol. 22, 2004, pp. 143-174. https://doi.org/10.1613/jair.1427
  18. E. Pesce and G. Montana, "Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication," Mach. Learn., vol. 109, 2020, doi: 10.1007/s10994-019-05864-5.
  19. S. Q. Zhang, Q. Zhang, and J. Lin, "Efficient communication in multi-agent reinforcement learning via variance based control," in Adv. Neural Inform. Process. Syst., 2019, pp. 3235-3244.
  20. H. Mao et al., "Learning agent communication under limited bandwidth by message pruning," arXiv preprint, Dec. 2019, Accessed: Sep. 21, 2020. [Online]. Available: http://arxiv.org/abs/1912.05304
  21. D. Kim et al., "Learning to schedule communication in multi-agent reinforcement learning," arXiv preprint, Feb. 2019, Accessed: Sep. 10, 2020. [Online]. Available: http://arxiv.org/abs/1902.01554
  22. J. Foerster et al., "Learning to communicate with deep multi-agent reinforcement learning," in Adv. Neural Inform. Process. Syst., 2016, pp. 2137-2145.
  23. N. Jaques et al., "Social influence as intrinsic motivation for multi-agent deep reinforcement learning," in Proc. Int. Conf. Mach. Learn., 2019, pp. 3040-3049.
  24. K. Cao et al., "Emergent communication through negotiation," arXiv preprint, Apr. 2018, Accessed: Sep. 09, 2020. [Online]. Available: http://arxiv.org/abs/1804.03980
  25. T. Eccles et al., "Biases for emergent communication in multi-agent reinforcement learning," in Adv. Neural Inform. Process. Syst., 2019, pp. 13111-13121.
  26. S. Gupta, R. Hazra, and A. Dukkipati, "Networked multi-agent reinforcement learning with emergent communication," in Proc. Int. Conf. Auton. Agents Multiagent Syst. (Auckland, New Zealand), May 2020.
  27. T. Wang et al., "Influence-based multi-agent exploration," in Proc. Int. Conf. Learn. Representations, 2020.
  28. G. Chen, "A new framework for multi-agent reinforcement learning: Centralized training and exploration with decentralized execution via policy distillation," in Proc. Int. Conf. Auton. Agents Multiagent Syst., 2019.
  29. A. Mahajan et al., "MAVEN: Multi-agent variational exploration," in Adv. Neural Inform. Process. Syst., 2019, pp. 7613-7624.
  30. G. Brockman et al., "OpenAI Gym," arXiv preprint, 2016, arXiv:1606.01540.
  31. M. Samvelyan et al., "The StarCraft Multi-Agent Challenge," arXiv preprint, 2019, arXiv:1902.04043.