Stochastic Initial States Randomization Method for Robust Knowledge Transfer in Multi-Agent Reinforcement Learning

  • Dohyun Kim (Defense AI Center, Agency for Defense Development);
  • Jungho Bae (Defense AI Center, Agency for Defense Development)
  • Received : 2024.05.09
  • Accepted : 2024.07.09
  • Published : 2024.08.05

Abstract

Reinforcement learning, which is also studied in the field of defense, faces a sample-efficiency problem: training requires a large amount of data. Transfer learning has been introduced to address this problem, but its effectiveness is sometimes marginal because the model does not effectively leverage prior knowledge. In this study, we propose a stochastic initial state randomization (SISR) method that enables robust knowledge transfer by promoting generalized and sufficient transfer of prior knowledge. We developed a simulation environment involving a cooperative robot transportation task. Experimental results show that the task succeeds when SISR is applied and fails when it is not. We also analyzed how the amount of state information collected by the agents changes when SISR is applied.
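
The page gives only a high-level description of SISR, so the following is a minimal sketch of how stochastic initial state randomization might be applied at episode reset, assuming a Gym-style multi-agent environment. The wrapper name, the p_randomize parameter, and the environment hooks n_agents and reset(initial_positions=...) are illustrative assumptions, not taken from the paper.

    import random
    import numpy as np

    class SISRResetWrapper:
        """Sketch of stochastic initial state randomization (SISR).

        With probability p_randomize, an episode starts from randomly
        sampled agent positions instead of the fixed default start, so
        the transferred policy is exercised over a broader set of
        initial conditions. All environment hooks used here are
        hypothetical, not the paper's actual interface.
        """

        def __init__(self, env, p_randomize=0.5, pos_low=-1.0, pos_high=1.0):
            self.env = env                  # multi-agent env with reset()/step()
            self.p_randomize = p_randomize  # fraction of episodes randomized
            self.pos_low = pos_low          # bounds of the sampled start region
            self.pos_high = pos_high

        def reset(self):
            if random.random() < self.p_randomize:
                # Draw a random 2-D start position for every agent; assumes
                # the wrapped env accepts `initial_positions` on reset.
                positions = np.random.uniform(
                    self.pos_low, self.pos_high, size=(self.env.n_agents, 2)
                )
                return self.env.reset(initial_positions=positions)
            return self.env.reset()         # unmodified, fixed initial state

        def step(self, actions):
            return self.env.step(actions)

Broadening the distribution of start states in this way forces the pretrained policy to visit a wider region of the state space during fine-tuning, which is one plausible reading of how SISR promotes the generalized and sufficient knowledge transfer described in the abstract.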

Acknowledgement

This paper presents the results of research conducted in 2024 with funding from the government.
