Collision Avoidance Path Control of Multi-AGV Using Multi-Agent Reinforcement Learning

  • Ho-Bin Choi (Future Convergence Engineering Major, Department of Computer Science and Engineering, Korea University of Technology and Education) ;
  • Ju-Bong Kim (Future Convergence Engineering Major, Department of Computer Science and Engineering, Korea University of Technology and Education) ;
  • Youn-Hee Han (Future Convergence Engineering Major, Department of Computer Science and Engineering, Korea University of Technology and Education) ;
  • Se-Won Oh (Electronics and Telecommunications Research Institute) ;
  • Kwi-Hoon Kim (AI Convergence Education Major, Korea National University of Education)
  • Received : 2022.04.15
  • Accepted : 2022.04.22
  • Published : 2022.09.30

Abstract

Automated guided vehicles (AGVs) are widely used in industrial applications to transport heavy materials around large industrial buildings such as factories and warehouses. Their usefulness for automation is greatest in fulfillment centers, and increasing productivity in such warehouses requires sophisticated path planning for the AGVs. We propose a scheme that can be applied to QMIX, a popular cooperative multi-agent reinforcement learning (MARL) algorithm. Performance was measured with three metrics in two fulfillment center layouts, and the results are compared with those of the original QMIX. In addition, we visualize the transport paths of the trained AGVs as heat maps for a visual analysis of their behavior patterns.
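For context on the baseline named above: QMIX learns a per-agent utility for each AGV and mixes them into a joint action-value Q_tot through a state-conditioned mixing network whose weights are constrained to be non-negative, so Q_tot is monotonic in every agent's Q-value. The following is a minimal, illustrative PyTorch sketch of such a mixing network, not the paper's proposed scheme; the class name, layer sizes, and `embed_dim` are our own assumptions.

```python
import torch
import torch.nn as nn

class QMixer(nn.Module):
    """Minimal QMIX-style mixing network (illustrative sketch).

    Combines per-agent Q-values into a joint Q_tot that is monotonic in
    each agent's Q-value. The mixing weights are produced by hypernetworks
    conditioned on the global state and passed through abs() so they stay
    non-negative, which is what guarantees monotonicity.
    """
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to mixing-layer weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # Q_tot: (batch, 1)
```

For example, with four AGVs and a 64-dimensional global state, `QMixer(4, 64)` maps a (batch, 4) tensor of agent Q-values and a (batch, 64) state tensor to a (batch, 1) joint value Q_tot.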

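As an illustration of the heat-map visualization the abstract describes, the sketch below renders a grid of per-cell AGV visit counts with matplotlib. The grid shape and the Poisson-sampled counts are fabricated placeholders standing in for logged occupancy data, not results from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder visit-count grid: each cell is a warehouse location, each value
# counts how often any AGV occupied that cell under its learned policy.
rng = np.random.default_rng(0)
visit_counts = rng.poisson(lam=3.0, size=(20, 30))

plt.imshow(visit_counts, cmap="hot", interpolation="nearest")
plt.colorbar(label="AGV visit count")
plt.title("Transport-path heat map (illustrative data)")
plt.xlabel("Warehouse column")
plt.ylabel("Warehouse row")
plt.show()
```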

Acknowledgement

This work was supported by the Basic Research Program of the National Research Foundation of Korea (NRF) funded by the Korean government (Ministry of Education) (No. NRF-2020R1I1A3065610). This research was also supported by the 2020 Education and Research Promotion Program of Korea University of Technology and Education.
