
Collective Navigation Through a Narrow Gap for a Swarm of UAVs Using Curriculum-Based Deep Reinforcement Learning

(Korean title: 커리큘럼 기반 심층 강화학습을 이용한 좁은 틈을 통과하는 무인기 군집 내비게이션)

  • Received : 2023.10.31
  • Accepted : 2023.11.22
  • Published : 2024.02.29

Abstract

This paper introduces collective navigation through a narrow gap using a curriculum-based deep reinforcement learning algorithm for a swarm of unmanned aerial vehicles (UAVs). Collective navigation in complex environments is essential for various applications such as search and rescue, environmental monitoring, and military operations. Conventional methods, which are easily interpretable from an engineering perspective, divide the navigation task into mapping, planning, and control; however, they struggle with increased latency and unmodeled environmental factors. Recently, learning-based methods have addressed these problems by employing an end-to-end framework with neural networks. Nonetheless, most existing learning-based approaches face challenges in complex scenarios, particularly when navigating through a narrow gap or when a leader or informed UAV is unavailable. Our approach uses the information of a fixed number of nearest neighboring UAVs and incorporates a task-specific curriculum to reduce learning time and train a robust model. The effectiveness of the proposed algorithm is verified through an ablation study and quantitative metrics. Simulation results demonstrate that our approach outperforms existing methods.
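The two ingredients named in the abstract, an observation built from a fixed number of nearest neighbors (a topological interaction, cf. Ballerini et al. [30]) and a task-specific curriculum that gradually tightens the gap, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the choice of k = 3, and the gap widths are all assumptions for the example.

```python
import numpy as np

def nearest_neighbor_obs(positions, velocities, ego_idx, k=3):
    """Ego-centric observation from the k nearest neighboring UAVs.

    positions, velocities: (N, 3) arrays for the whole swarm.
    Returns relative positions and velocities of the k nearest
    neighbors, flattened into a single vector (names illustrative).
    """
    rel_pos = positions - positions[ego_idx]      # positions relative to ego
    dists = np.linalg.norm(rel_pos, axis=1)
    dists[ego_idx] = np.inf                       # exclude the ego UAV itself
    neighbors = np.argsort(dists)[:k]             # indices of k nearest UAVs
    rel_vel = velocities[neighbors] - velocities[ego_idx]
    return np.concatenate([rel_pos[neighbors].ravel(), rel_vel.ravel()])

def gap_width_for_stage(stage, start=4.0, end=1.0, n_stages=5):
    """Task-specific curriculum: linearly narrow the gap from an easy
    width (start) to the target width (end) as the training stage
    advances. The numeric values here are placeholders."""
    frac = stage / (n_stages - 1)
    return start + frac * (end - start)
```

With k = 3 in 3D, each UAV observes an 18-dimensional neighbor vector regardless of swarm size, which keeps the policy input fixed as the number of UAVs changes; the curriculum schedule would typically advance a stage once the policy reaches a success-rate threshold on the current gap width.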

Acknowledgement

This research was supported by the Defense Acquisition Program Administration (DAPA) and the Agency for Defense Development (ADD) (UG223047VD).

References

  1. A. Kurt, N. Saputro, K. Akkaya, and A. S. Uluagac, "Distributed Connectivity Maintenance in Swarm of Drones During Post-Disaster Transportation Applications," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 9, pp. 6061-6073, Sept., 2021, DOI: 10.1109/TITS.2021.3066843. 
  2. X. Li, J. Zhang, and J. Han, "Trajectory Planning of Load Transportation with Multi-Quadrotors Based on Reinforcement Learning Algorithm," Aerospace Science and Technology, vol. 116, pp. 106887, Sept., 2021, DOI: 10.1016/j.ast.2021.106887. 
  3. W. Yao, Y. Chen, J. Fu, D. Qu, C. Wu, J. Liu, G. Sun, and L. Xin, "Evolutionary Utility Prediction Matrix-Based Mission Planning for Unmanned Aerial Vehicles in Complex Urban Environments," IEEE Transactions on Intelligent Vehicles, vol. 8, no. 2, pp. 1068-1080, Feb., 2022, DOI: 10.1109/TIV.2022.3192525. 
  4. D. Shin, H. Moon, S. Kang, S. Lee, H. Yang, C. Park, M. Nam, K. Jung, and Y. Kim, "Implementation of MAPF-Based Fleet Management System," The Journal of Korea Robotics Society, vol. 17, no. 4, pp. 407-416, Nov., 2022, DOI: 10.7746/jkros.2022.17.4.407. 
  5. J. Lee, "Improved Heterogeneous-Ants-Based Path Planner Using RRT*," The Journal of Korea Robotics Society, vol. 14, no. 4, pp. 285-292, Nov., 2019, DOI: 10.7746/jkros.2019.14.4.285. 
  6. A. E. Turgut, H. Celikkanat, F. Gokce, and E. Sahin, "Self-Organized Flocking in Mobile Robot Swarms," Swarm Intelligence, vol. 2, no. 2, pp. 97-120, Aug., 2008, DOI: 10.1007/s11721-008-0016-2. 
  7. T. Vicsek and A. Zafeiris, "Collective Motion," Physics Reports, vol. 517, no. 3-4, pp. 71-140, Aug., 2012, DOI: 10.1016/j.physrep.2012.03.004. 
  8. F. Wang, J. Huang, K. H. Low, and T. Hu, "Collective Navigation of Aerial Vehicle Swarms: A Flocking Inspired Approach," IEEE Transactions on Intelligent Vehicles, pp. 1-14, May, 2023, DOI: 10.1109/TIV.2023.3271667. 
  9. B. Volkl and J. Fritz, "Relation Between Travel Strategy and Social Organization of Migrating Birds with Special Consideration of Formation Flight in the Northern Bald Ibis," Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 372, no. 1727, Aug., 2017, DOI: 10.1098/rstb.2016.0235. 
  10. C. Okasaki, M. L. Keefer, P. A. Westley, and A. M. Berdahl, "Collective Navigation Can Facilitate Passage Through Human-Made Barriers by Homeward Migrating Pacific Salmon," Proceedings of the Royal Society B, vol. 287, no. 1937, Oct., 2020, DOI: 10.1098/rspb.2020.2137. 
  11. S. T. Johnston and K. J. Painter, "Modelling Collective Navigation via Nonlocal Communication," Journal of the Royal Society Interface, vol. 18, no. 182, Sept., 2021, DOI: 10.1098/rsif.2021.0383. 
  12. M. Dorigo, G. Theraulaz, and V. Trianni, "Reflections on the Future of Swarm Robotics," Science Robotics, vol. 5, no. 49, Dec., 2020, DOI: 10.1126/scirobotics.abe4385. 
  13. X. Zhou, X. Wen, Z. Wang, Y. Gao, H. Li, Q. Wang, T. Yang, H. Lu, Y. Cao, C. Xu, and F. Gao, "Swarm of Micro Flying Robots in the Wild," Science Robotics, vol. 7, no. 66, May, 2022, DOI: 10.1126/scirobotics.abm5954. 
  14. D. Mellinger and V. Kumar, "Minimum Snap Trajectory Generation and Control for Quadrotors," 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, pp. 2520-2525, 2011, DOI: 10.1109/ICRA.2011.5980409. 
  15. E. Soria, F. Schiano, and D. Floreano, "Predictive Control of Aerial Swarms in Cluttered Environments," Nature Machine Intelligence, vol. 3, no. 6, pp. 545-554, May, 2021, DOI: 10.1038/s42256-021-00341-y. 
  16. A. Loquercio, E. Kaufmann, R. Ranftl, M. Muller, V. Koltun, and D. Scaramuzza, "Learning High-Speed Flight in the Wild," Science Robotics, vol. 6, no. 59, Oct., 2021, DOI: 10.1126/scirobotics.abg5810. 
  17. A. Singla, S. Padakandla, and S. Bhatnagar, "Memory-Based Deep Reinforcement Learning for Obstacle Avoidance in UAV With Limited Environment Knowledge," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 1, pp. 107-118, Jan., 2021, DOI: 10.1109/TITS.2019.2954952. 
  18. M. Kim, J. Kim, M. Jung, and H. Oh, "Towards Monocular Vision-Based Autonomous Flight Through Deep Reinforcement Learning," Expert Systems with Applications, vol. 198, Jul., 2022, DOI: 10.1016/j.eswa.2022.116742. 
  19. P. Yan, C. Bai, H. Zheng, and J. Guo, "Flocking Control of UAV Swarms with Deep Reinforcement Learning Approach," 2020 3rd International Conference on Unmanned Systems (ICUS), Harbin, China, pp. 592-599, 2020, DOI: 10.1109/ICUS50048.2020.9274899. 
  20. P. Zhu, W. Dai, W. Yao, J. Ma, Z. Zeng, and H. Lu, "Multi-Robot Flocking Control Based on Deep Reinforcement Learning," IEEE Access, vol. 8, pp. 150397-150406, Aug., 2020, DOI: 10.1109/ACCESS.2020.3016951. 
  21. W. Wang, L. Wang, J. Wu, X. Tao, and H. Wu, "Oracle-Guided Deep Reinforcement Learning for Large-Scale Multi-UAVs Flocking and Navigation," IEEE Transactions on Vehicular Technology, vol. 71, no. 10, pp. 10280-10292, Oct., 2022, DOI: 10.1109/TVT.2022.3184043. 
  22. C. Yan, C. Wang, X. Xiang, K. H. Low, X. Wang, X. Xu, and L. Shen, "Collision-Avoiding Flocking with Multiple Fixed-Wing UAVs in Obstacle-Cluttered Environments: A Task-Specific Curriculum-Based MADRL Approach," IEEE Transactions on Neural Networks and Learning Systems, pp. 1-15, Feb., 2023, DOI: 10.1109/TNNLS.2023.3245124. 
  23. Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum Learning," The 26th Annual International Conference on Machine Learning, pp. 41-48, 2009, DOI: 10.1145/1553374.1553380. 
  24. Collective Navigation Through a Narrow Gap for a Swarm of UAVs Using Curriculum-Based Deep RL, [Online], https://youtu.be/s0DFgJB6ODw, Accessed: Jan. 13, 2024. 
  25. K. Morihiro, T. Isokawa, H. Nishimura, and N. Matsui, "Emergence of Flocking Behavior Based on Reinforcement Learning," International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pp. 699-706, 2006, DOI: 10.1007/11893011_89. 
  26. C. W. Reynolds, "Flocks, Herds and Schools: A Distributed Behavioral Model," The 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 25-34, 1987, DOI: 10.1145/37401.37406. 
  27. J. Panerati, H. Zheng, S. Zhou, J. Xu, A. Prorok, and A. P. Schoellig, "Learning to Fly: A Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-Agent Quadcopter Control," 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, Prague, Czech Republic, pp. 7512-7519, 2021, DOI: 10.1109/IROS51168.2021.9635857. 
  28. E. Liang, R. Liaw, R. Nishihara, P. Moritz, R. Fox, K. Goldberg, J. Gonzalez, M. Jordan, and I. Stoica, "RLlib: Abstractions for Distributed Reinforcement Learning," arXiv:1712.09381, pp. 3053-3062, 2018, DOI: 10.48550/arXiv.1712.09381. 
  29. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal Policy Optimization Algorithms," arXiv:1707.06347, 2017, DOI: 10.48550/arXiv.1707.06347. 
  30. M. Ballerini, N. Cabibbo, R. Candelier, A. Cavagna, E. Cisbani, I. Giardina, V. Lecomte, A. Orlandi, G. Parisi, A. Procaccini, and M. Viale, "Interaction Ruling Animal Collective Behavior Depends on Topological Rather Than Metric Distance: Evidence from a Field Study," Proceedings of the National Academy of Sciences, vol. 105, no. 4, pp. 1232-1237, Jan., 2008, DOI: 10.1073/pnas.0711437105. 
  31. S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, Nov., 1997, DOI: 10.1162/neco.1997.9.8.1735. 
  32. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is All You Need," arXiv:1706.03762, 2017, DOI: 10.48550/arXiv.1706.03762.