An autonomous radiation source detection policy based on deep reinforcement learning with generalized ability in unknown environments

  • Hao Hu (Sino-French Institute of Nuclear Engineering and Technology, Sun Yat-Sen University)
  • Jiayue Wang (Guangdong Environmental Radiation Monitoring Center)
  • Ai Chen (Guangdong Environmental Radiation Monitoring Center)
  • Yang Liu (Sino-French Institute of Nuclear Engineering and Technology, Sun Yat-Sen University)
  • Received : 2022.06.20
  • Reviewed : 2022.09.09
  • Published : 2023.01.25

Abstract

Autonomous radiation source detection has long been studied for use in radiation emergencies. Compared with conventional data-driven or path-planning methods, deep reinforcement learning shows strong capability in source detection, but it still generalizes poorly to the geometry of unknown environments. In this work, the detection task is decomposed into two subtasks: exploration and localization. A hierarchical control policy (HC) is proposed to perform these subtasks at different stages. The low-level controller learns to execute each individual subtask through deep reinforcement learning, and the high-level controller determines which subtask should be executed at the current stage. In experimental tests under different geometrical conditions, HC achieves the best performance among the autonomous decision policies compared, demonstrating the robustness and generalization ability of the hierarchy.
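The two-level structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the observation layout, the action labels, the threshold-based switching rule, and the names `HierarchicalController`, `explore_policy`, and `localize_policy` are all assumptions made for illustration, and the two low-level policies are fixed stand-ins for what would be networks trained by deep reinforcement learning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical observation layout: (x, y, count_rate).
Observation = Sequence[float]
# Hypothetical discrete action label.
Action = str
Policy = Callable[[Observation], Action]


def explore_policy(obs: Observation) -> Action:
    """Stand-in for a trained low-level exploration policy."""
    return "forward"


def localize_policy(obs: Observation) -> Action:
    """Stand-in for a trained low-level localization policy."""
    return "toward_higher_counts"


@dataclass
class HierarchicalController:
    """High-level controller that picks a subtask, then delegates to it."""

    low_level: Dict[str, Policy]
    # Assumed switching rule: a count-rate threshold. The abstract does not
    # say how the high-level controller makes its decision.
    switch_threshold: float

    def select_subtask(self, obs: Observation) -> str:
        count_rate = obs[-1]
        if count_rate >= self.switch_threshold:
            return "localization"
        return "exploration"

    def act(self, obs: Observation) -> Tuple[str, Action]:
        subtask = self.select_subtask(obs)
        return subtask, self.low_level[subtask](obs)


hc = HierarchicalController(
    low_level={"exploration": explore_policy, "localization": localize_policy},
    switch_threshold=50.0,
)
print(hc.act((0.0, 0.0, 3.0)))    # low count rate: exploration subtask
print(hc.act((1.0, 2.0, 120.0)))  # high count rate: localization subtask
```

Keeping the subtask choice separate from the subtask policies means each low-level policy can be trained or replaced on its own, which is the property the hierarchy relies on.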
