Possibilities of reinforcement learning for nuclear power plants: Evidence on current applications and beyond

  • Aicheng Gong (State Key Laboratory of Nuclear Power Safety Monitoring Technology and Equipment, China Nuclear Power Engineering Company Ltd.) ;
  • Yangkun Chen (Shenzhen International Graduate School, Tsinghua University) ;
  • Junjie Zhang (Shenzhen International Graduate School, Tsinghua University) ;
  • Xiu Li (Shenzhen International Graduate School, Tsinghua University)
  • Received : 2023.08.08
  • Accepted : 2024.01.01
  • Published : 2024.06.25

Abstract

Nuclear energy plays a crucial role in the energy supply of the 21st century, and more and more Nuclear Power Plants (NPPs) will come into operation to support the development of human society. However, as a typical example of complex system engineering, the operation and development of NPPs require efficient and stable control methods to ensure the safety and efficiency of nuclear power generation. Reinforcement learning (RL) aims to learn optimal control policies by maximizing discounted long-term rewards. This reward-oriented learning paradigm has achieved remarkable success in many complex systems, such as wind power systems, electric power systems, coal-fired power plants, and robotics. In this work, we present a systematic review of RL applications to these complex systems, from which we believe NPPs can borrow experience and insights. We then conduct a block-by-block investigation of RL application scenarios for specific tasks in NPPs and carry out algorithmic research for different situations such as power startup, collaborative control, and emergency handling. Moreover, we discuss the possibilities of further applying RL methods to NPPs and detail the challenges that arise in doing so. We hope this work can help advance the realization of intelligent NPPs and encourage more research on how to better integrate RL algorithms into NPPs.
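
For orientation, the "discounted long-term rewards" objective referred to above can be written as the expected discounted return. The formulation below uses standard MDP notation for illustration only; it is not notation introduced by this paper:

J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right], \qquad 0 \le \gamma < 1,

where \pi is the control policy, r(s_t, a_t) is the reward for taking action a_t in state s_t, \tau is a trajectory generated by \pi, and \gamma is the discount factor that trades off immediate against long-term rewards; RL methods seek a policy \pi^{*} that maximizes J(\pi).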

Keywords

Acknowledgements

This work is supported by the SIT 2030-Key Project under Grant 2021ZD0201404.

References

  1. Vivianne H.M. Visschers, Carmen Keller, Michael Siegrist, Climate change benefits and energy supply benefits as determinants of acceptance of nuclear power stations: Investigating an explanatory model, Energy Policy 39 (6) (2011) 3621-3629. https://doi.org/10.1016/j.enpol.2011.03.064
  2. Sheng Zhou, Xiliang Zhang, Nuclear energy development in China: a study of opportunities and challenges, Energy 35 (11) (2010) 4282-4288. https://doi.org/10.1016/j.energy.2009.04.020
  3. Barry W Brook, Agustin Alonso, Daniel A Meneley, Jozef Misak, Tom Blees, Jan B van Erp, Why nuclear energy is sustainable and has to be part of the energy mix, Sustain. Mater. Technol. 1 (2014) 8-16.
  4. James H. Rust, Nuclear power plant engineering, 1979.
  5. Ronald Allen Knief, Nuclear energy technology: theory and practice of commercial nuclear power, 1981.
  6. Jose M. Arias, Manuel Lozano, An Advanced Course in Modern Nuclear Physics, Vol. 581, Springer, 2008.
  7. Mingrong Li, Tingke Zhang, The Report on the Development of China's Nuclear Energy 2021, Technical Report, 2021.
  8. Steffen Schlomer, Thomas Bruckner, Lew Fulton, Edgar Hertwich, Alan McKinnon, Daniel Perczyk, Joyashree Roy, Roberto Schaeffer, Ralph Sims, Pete Smith, et al., Annex III: Technology-specific cost and performance parameters, in: Climate Change 2014: Mitigation of Climate Change: Contribution of Working Group III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge University Press, 2014, pp. 1329-1356.
  9. Henry P. Birmingham, Franklin V. Taylor, A design philosophy for man-machine control systems, Proc. IRE 42 (12) (1954) 1748-1758. https://doi.org/10.1109/JRPROC.1954.274775
  10. D.T. McRuer, E.S. Krendel, The man-machine system concept, Proc. IRE 50 (5) (1962) 1117-1123. https://doi.org/10.1109/JRPROC.1962.288016
  11. Jean-Michel Hoc, From human-machine interaction to human-machine cooperation, Ergonomics 43 (7) (2000) 833-843. https://doi.org/10.1080/001401300409044
  12. Hang Wu, Weihua Su, Zhiguo Liu, PID controllers: Design and tuning methods, in: 2014 9th IEEE Conference on Industrial Electronics and Applications, IEEE, 2014, pp. 808-813.
  13. Rames C. Panda, Introduction to PID Controllers: Theory, Tuning and Application to Frontier Areas, BoD-Books on Demand, 2012.
  14. MA Ebrahim, KA El-Metwally, FM Bendary, WM Mansour, HS Ramadan, R Ortega, J Romero, Optimization of proportional-integral-differential controller for wind power plant using particle swarm optimization technique, Int. J. Electr. Power Eng. 6 (1) (2012) 32-37. https://doi.org/10.3923/ijepe.2012.32.37
  15. Lezhen Shi, Xiaodong Miao, Hua Wang, An improved nonlinear proportional-integral-differential controller combined with fractional operator and symbolic adaptation algorithm, Trans. Inst. Meas. Control 42 (5) (2020) 927-941. https://doi.org/10.1177/0142331219879332
  16. Stuart Bennett, Development of the PID controller, IEEE Control Syst. Mag. 13 (6) (1993) 58-62. https://doi.org/10.1109/37.248006
  17. Kit-Sang Tang, Kim Fung Man, Guanrong Chen, Sam Kwong, An optimal fuzzy PID controller, IEEE Trans. Ind. Electron. 48 (4) (2001) 757-765. https://doi.org/10.1109/41.937407
  18. Pritesh Shah, Sudhir Agashe, Review of fractional PID controller, Mechatronics 38 (2016) 29-41. https://doi.org/10.1016/j.mechatronics.2016.06.005
  19. Kelvin T. Erickson, Programmable logic controllers, IEEE Potentials 15 (1) (1996) 14-17. https://doi.org/10.1109/45.481370
  20. William Bolton, Programmable Logic Controllers, Newnes, 2015.
  21. Ephrem Ryan Alphonsus, Mohammad Omar Abdullah, A review on the applications of programmable logic controllers (PLCs), Renew. Sustain. Energy Rev. 60 (2016) 1185-1205. https://doi.org/10.1016/j.rser.2016.01.025
  22. Gary A. Dunning, Introduction to Programmable Logic Controllers, Cengage Learning, 2005.
  23. Eric Monmasson, Marcian N. Cirstea, FPGA design methodology for industrial control systems-A review, IEEE Trans. Ind. Electron. 54 (4) (2007) 1824-1842. https://doi.org/10.1109/TIE.2007.898281
  24. Ian Kuon, Russell Tessier, Jonathan Rose, et al., FPGA architecture: Survey and challenges, Found. Trends® Electron. Des. Autom. 2 (2) (2008) 135-253. https://doi.org/10.1561/1000000005
  25. Andrew Canis, Jongsok Choi, Mark Aldham, Victor Zhang, Ahmed Kammoona, Jason H Anderson, Stephen Brown, Tomasz Czajkowski, LegUp: high-level synthesis for FPGA-based processor/accelerator systems, in: Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2011, pp. 33-36.
  26. Teng Lei, Wang Shuai, Wang Xiaobing, Design of high pressure water jet decontamination device with pressure and flow synchronous control function, Nucl. Power Eng. 41 (3) (2020) 153-157.
  27. Jonghyun Kim, Seungjun Lee, Poong Hyun Seong, Autonomous Nuclear Power Plants with Artificial Intelligence, Vol. 94, Springer Nature, 2023.
  28. Jurgen Schmidhuber, Deep learning in neural networks: An overview, Neural Netw. 61 (2015) 85-117.
  29. Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Deep learning, Nature 521 (7553) (2015) 436-444. https://doi.org/10.1038/nature14539
  30. Li Deng, Dong Yu, et al., Deep learning: methods and applications, Found. Trends® Signal Process. 7 (3-4) (2014) 197-387. https://doi.org/10.1561/2000000039
  31. Pramila P. Shinde, Seema Shah, A review of machine learning and deep learning applications, in: 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), IEEE, 2018, pp. 1-6.
  32. Ajeet Ram Pathak, Manjusha Pandey, Siddharth Rautaray, Application of deep learning for object detection, Procedia Comput. Sci. 132 (2018) 1706-1717. https://doi.org/10.1016/j.procs.2018.05.144
  33. Zongmei Gao, Zhongwei Luo, Wen Zhang, Zhenzhen Lv, Yanlei Xu, Deep learning application in plant stress imaging: a review, AgriEngineering 2 (3) (2020) 29.
  34. Samir Khan, Takehisa Yairi, A review on the application of deep learning in system health management, Mech. Syst. Signal Process. 107 (2018) 241-265. https://doi.org/10.1016/j.ymssp.2017.11.024
  35. Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 2018.
  36. Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement learning: A survey, J. Artif. Intell. Res. 4 (1996) 237-285. https://doi.org/10.1613/jair.301
  37. Yuxi Li, Deep reinforcement learning: An overview, 2017, arXiv preprint arXiv: 1701.07274.
  38. James Ladyman, James Lambert, Karoline Wiesner, What is a complex system? Eur. J. Philos. Sci. 3 (1) (2013) 33-67. https://doi.org/10.1007/s13194-012-0056-8
  39. Takayuki Kanda, Hiroshi Ishiguro, Michita Imai, Tetsuo Ono, Development and evaluation of interactive humanoid robots, Proc. IEEE 92 (11) (2004) 1839-1850. https://doi.org/10.1109/JPROC.2004.835359
  40. Bradley T. Werner, Complexity in natural landform patterns, Science 284 (5411) (1999) 102-104. https://doi.org/10.1126/science.284.5411.102
  41. Klaus Mainzer, A. John Mallinckrodt, Thinking in complexity: The complex dynamics of matter, mind, and mankind, Comput. Phys. 9 (4) (1995) 398.
  42. Robert S. MacKay, Nonlinearity in complexity science, Nonlinearity 21 (12) (2008) T273.
  43. Alicia Juarrero, Dynamics in action: Intentional behavior as a complex system, Emergence 2 (2) (2000) 24-57. https://doi.org/10.1207/S15327000EM0202_03
  44. Sven Bertelsen, Construction as a complex system, in: Proceedings for the 11th Annual Conference of the International Group for Lean Construction, 2003, pp. 11-23.
  45. Steven D. Gribble, Robustness in complex systems, in: Proceedings Eighth Workshop on Hot Topics in Operating Systems, IEEE, 2001, pp. 21-26.
  46. Herbert A. Simon, The architecture of complexity, in: Facets of Systems Science, Springer, 1991, pp. 457-476.
  47. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518 (7540) (2015) 529-533. https://doi.org/10.1038/nature14236
  48. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al., Mastering the game of Go without human knowledge, Nature 550 (7676) (2017) 354-359. https://doi.org/10.1038/nature24270
  49. Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra, Continuous control with deep reinforcement learning, 2015, arXiv preprint arXiv:1509.02971.
  50. Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, End-to-end training of deep visuomotor policies, J. Mach. Learn. Res. 17 (1) (2016) 1334-1373.
  51. Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michael Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al., Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature 575 (7782) (2019) 350-354. https://doi.org/10.1038/s41586-019-1724-z
  52. Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T Connor, Neil Burch, Thomas Anthony, et al., Mastering the game of Stratego with model-free multiagent reinforcement learning, 2022, arXiv e-prints, arXiv-2206.
  53. Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement learning in robotics: A survey, Int. J. Robot. Res. 32 (11) (2013) 1238-1274. https://doi.org/10.1177/0278364913495721
  54. Petar Kormushev, Sylvain Calinon, Darwin G. Caldwell, Reinforcement learning in robotics: Applications and real-world challenges, Robotics 2 (3) (2013) 122-148. https://doi.org/10.3390/robotics2030122
  55. Athanasios S. Polydoros, Lazaros Nalpantidis, Survey of model-based reinforcement learning: Applications on robotics, J. Intell. Robot. Syst. 86 (2) (2017) 153-173. https://doi.org/10.1007/s10846-017-0468-y
  56. Wenshuai Zhao, Jorge Pena Queralta, Tomi Westerlund, Sim-to-real transfer in deep reinforcement learning for robotics: a survey, in: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, 2020, pp. 737-744.
  57. Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, Xindong Wu, Object detection with deep learning: A review, IEEE Trans. Neural Netw. Learn. Syst. 30 (11) (2019) 3212-3232. https://doi.org/10.1109/TNNLS.2018.2876865
  58. Md Tahmid Hasan Fuad, Awal Ahmed Fime, Delowar Sikder, Md Akil Raihan Iftee, Jakaria Rabbi, Mabrook S Al-Rakhami, Abdu Gumaei, Ovishake Sen, Mohtasim Fuad, Md Nazrul Islam, Recent advances in deep learning techniques for face recognition, IEEE Access 9 (2021) 99112-99142. https://doi.org/10.1109/ACCESS.2021.3096136
  59. Dinggang Shen, Guorong Wu, Heung-Il Suk, Deep learning in medical image analysis, Annu. Rev. Biomed. Eng. 19 (2017) 221.
  60. Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, Clara I Sanchez, A survey on deep learning in medical image analysis, Med. Image Anal. 42 (2017) 60-88. https://doi.org/10.1016/j.media.2017.07.005
  61. Afan Galih Salman, Bayu Kanigoro, Yaya Heryadi, Weather forecasting using deep learning techniques, in: 2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS), IEEE, 2015, pp. 281-285.
  62. Tom Young, Devamanyu Hazarika, Soujanya Poria, Erik Cambria, Recent trends in deep learning based natural language processing, IEEE Comput. Intell. Mag. 13 (3) (2018) 55-75. https://doi.org/10.1109/MCI.2018.2840738
  63. Yushi Chen, Zhouhan Lin, Xing Zhao, Gang Wang, Yanfeng Gu, Deep learning-based classification of hyperspectral data, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7 (6) (2014) 2094-2107. https://doi.org/10.1109/JSTARS.2014.2329330
  64. Zidong Zhang, Dongxia Zhang, Robert C. Qiu, Deep reinforcement learning for power system applications: An overview, CSEE J. Power Energy Syst. 6 (1) (2019) 213-225.
  65. Nguyen Cong Luong, Dinh Thai Hoang, Shimin Gong, Dusit Niyato, Ping Wang, Ying-Chang Liang, Dong In Kim, Applications of deep reinforcement learning in communications and networking: A survey, IEEE Commun. Surv. Tutor. 21 (4) (2019) 3133-3174. https://doi.org/10.1109/COMST.2019.2916583
  66. Hao-nan Wang, Ning Liu, Yi-yun Zhang, Da-wei Feng, Feng Huang, Dong-sheng Li, Yi-ming Zhang, Deep reinforcement learning: a survey, Front. Inf. Technol. Electron. Eng. 21 (12) (2020) 1726-1744. https://doi.org/10.1631/FITEE.1900533
  67. Amirhosein Mosavi, Yaser Faghan, Pedram Ghamisi, Puhong Duan, Sina Faizol-lahzadeh Ardabili, Ely Salwana, Shahab S Band, Comprehensive review of deep reinforcement learning methods and applications in economics, Mathematics 8 (10) (2020) 1640.
  68. Ammar Haydari, Yasin Yilmaz, Deep reinforcement learning for intelligent transportation systems: A survey, IEEE Trans. Intell. Transp. Syst. (2020).
  69. Christopher John Cornish Hellaby Watkins, Learning from delayed rewards, 1989.
  70. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller, Playing Atari with deep reinforcement learning, 2013, arXiv preprint arXiv:1312.5602.
  71. Leemon Baird, Residual algorithms: Reinforcement learning with function approximation, in: Machine Learning Proceedings 1995, Elsevier, 1995, pp. 30-37.
  72. John Tsitsiklis, Benjamin Van Roy, Analysis of temporal-difference learning with function approximation, Adv. Neural Inf. Process. Syst. 9 (1996).
  73. Long-Ji Lin, Reinforcement Learning for Robots using Neural Networks, Carnegie Mellon University, 1992.
  74. Hado Hasselt, Double Q-learning, Adv. Neural Inf. Process. Syst. 23 (2010).
  75. Hado Van Hasselt, Arthur Guez, David Silver, Deep reinforcement learning with double Q-learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30, 2016.
  76. Richard S Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy gradient methods for reinforcement learning with function approximation, Adv. Neural Inf. Process. Syst. 12 (1999).
  77. John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel, High-dimensional continuous control using generalized advantage estimation, 2015, arXiv preprint arXiv:1506.02438.
  78. Ivaylo Popov, Nicolas Heess, Timothy Lillicrap, Roland Hafner, Gabriel Barth-Maron, Matej Vecerik, Thomas Lampe, Yuval Tassa, Tom Erez, Martin Riedmiller, Data-efficient deep reinforcement learning for dexterous manipulation, 2017, arXiv preprint arXiv:1704.03073.
  79. John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, Philipp Moritz, Trust region policy optimization, in: International Conference on Machine Learning, PMLR, 2015, pp. 1889-1897.
  80. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, Proximal policy optimization algorithms, 2017, arXiv preprint arXiv:1707.06347.
  81. Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous methods for deep reinforcement learning, in: International Conference on Machine Learning, PMLR, 2016, pp. 1928-1937.
  82. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, in: International Conference on Machine Learning, PMLR, 2018, pp. 1861-1870.
  83. Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller, Maximum a posteriori policy optimisation, 2018, arXiv preprint arXiv:1806.06920.
  84. Scott Fujimoto, David Meger, Doina Precup, Off-policy deep reinforcement learning without exploration, in: International Conference on Machine Learning, 2019.
  85. Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine, Conservative Q-learning for offline reinforcement learning, Adv. Neural Inf. Process. Syst. 33 (2020) 1179-1191.
  86. Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu, Mildly conservative Q-learning for offline reinforcement learning, in: Thirty-Sixth Conference on Neural Information Processing Systems, 2022.
  87. Scott Fujimoto, Shixiang Shane Gu, A minimalist approach to offline reinforcement learning, in: Advances in Neural Information Processing Systems, 2021.
  88. Aviral Kumar, Justin Fu, G. Tucker, Sergey Levine, Stabilizing off-policy Q-learning via bootstrapping error reduction, in: Neural Information Processing Systems, 2019.
  89. Jiafei Lyu, Aicheng Gong, Le Wan, Zongqing Lu, Xiu Li, State advantage weighting for offline RL, in: 3rd Offline RL Workshop: Offline RL As a "Launchpad", 2022.
  90. Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine, When to trust your model: Model-based policy optimization, 2019, ArXiv, abs/1906.08253.
  91. Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Y. Zou, Sergey Levine, Chelsea Finn, Tengyu Ma, MOPO: Model-based offline policy optimization, in: Advances in Neural Information Processing Systems, 2020.
  92. Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn, COMBO: Conservative offline model-based policy optimization, in: Neural Information Processing Systems, 2021.
  93. Jiafei Lyu, Xiu Li, Zongqing Lu, Double check your state before trusting it: Confidence-aware bidirectional offline model-based imagination, 2022, arXiv preprint arXiv:2206.07989.
  94. Junjie Zhang, Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Jun Yang, Le Wan, Xiu Li, Uncertainty-driven trajectory truncation for model-based offline reinforcement learning, 2023, ArXiv, abs/2304.04660.
  95. Zhongjian Qiao, Jiafei Lyu, Xiu Li, The primacy bias in model-based RL, 2023, ArXiv, abs/2310.15017.
  96. Marc Rigter, Bruno Lacerda, Nick Hawes, RAMBO-RL: Robust adversarial model-based offline reinforcement learning, 2022, ArXiv, abs/2204.12581.
  97. Suyang Zhou, Zijian Hu, Wei Gu, Meng Jiang, Meng Chen, Qiteng Hong, Campbell Booth, Combined heat and power system intelligent economic dispatch: A deep reinforcement learning approach, Int. J. Electr. Power Energy Syst. 120 (2020) 106016.
  98. Esmat Samadi, Ali Badri, Reza Ebrahimpour, Decentralized multi-agent based energy management of microgrid using reinforcement learning, Int. J. Electr. Power Energy Syst. 122 (2020) 106211.
  99. Hussain Kazmi, Johan Suykens, Attila Balint, Johan Driesen, Multi-agent reinforcement learning for modeling and control of thermostatically controlled loads, Appl. Energy 238 (2019) 1022-1035. https://doi.org/10.1016/j.apenergy.2019.01.140
  100. Tianshu Wei, Yanzhi Wang, Qi Zhu, Deep reinforcement learning for building HVAC control, in: Proceedings of the 54th Annual Design Automation Conference 2017, 2017, pp. 1-6.
  101. Eunsung Oh, Hanho Wang, Reinforcement-learning-based energy storage system operation strategies to manage wind power forecast uncertainty, IEEE Access 8 (2020) 20965-20976. https://doi.org/10.1109/ACCESS.2020.2968841
  102. Chun Wei, Zhe Zhang, Wei Qiao, Liyan Qu, Reinforcement-learning-based intelligent maximum power point tracking control for wind energy conversion systems, IEEE Trans. Ind. Electron. 62 (10) (2015) 6360-6370. https://doi.org/10.1109/TIE.2015.2420792
  103. Huifeng Zhang, Dong Yue, Chunxia Dou, Kang Li, Gerhard P Hancke, Two-step wind power prediction approach with improved complementary ensemble empirical mode decomposition and reinforcement learning, IEEE Syst. J. (2021).
  104. Yinliang Xu, Wei Zhang, Wenxin Liu, Frank Ferrese, Multiagent-based reinforcement learning for optimal reactive power dispatch, IEEE Trans. Syst. Man Cybern. C 42 (6) (2012) 1742-1751. https://doi.org/10.1109/TSMCC.2012.2218596
  105. John G. Vlachogiannis, Nikos D. Hatziargyriou, Reinforcement learning for reactive power control, IEEE Trans. Power Syst. 19 (3) (2004) 1317-1325. https://doi.org/10.1109/TPWRS.2004.831259
  106. Tao Yu, Bin Zhou, Ka Wing Chan, Liang Chen, Bo Yang, Stochastic optimal relaxed automatic generation control in non-Markov environment based on multi-step q learning, IEEE Trans. Power Syst. 26 (3) (2011) 1272-1282. https://doi.org/10.1109/TPWRS.2010.2102372
  107. Fatheme Daneshfar, Hassan Bevrani, Load-frequency control: a GA-based multi-agent reinforcement learning, IET Gener. Transm. Distrib. 4 (1) (2010) 13-26. https://doi.org/10.1049/iet-gtd.2009.0168
  108. T.P. Imthias Ahamed, P.S. Nagendra Rao, P.S. Sastry, A reinforcement learning approach to automatic generation control, Electr. Power Syst. Res. 63 (1) (2002) 9-26. https://doi.org/10.1016/S0378-7796(02)00088-3
  109. Tao Yu, B Zhou, Ka Wing Chan, Y Yuan, Bo Yang, QH Wu, R (λ) imitation learning for automatic generation control of interconnected power grids, Automatica 48 (9) (2012) 2130-2136. https://doi.org/10.1016/j.automatica.2012.05.043
  110. Yujian Ye, Dawei Qiu, Mingyang Sun, Dimitrios Papadaskalopoulos, Goran Strbac, Deep reinforcement learning for strategic bidding in electricity markets, IEEE Trans. Smart Grid 11 (2) (2019) 1343-1355. https://doi.org/10.1109/TSG.2019.2936142
  111. Sina Zarrabian, Rabie Belkacemi, Adeniyi A. Babalola, Reinforcement learning approach for congestion management and cascading failure prevention with experimental application, Electr. Power Syst. Res. 141 (2016) 179-190. https://doi.org/10.1016/j.epsr.2016.06.041
  112. Xiaorui Liu, Charalambos Konstantinou, Reinforcement learning for cyber-physical security assessment of power systems, in: 2019 IEEE Milan PowerTech, IEEE, 2019, pp. 1-6.
  113. Yin Cheng, Yuexin Huang, Bo Pang, Weidong Zhang, ThermalNet: A deep reinforcement learning-based combustion optimization system for coal-fired boiler, Eng. Appl. Artif. Intell. 74 (2018) 303-311. https://doi.org/10.1016/j.engappai.2018.07.003
  114. Jigao Fu, Hong Xiao, Hao Wang, Junhao Zhou, Control strategy for denitrification efficiency of coal-fired power plant based on deep reinforcement learning, IEEE Access 8 (2020) 65127-65136. https://doi.org/10.1109/ACCESS.2020.2985233
  115. Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng, Deepthermal: Combustion optimization for thermal power generating units using offline reinforcement learning, 2021, arXiv preprint arXiv:2102.11492.
  116. Volker Stephan, Klaus Debes, H-M Gross, F Wintrich, H Wintrich, A new control scheme for combustion processes using reinforcement learning based on neural networks, Int. J. Comput. Intell. Appl. 1 (02) (2001) 121-136. https://doi.org/10.1142/S1469026801000172
  117. OpenAI: Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al., Learning dexterous in-hand manipulation, Int. J. Robot. Res. 39 (1) (2020) 3-20. https://doi.org/10.1177/0278364919887447
  118. Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, et al., Solving rubik's cube with a robot hand, 2019, arXiv preprint arXiv:1910.07113.
  119. Bianca Sangiovanni, Angelo Rendiniello, Gian Paolo Incremona, Antonella Ferrara, Marco Piastra, Deep reinforcement learning for collision avoidance of robotic manipulators, in: 2018 European Control Conference (ECC), IEEE, 2018, pp. 2063-2068.
  120. Steven A Harp, Sergio Brignone, Bruce F Wollenberg, Tariq Samad, SEPIA: A simulator for electric power industry agents, IEEE Control Syst. Mag. 20 (4) (2000) 53-69. https://doi.org/10.1109/37.856179
  121. Morteza Rahimiyan, Habib Rajabi Mashhadi, An adaptive Q-learning algorithm developed for agent-based computational modeling of electricity market, IEEE Trans. Syst. Man Cybern. C 40 (5) (2010) 547-556. https://doi.org/10.1109/TSMCC.2010.2044174
  122. Vishnuteja Nanduri, Tapas K. Das, A reinforcement learning model to assess market power under auction-based energy pricing, IEEE Trans. Power Syst. 22 (1) (2007) 85-95. https://doi.org/10.1109/TPWRS.2006.888977
  123. Byung-Gook Kim, Yu Zhang, Mihaela Van Der Schaar, Jang-Won Lee, Dynamic pricing and energy consumption scheduling with reinforcement learning, IEEE Trans. Smart Grid 7 (5) (2015) 2187-2198. https://doi.org/10.1109/TSG.2015.2495145
  124. Thilo Krause, Elena Vdovina Beck, Rachid Cherkaoui, Alain Germond, Goran Andersson, Damien Ernst, A comparison of Nash equilibria analysis and agent-based modelling for power markets, Int. J. Electr. Power Energy Syst. 28 (9) (2006) 599-607. https://doi.org/10.1016/j.ijepes.2006.03.002
  125. Sergey Levine, Aviral Kumar, G. Tucker, Justin Fu, Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020, ArXiv, abs/2005.01643.
  126. Aviral Kumar, Aurick Zhou, G. Tucker, Sergey Levine, Conservative Q-learning for offline reinforcement learning, in: Advances in Neural Information Processing Systems, 2020.
  127. Xinyue Chen, Zijian Zhou, Z. Wang, Che Wang, Yanqiu Wu, Qing Deng, Keith W. Ross, BAIL: Best-action imitation learning for batch deep reinforcement learning, in: Advances in Neural Information Processing Systems, 2020.
  128. Ilya Kostrikov, Ashvin Nair, Sergey Levine, Offline reinforcement learning with implicit Q-learning, in: International Conference on Learning Representations, 2022.
  129. Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, P. Abbeel, A. Srinivas, Igor Mordatch, Decision transformer: Reinforcement learning via sequence modeling, 2021, ArXiv, abs/2106.01345.
  130. Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh, Uncertainty weighted actor-critic for offline reinforcement learning, in: ICML, 2021.
  131. Zhe Dong, Xiaojin Huang, Yujie Dong, Zuoyi Zhang, Multilayer perception based reinforcement learning supervisory control of energy systems with application to a nuclear steam supply system, Appl. Energy 259 (2020) 114193.
  132. Majdi I. Radaideh, Isaac Wolverton, Joshua Joseph, James J. Tusar, Uuganbayar Otgonbaatar, Nicholas Roy, Benoit Forget, Koroush Shirvan, Physics-informed reinforcement learning optimization of nuclear assembly design, Nucl. Eng. Des. 372 (2021) 110966.
  133. Xiangyi Chen, Asok Ray, Deep reinforcement learning control of a boiling water reactor, IEEE Trans. Nucl. Sci. 69 (8) (2022) 1820-1832. https://doi.org/10.1109/TNS.2022.3187662
  134. Paul Seurin, Koroush Shirvan, Assessment of reinforcement learning algorithms for nuclear power plant fuel optimization, 2023, arXiv preprint arXiv:2305.05812.
  135. Tianhao Zhang, Zhe Dong, Xiaojin Huang, Multi-objective optimization of thermal power and outlet steam temperature for a nuclear steam supply system with deep reinforcement learning. Available at SSRN 4490266.
  136. JaeKwan Park, TaekKyu Kim, SeungHwan Seong, Providing support to operators for monitoring safety functions using reinforcement learning, Prog. Nucl. Energy 118 (2020) 103123.
  137. Daeil Lee, Hyojin Kim, Younhee Choi, Jonghyun Kim, Development of autonomous operation agent for normal and emergency situations in nuclear power plants, in: 2021 5th International Conference on System Reliability and Safety (ICSRS), IEEE, 2021, pp. 240-247.
  138. Jing Li, Yanyang Liu, Xianguo Qing, Kai Xiao, Ying Zhang, Pengcheng Yang, Yue Marco Yang, The application of deep reinforcement learning in coordinated control of nuclear reactors, in: Journal of Physics: Conference Series, Vol. 2113, IOP Publishing, 2021, 012030.
  139. Jae Min Kim, Junyong Bae, Seung Jun Lee, Strategy to coordinate actions through a plant parameter prediction model during startup operation of a nuclear power plant, Nucl. Eng. Technol. 55 (3) (2023) 839-849. https://doi.org/10.1016/j.net.2022.11.012
  140. Junyong Bae, Jae Min Kim, Seung Jun Lee, Deep reinforcement learning for a multi-objective operation in a nuclear power plant, Nucl. Eng. Technol. (2023).
  141. Jae Min Kim, Seung Jun Lee, Framework of two-level operation module for autonomous system of nuclear power plants during startup and shutdown operation, in: Transactions of the Korean Nuclear Society Autumn Meeting, 2019.
  142. Daeil Lee, Awwal Mohammed Arigi, Jonghyun Kim, Algorithm for autonomous power-increase operation using deep reinforcement learning and a rule-based system, IEEE Access 8 (2020) 196727-196746. https://doi.org/10.1109/ACCESS.2020.3034218
  143. JaeKwan Park, TaekKyu Kim, SeungHwan Seong, SeoRyong Koo, Control automation in the heat-up mode of a nuclear power plant using reinforcement learning, Prog. Nucl. Energy 145 (2022) 104107.
  144. Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de Las Casas, et al., Magnetic control of tokamak plasmas through deep reinforcement learning, Nature 602 (7897) (2022) 414-419. https://doi.org/10.1038/s41586-021-04301-9
  145. K.E. Carlson, P.A. Roth, V.H. Ransom, ATHENA Code Manual. Volume 1. Code Structure, System Models, and Solution Methods, Technical Report, EG and G Idaho, Inc., Idaho Falls (USA), 1986.
  146. Deail Lee, Jonghyun Kim, Autonomous algorithm for start-up operation of nuclear power plants by using LSTM, in: International Conference on Applied Human Factors and Ergonomics, Springer, 2018, pp. 465-475.
  147. Seung Jun Lee, Poong Hyun Seong, Development of automated operating procedure system using fuzzy colored petri nets for nuclear power plants, Ann. Nucl. Energy 31 (8) (2004) 849-869. https://doi.org/10.1016/j.anucene.2003.12.002
  148. Yochan Kim, Jinkyun Park, Envisioning human-automation interactions for responding emergency situations of NPPs: a viewpoint from human-computer interaction, in: Proc. Trans. Korean Nucl. Soc. Autumn Meeting, 2018.
  149. Ar Ryum Kim, Jinkyun Park, Ji Tae Kim, Jaewhan Kim, Poong Hyun Seong, Study on the identification of main drivers affecting the performance of human operators during low power and shutdown operation, Ann. Nucl. Energy 92 (2016) 447-455. https://doi.org/10.1016/j.anucene.2016.02.010
  150. Thomas M Moerland, Joost Broekens, Aske Plaat, Catholijn M Jonker, et al., Model-based reinforcement learning: A survey, Found. Trends® Mach. Learn. 16 (1) (2023) 1-118. https://doi.org/10.1561/2200000086
  151. Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine, When to trust your model: Model-based policy optimization, Adv. Neural Inf. Process. Syst. 32 (2019).
  152. Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew Johnson, Sergey Levine, SOLAR: Deep structured representations for model-based reinforcement learning, in: International Conference on Machine Learning, PMLR, 2019, pp. 7444-7453.
  153. Sergey Levine, Aviral Kumar, George Tucker, Justin Fu, Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020, arXiv preprint arXiv:2005.01643.
  154. Yifan Wu, George Tucker, Ofir Nachum, Behavior regularized offline reinforcement learning, 2019, arXiv preprint arXiv:1911.11361.
  155. Rasmus E Andersen, Steffen Madsen, Alexander BK Barlo, Sebastian B Johansen, Morten Nor, Rasmus S Andersen, Simon Bogh, Self-learning processes in smart factories: Deep reinforcement learning for process control of robot brine injection, Procedia Manuf. 38 (2019) 171-177. https://doi.org/10.1016/j.promfg.2020.01.023
  156. Steven Spielberg, Aditya Tulsyan, Nathan P Lawrence, Philip D Loewen, R Bhushan Gopaluni, Deep reinforcement learning for process control: A primer for beginners, 2020, arXiv preprint arXiv:2004.05490. https://doi.org/10.1002/aic.16689
  157. Daeil Lee, Seoryong Koo, Inseok Jang, Jonghyun Kim, Comparison of deep reinforcement learning and PID controllers for automatic cold shutdown operation, Energies 15 (8) (2022) 2834.
  158. Jiafei Lyu, Le Wan, Zongqing Lu, Xiu Li, Off-policy RL algorithms can be sample-efficient for continuous control via sample multiple reuse, 2023, arXiv preprint arXiv:2305.18443.
  159. Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li, Efficient continuous control with double actors and regularized critics, in: AAAI Conference on Artificial Intelligence, 2021.
  160. Daeil Lee, Hee-Jae Lee, Jonghyun Kim, Anomaly recovery algorithm based on robust AI concept for nuclear power plants, 2023.
  161. Daeil Lee, Jonghyun Kim, Concept of robust AI with meta-learning for accident diagnosis, 2022.
  162. Jonah Siekmann, Kevin R. Green, John Warila, Alan Fern, Jonathan W. Hurst, Blind bipedal stair traversal via sim-to-real reinforcement learning, 2021, ArXiv, abs/2105.08328.
  163. Jan Matas, Stephen James, Andrew J. Davison, Sim-to-real reinforcement learning for deformable object manipulation, 2018, ArXiv, abs/1806.07851.
  164. Josiah P. Hanna, Siddharth Desai, Haresh Karnan, Garrett A. Warnell, Peter Stone, Grounded action transformation for sim-to-real reinforcement learning, Mach. Learn. 110 (2021) 2469-2499. https://doi.org/10.1007/s10994-021-05982-z
  165. Wenshuai Zhao, Jorge Pena Queralta, Tomi Westerlund, Sim-to-real transfer in deep reinforcement learning for robotics: a survey, in: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2020, pp. 737-744.
  166. Han Hu, Kaicheng Zhang, Aaron Hao Tan, Michael Ruan, Christopher Agia, Goldie Nejat, A sim-to-real pipeline for deep reinforcement learning for autonomous robot navigation in cluttered rough terrain, IEEE Robot. Autom. Lett. 6 (2021) 6569-6576. https://doi.org/10.1109/LRA.2021.3093551
  167. Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel, Sim-to-real transfer of robotic control with dynamics randomization, in: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, pp. 3803-3810.
  168. Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, Konstantinos Bousmalis, Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-canonical adaptation networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12627-12637.
  169. Karol Arndt, Murtaza Hazara, Ali Ghadirzadeh, Ville Kyrki, Meta reinforcement learning for sim-to-real domain adaptation, in: 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 2725-2731.
  170. Jeong-Hoon Lee, Jongeun Choi, Attaining interpretability in reinforcement learning via hierarchical primitive composition, 2021, ArXiv, abs/2110.01833.
  171. Caihua Shan, Yifei Shen, Yao Zhang, Xiang Li, Dongsheng Li, Reinforcement learning enhanced explainer for graph neural networks, in: NeurIPS, 2021.
  172. Francis Maes, Raphael Fonteneau, Louis Wehenkel, Damien Ernst, Policy search in a space of simple closed-form formulas: Towards interpretability of reinforcement learning, in: Discovery Science, 2012.
  173. Nicklas Hansen, Xiaolong Wang, Generalization in reinforcement learning by soft data augmentation, in: 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 13611-13617.
  174. Roberta Raileanu, Rob Fergus, Decoupling value and policy for generalization in reinforcement learning, 2021, ArXiv, abs/2102.10330.
  175. Roberta Raileanu, Maxwell Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus, Automatic data augmentation for generalization in reinforcement learning, in: NeurIPS, 2021.
  176. Sam Witty, Jun Ki Lee, Emma Tosch, Akanksha Atrey, Michael L. Littman, David D. Jensen, Measuring and characterizing generalization in deep reinforcement learning, 2021, ArXiv, abs/1812.02868.
  177. Karl Cobbe, Oleg Klimov, Christopher Hesse, Taehoon Kim, John Schulman, Quantifying generalization in reinforcement learning, 2019, ArXiv, abs/1812.02341.