Application of Deep Recurrent Q Network with Dueling Architecture for Optimal Sepsis Treatment Policy

  • Received : 2021.03.04
  • Accepted : 2021.05.25
  • Published : 2021.06.30

Abstract

Sepsis is one of the leading causes of mortality worldwide and costs billions of dollars annually. However, treating septic patients remains highly challenging, and more research is needed into general treatment methods for sepsis. In this work, we therefore propose a reinforcement learning method for learning optimal treatment strategies for septic patients. We model patient physiological time-series data as the input to a deep recurrent Q-network that learns reliable treatment policies. We evaluate our model using an off-policy evaluation method, and the experimental results indicate that it outperforms the physicians' policy, reducing patient mortality by up to 3.04%. Our model can thus serve as a tool that supports clinicians in making dynamic treatment decisions and helps reduce patient mortality.
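To make the architecture named in the abstract concrete, below is a minimal PyTorch sketch (not the authors' released code) of a dueling deep recurrent Q-network over physiological time series. The 48-dimensional observation, 128-unit hidden state, and 25-action space (a 5x5 grid of IV-fluid and vasopressor doses, as commonly used in related sepsis-RL studies) are illustrative assumptions, not the paper's reported configuration.

import torch
import torch.nn as nn


class DuelingDRQN(nn.Module):
    def __init__(self, obs_dim: int = 48, hidden_dim: int = 128,
                 n_actions: int = 25):
        super().__init__()
        # An LSTM summarizes the patient's physiological time series,
        # addressing the partial observability of the true patient state.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        # Dueling heads: a scalar state value V(s) and per-action
        # advantages A(s, a), recombined into Q(s, a).
        self.value_head = nn.Linear(hidden_dim, 1)
        self.advantage_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) padded patient trajectories.
        h, _ = self.lstm(obs_seq)
        value = self.value_head(h)           # (batch, time, 1)
        advantage = self.advantage_head(h)   # (batch, time, n_actions)
        # Q(s,a) = V(s) + A(s,a) - mean_a A(s,a): the standard dueling
        # aggregation that keeps V and A identifiable.
        return value + advantage - advantage.mean(dim=-1, keepdim=True)


if __name__ == "__main__":
    net = DuelingDRQN()
    trajectories = torch.randn(4, 20, 48)   # 4 patients, 20 time steps
    q_values = net(trajectories)
    greedy_actions = q_values.argmax(dim=-1)  # treatment per time step
    print(q_values.shape, greedy_actions.shape)

In a full pipeline of the kind the abstract outlines, the greedy actions of such a network would define the learned treatment policy, whose expected mortality outcome is then estimated against the observed physicians' policy with an off-policy evaluation method rather than by deploying it on patients.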

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2020R1A2B5B01002085). This work was also supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (NRF-2019M3E5D1A02067961).
