Machine Learning-based Stroke Risk Prediction using Public Big Data

  • Jeong, Sunwoo (Dept. of ICT Convergence System Engineering, Chonnam National University) ;
  • Lee, Minji (IoT Artificial Intelligence Convergence, Chonnam National University) ;
  • Yoo, Sunyong (Dept. of ICT Convergence System Engineering, Chonnam National University)
  • Received : 2021.01.19
  • Reviewed : 2021.02.15
  • Published : 2021.02.28

Abstract

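This paper presents a machine learning model that uses big data to predict stroke onset in patients with atrial fibrillation. For training data, we collected records provided by the National Health Insurance Service that cover the entire population of atrial fibrillation patients in South Korea. The collected data comprise 68 independent variables spanning sociodemographics, medical history, and health examinations. The goal of this study is to verify the performance of the statistical models conventionally used to predict stroke risk in atrial fibrillation patients (CHADS2, CHA2DS2-VASc) and to apply machine learning to obtain a model with higher accuracy than those models. Evaluating the accuracy and AUROC (area under the receiver operating characteristic curve) of the proposed model, we confirmed that the machine learning-based model of stroke risk in atrial fibrillation patients achieves higher accuracy, sensitivity, and specificity than the conventional statistical models.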
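
The two conventional scores named above are additive point systems, so a baseline can be sketched in a few lines. The following is a minimal sketch, assuming the standard published point assignments for CHADS2 and CHA2DS2-VASc. The `Patient` record and its field names are hypothetical; they do not reflect the actual National Health Insurance Service variables or the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    age: int
    female: bool
    heart_failure: bool
    hypertension: bool
    diabetes: bool
    prior_stroke_or_tia: bool
    vascular_disease: bool


def chads2(p: Patient) -> int:
    """CHADS2: 1 point each for heart failure, hypertension, age >= 75,
    and diabetes; 2 points for prior stroke or TIA."""
    return (
        int(p.heart_failure)
        + int(p.hypertension)
        + int(p.age >= 75)
        + int(p.diabetes)
        + 2 * int(p.prior_stroke_or_tia)
    )


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc: age >= 75 and prior stroke/TIA score 2 points each;
    age 65-74, female sex, vascular disease, heart failure, hypertension,
    and diabetes score 1 point each."""
    age_points = 2 if p.age >= 75 else 1 if p.age >= 65 else 0
    return (
        int(p.heart_failure)
        + int(p.hypertension)
        + age_points
        + int(p.diabetes)
        + 2 * int(p.prior_stroke_or_tia)
        + int(p.vascular_disease)
        + int(p.female)
    )


# Illustrative patient: 70-year-old woman with hypertension only.
example = Patient(age=70, female=True, heart_failure=False, hypertension=True,
                  diabetes=False, prior_stroke_or_tia=False,
                  vascular_disease=False)
print(chads2(example), cha2ds2_vasc(example))  # prints: 1 3
```

Because both scores are integers on a short scale (0 to 6 and 0 to 9), they can be treated as risk scores and thresholded or ranked like any other classifier output when compared with a learned model.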

This paper presents a machine learning model that predicts stroke risk in atrial fibrillation patients using public big data. The training data comprise 68 independent variables, covering demographics, medical history, and health examinations, collected from the Korean National Health Insurance Service. To predict stroke incidence in patients with atrial fibrillation, we applied a deep neural network. We first verified the performance of the conventional statistical models (CHADS2, CHA2DS2-VASc). We then compared the proposed model, under various hyperparameter settings, with the statistical models. Accuracy and the area under the receiver operating characteristic curve (AUROC) were the main indicators used for performance evaluation. The model using batch normalization showed the highest performance and outperformed the statistical models.
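
The evaluation indicators mentioned in the abstract can be illustrated with a small, dependency-free sketch. It assumes binary labels (1 = stroke, 0 = no stroke) and a numeric risk score per patient; the function names, the toy data, and the threshold are hypothetical and are not taken from the paper. AUROC is computed here by its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auroc(y_true, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    Quadratic in the number of samples, so suitable only for small
    illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): outcome labels and integer risk scores.
y_true = [0, 0, 1, 1, 0, 1]
scores = [1, 2, 2, 4, 0, 3]

# Hypothetical decision rule: predict stroke when the score is at least 2.
y_pred = [1 if s >= 2 else 0 for s in scores]

print(confusion_metrics(y_true, y_pred))
print(round(auroc(y_true, scores), 4))  # prints: 0.9444
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason the two kinds of indicator are commonly reported together.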
