Machine learning in survival analysis

  • Baik, Jaiwook (백재욱) (Department of Statistics and Data Science, Korea National Open University)
  • Received : 2022.01.04
  • Accepted : 2022.01.20
  • Published : 2022.01.31

Abstract

We investigated machine learning methods that can be applied to survival data containing censored observations. Exploratory data analysis first revealed the distribution of each feature, the relationships among features, and their importance ranking. Next, a classification problem was set up in which the dependent variable is death_event and the remaining features are the independent variables, and methods such as logistic regression and K nearest neighbor were applied. Even with a small number of observations, random forest performed better than logistic regression, as is commonly reported in machine learning studies. However, methods that have performed well recently, such as artificial neural networks and gradient boosting, did not perform markedly better, apparently because the data set is not large. Finally, conventional survival analysis methods, namely the Kaplan-Meier estimator and the Cox proportional hazards model, were employed to examine which independent variables have a decisive influence on the dependent variable (ti, δi). Random forest, a machine learning method, was also applied to the censored survival data and its performance was evaluated.
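To make the censored response (ti, δi) concrete, the following is a minimal from-scratch sketch of the Kaplan-Meier product-limit estimate, S(t) = ∏ (1 − dj/nj) over the event times tj ≤ t, where dj is the number of deaths at tj and nj the number still at risk. It is an illustration only and not the code used in the paper: the function name `kaplan_meier` and the toy data are assumptions made for this example, and a real analysis would more likely rely on an established survival analysis library.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- observed follow-up times t_i
    events -- indicators delta_i: 1 if death was observed, 0 if censored
    """
    # Number of observed deaths at each time.
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    # Number of subjects (dead or censored) leaving the risk set at each time.
    exits = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Subjects censored at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data, not taken from the paper.
toy_times = [2, 3, 3, 5, 8, 8, 9, 12]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0]

for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve. A plain classifier on death_event cannot use this information, because it treats a censored patient the same as one known to have survived.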

Acknowledgement

This paper was written with the support of a 2020 Korea National Open University academic research grant.
