A Bayesian Approach for the Analysis of Times to Multiple Events : An Application on Healthcare Data

An Application of a Bayesian-Based Statistical Approach to the Analysis of Multi-Event Time-Series Data

  • Seok, Junhee (School of Electrical Engineering, Korea University) ;
  • Kang, Yeong Seon (Department of Business Administration, University of Seoul)
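  • 석준희 (School of Electrical Engineering, College of Engineering, Korea University) ;
  • 강영선 (Department of Business Administration, College of Business Administration, University of Seoul)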
  • Received : 2014.08.22
  • Accepted : 2014.10.18
  • Published : 2014.11.30

Abstract

Times to multiple events (TMEs) are a major data type in large-scale business and medical data. Despite their importance, TME data have not been well studied because censored observations make the analysis difficult. To address this difficulty, we have developed a Bayesian multivariate survival analysis method that can estimate the joint probability density of survival times. In this work, we extend the method to analyze precedence, dependency, and causality among multiple events. We applied it to the electronic health records of 2,111 patients at a children's hospital in the US, and the analysis shows the relation between the times to two types of hospital visits for different medical issues. The overall result suggests that multivariate survival analysis is useful for large-scale data in a variety of areas, including marketing, human resources, and e-commerce. Lastly, we suggest future research directions based on the multivariate survival analysis method.
