A joint modeling of longitudinal zero-inflated count data and time to event data

  • Kim, Donguk (Department of Statistics, Sungkyunkwan University)
  • Chun, Jihun (Department of Statistics, Sungkyunkwan University)
  • Received : 2016.11.28
  • Accepted : 2016.12.15
  • Published : 2016.12.31

Abstract

In longitudinal studies, where measurements are observed over time, longitudinal data and survival data are often collected simultaneously. If the missingness in the longitudinal data is non-ignorable because it is associated with the survival data, a longitudinal analysis alone does not account for the association between the two kinds of data, and the estimated effects of the independent variables are biased. Joint models of longitudinal and survival data have been studied as a solution to this problem: because the cause of the missingness is related to survival time, incorporating a survival model allows unbiased estimates to be obtained. In this paper, we study a joint model for longitudinal zero-inflated count data, in which zeros are abundant, and survival data. A hurdle model and a proportional hazards model are used as the sub-models for the longitudinal zero-inflated count data and the survival data, respectively, and the two sub-models are linked by assuming that their random effects follow a multivariate normal distribution. The EM algorithm is used to obtain the maximum likelihood estimates of the parameters, and their standard errors are calculated using the profile likelihood method. In a simulation study, when the random effects of the two sub-models are correlated, the joint model outperformed the separate models in terms of bias and coverage probability.
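
To make the hurdle sub-model mentioned in the abstract concrete, the following is a minimal sketch of a hurdle probability mass function and of how subject-level random effects could enter its two parts. It is illustrative only and is not taken from the paper: the zero-truncated Poisson positive part, the logit and log links, and all names (`hurdle_poisson_pmf`, `subject_pmf`, `eta_zero`, `eta_count`, `u`, `v`) are assumptions. In a joint model of the kind described, `u` and `v` would be drawn together with a survival random effect from a multivariate normal distribution; that step and the EM estimation are not shown here.

```python
import math


def hurdle_poisson_pmf(y: int, p_positive: float, mu: float) -> float:
    """P(Y = y) for a hurdle model (assumed form, not the paper's notation).

    p_positive: probability of crossing the hurdle, i.e. P(Y > 0).
    mu: mean parameter of the untruncated Poisson used for positive counts.
    """
    if y < 0:
        return 0.0
    if y == 0:
        # All zeros come from the hurdle part.
        return 1.0 - p_positive
    # Positive counts follow a zero-truncated Poisson distribution.
    log_poisson = -mu + y * math.log(mu) - math.lgamma(y + 1)
    return p_positive * math.exp(log_poisson) / (1.0 - math.exp(-mu))


def subject_pmf(y: int, eta_zero: float, eta_count: float,
                u: float, v: float) -> float:
    """Hurdle pmf for one subject given hypothetical random effects u and v.

    eta_zero and eta_count stand for the fixed-effect linear predictors
    of the hurdle part and the count part, respectively.
    """
    p_positive = 1.0 / (1.0 + math.exp(-(eta_zero + u)))  # logit link (assumed)
    mu = math.exp(eta_count + v)                          # log link (assumed)
    return hurdle_poisson_pmf(y, p_positive, mu)
```

Because zeros are handled entirely by the hurdle part, the probability of a zero depends only on `eta_zero + u`, while `eta_count + v` affects only the distribution of the positive counts.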
