Sample size calculations for clustered count data based on zero-inflated discrete Weibull regression models

  • Hanna Yoo (Department of Applied Statistics, Hanshin University)
  • Received: 2023.08.06
  • Reviewed: 2023.10.18
  • Published: 2024.01.31

Abstract

In this study, we consider the sample size determination problem for clustered count data with many zeros. Zero-inflated Poisson and binomial models are commonly used for zero-inflated data; however, the assumptions underlying each model may be violated in real data. We therefore base the sample size calculation on a discrete Weibull regression model, which can handle both underdispersed and overdispersed data, and compute the required sample size by Monte Carlo simulation. The proposed method offers a single unified model with a low risk of failure that copes with either type of dispersion and handles data with many zeros arising in groups or clusters that share a common source of variation. A simulation study shows that the proposed method provides accurate results and reveals that the required sample size is affected by the skewness of the distribution, the covariance structure of the covariates, and the proportion of zeros. We apply the method to length-of-stay data for pancreas disorders collected in Western Australia.
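
The general Monte Carlo recipe in the abstract works like this. You simulate clustered, zero-inflated counts from an assumed model, test the covariate effect in each simulated dataset, and increase the number of clusters until the empirical power reaches a target. The Python sketch below illustrates that loop. It is not the paper's implementation. Several assumptions are ours: a log(-log q) link for the discrete Weibull parameter q, a normal random intercept per cluster, a cluster-level binary covariate with equal arms, and arbitrary parameter values. Most importantly, it substitutes a simple two-sided z test on cluster means for fitting a zero-inflated discrete Weibull mixed model. All function names (zidw_pmf, draw_zidw, q_from_eta, cluster_means, estimate_power, required_clusters) are hypothetical.

```python
import math
import random
import statistics


def zidw_pmf(y, q, beta, pi):
    """P(Y = y) under a zero-inflated discrete Weibull distribution.

    Base distribution (Nakagawa-Osaki form): P(Y >= y) = q ** (y ** beta)
    for y = 0, 1, 2, ..., with 0 < q < 1 and beta > 0. With probability pi
    the observation is a structural (extra) zero.
    """
    base = q ** (y ** beta) - q ** ((y + 1) ** beta)
    return (pi if y == 0 else 0.0) + (1.0 - pi) * base


def draw_zidw(q, beta, pi, rng):
    """One draw by inverse transform: Y = floor((log U / log q) ** (1 / beta))."""
    if rng.random() < pi:
        return 0  # structural zero
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
    t = max(0.0, math.log(u) / math.log(q))
    return math.floor(t ** (1.0 / beta))


def q_from_eta(eta):
    """Assumed link log(-log q) = eta, i.e. q = exp(-exp(eta)); larger eta, smaller counts."""
    return math.exp(-math.exp(eta))


def cluster_means(n_per_arm, m, b0, b1, sigma, beta, pi, rng):
    """Mean count of each cluster (m subjects) in the control (x=0) and treatment (x=1) arms.

    Hypothetical design: eta = b0 + b1 * x + u_i with cluster effect u_i ~ N(0, sigma^2).
    """
    means = {0: [], 1: []}
    for x in (0, 1):
        for _ in range(n_per_arm):
            q = q_from_eta(b0 + b1 * x + rng.gauss(0.0, sigma))
            means[x].append(sum(draw_zidw(q, beta, pi, rng) for _ in range(m)) / m)
    return means[0], means[1]


def estimate_power(n_per_arm, m, b0, b1, sigma, beta, pi, n_sim=500, seed=2024):
    """Monte Carlo rejection rate of a stand-in two-sided z test on cluster means (5% level).

    Requires n_per_arm >= 2. The normal critical value is only approximate for few clusters.
    """
    rng = random.Random(seed)  # same seed for every n: common random numbers across n
    rejections = 0
    for _ in range(n_sim):
        c0, c1 = cluster_means(n_per_arm, m, b0, b1, sigma, beta, pi, rng)
        diff = statistics.mean(c1) - statistics.mean(c0)
        se = math.sqrt((statistics.variance(c0) + statistics.variance(c1)) / n_per_arm)
        if se == 0.0:
            rejections += int(diff != 0.0)
        elif abs(diff / se) > 1.959964:
            rejections += 1
    return rejections / n_sim


def required_clusters(target=0.8, n_min=5, n_max=100, **model):
    """Smallest clusters-per-arm in [n_min, n_max] whose simulated power reaches target."""
    for n in range(n_min, n_max + 1):
        power = estimate_power(n, **model)
        if power >= target:
            return n, power
    return None  # target power not reached within n_max


if __name__ == "__main__":
    print(required_clusters(m=10, b0=-1.0, b1=1.0, sigma=0.5, beta=1.0, pi=0.3))
```

The search loop does not depend on the test inside `estimate_power`. You could replace the stand-in z test with a likelihood-based test from a fitted zero-inflated discrete Weibull mixed model to move the sketch closer to the approach described above. Reusing the seed for every candidate size keeps the estimated power curve smooth across n.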

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2021R1F1A1053119).
