Heat-Wave Data Analysis based on the Zero-Inflated Regression Models

영-과잉 회귀모형을 활용한 폭염자료분석

  • Kim, Seong Tae (Department of Mathematics, North Carolina A&T State University)
  • Park, Man Sik (Department of Statistics, College of Natural Sciences, Sungshin Women's University)
  • Received : 2018.10.17
  • Accepted : 2018.11.23
  • Published : 2018.12.31

Abstract

A random variable that is bounded below, typically a non-negative one, is called semi-continuous or zero-inflated when its boundary value (zero) is observed far more often than an assumed probability distribution would predict. The variable is called semi-continuous when the assumed distribution is continuous and zero-inflated when it is discrete. In this study, we introduce the two-part model, which consists of one part that models the binary response (whether the boundary value is observed) and another part that models the values greater than the boundary value. In particular, we explain zero-inflated regression models based on the Poisson and negative binomial distributions and examine their properties. In the real-data analysis, we use these zero-inflated regression models to fit the number of days under heat-wave advisory and heat-wave warning in South Korea during the summers (June to August) of the last 10 years (2009 to 2018). Based on the fitted results, we create prediction maps of the number of heat-wave days using universal kriging, a spatial prediction method.
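
The zero-inflated count distributions named in the abstract can be illustrated with a minimal sketch. It assumes the standard mixture form, in which a point mass at zero with weight pi_zero is combined with a Poisson or negative binomial distribution. The function names and parameter values are illustrative assumptions, not taken from the paper, and the regression step that links covariates to the parameters is omitted.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) with mean lam > 0."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def negbin_pmf(y, mu, size):
    """Negative binomial P(Y = y) with mean mu and dispersion parameter size
    (variance = mu + mu**2 / size)."""
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zero_inflated_pmf(y, pi_zero, base_pmf):
    """Mixture of a point mass at zero (weight pi_zero) and a count distribution."""
    p = (1.0 - pi_zero) * base_pmf(y)
    return pi_zero + p if y == 0 else p


# Illustrative (assumed) parameter values, not estimates from the heat-wave data.
pi_zero, lam, size = 0.4, 3.0, 2.0

# Probability of a zero count under each zero-inflated model.
zip_p0 = zero_inflated_pmf(0, pi_zero, lambda k: poisson_pmf(k, lam))
zinb_p0 = zero_inflated_pmf(0, pi_zero, lambda k: negbin_pmf(k, lam, size))

# Sanity checks: the probabilities sum to one (truncated at 200 terms) and
# the zero-inflated Poisson mean equals (1 - pi_zero) * lam.
zip_total = sum(
    zero_inflated_pmf(k, pi_zero, lambda j: poisson_pmf(j, lam)) for k in range(200)
)
zip_mean = sum(
    k * zero_inflated_pmf(k, pi_zero, lambda j: poisson_pmf(j, lam)) for k in range(200)
)

print(zip_p0, zinb_p0, zip_total, zip_mean)
```

In both models the probability of a zero exceeds what the base distribution alone would give, which is the excess-zero behaviour the abstract describes. In practice the parameters would be estimated by maximum likelihood with covariates entering through link functions.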
