• Title/Summary/Keyword: Zero-Inflated Negative Binomial Model


Prediction of the Number of Food Poisoning Occurrences by Microbes (원인균별 식중독 발생 건수 예측)

  • Yeo, In-Kwon
    • The Korean Journal of Applied Statistics / v.26 no.6 / pp.923-932 / 2013
  • This paper proposes a method to predict the number of foodborne disease outbreaks by microbe. The weekly data on food poisoning occurrences by microbe in Korea contain many zero-valued observations and exhibit dependence between outbreaks. To model both phenomena, the total number of food poisoning outbreaks is predicted by an autoregressive model, and the probabilities of food poisoning occurrence by microbe (given the total number of food poisonings) are estimated by a baseline-category logit model. The predicted number of foodborne disease outbreaks for a given microbe is obtained by multiplying the predicted total number of outbreaks by the estimated probability that an outbreak is caused by that microbe. The mean squared error and the mean absolute error are evaluated to compare the performance of the proposed method with that of the zero-inflated model.
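The two-stage prediction described above (forecast the total, then split it by microbe) can be sketched as follows. This is a minimal illustration, not the author's implementation: the abstract does not give the autoregressive order, so an AR(1) forecast is assumed, and the function names and parameter values (`c`, `phi`, `etas`) are hypothetical.

```python
import math

def baseline_category_probs(etas):
    """Baseline-category logit probabilities.

    Category 0 is the baseline (linear predictor fixed at 0); etas[j] is the
    log-odds of category j+1 relative to the baseline.
    """
    logits = [0.0] + list(etas)
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ar1_forecast(y_prev, c, phi):
    """One-step forecast of the weekly total, c + phi * y_{t-1} (AR(1) is an assumption)."""
    return c + phi * y_prev

def predict_by_microbe(y_prev, c, phi, etas):
    """Predicted outbreaks per microbe = predicted total * P(microbe | outbreak)."""
    total_hat = max(ar1_forecast(y_prev, c, phi), 0.0)  # a count forecast cannot be negative
    return [total_hat * p for p in baseline_category_probs(etas)]

# Hypothetical inputs: last week's total was 10, three microbe categories.
print(predict_by_microbe(10, 2.0, 0.5, [0.0, math.log(2)]))
```

Because the category probabilities sum to one, the per-microbe predictions always add back up to the forecast total, which is what keeps the split coherent even when individual microbes are usually zero.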

Soccer goal distributions in K-league (K-리그에서 축구 골의 분포)

  • Lee, Jang Taek
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1231-1239 / 2014
  • In this paper we analyze the distributions of the number of goals scored by home and away teams in K-league matches between 1983 and 2012. The K-league goal data are modeled with statistical distributions such as the Poisson, negative binomial, extreme value, and zero-inflated Poisson. How well the home and away goals fit each distribution is assessed with chi-square goodness-of-fit tests. According to these tests, the Poisson distribution gives the best fit to the home goals data, whereas the away goals data are best modeled by the zero-inflated Poisson distribution. There is also weak evidence of dependence between home and away goals.
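A minimal sketch of the comparison described above, restricted to the Poisson and zero-inflated Poisson (ZIP) cases: the ZIP mass function adds a point mass pi at zero, and the Pearson chi-square statistic compares observed goal counts with expected ones, pooling the upper tail into a single cell. The cell layout and function names are assumptions for illustration; the paper's exact binning is not stated.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural zero."""
    p = (1.0 - pi) * poisson_pmf(k, lam)
    return p + pi if k == 0 else p

def chi_square_stat(observed, pmf):
    """Pearson chi-square statistic.

    observed[k] is the number of matches with k goals for k < len(observed) - 1;
    the last cell pools every match with len(observed) - 1 or more goals.
    """
    n = sum(observed)
    last = len(observed) - 1
    probs = [pmf(k) for k in range(last)]
    probs.append(1.0 - sum(probs))  # tail probability for the pooled last cell
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
```

The statistic would then be compared with a chi-square distribution whose degrees of freedom are the number of cells minus one minus the number of estimated parameters (one for Poisson, two for ZIP). Setting `pi = 0` recovers the plain Poisson model.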

Estimating Travel Frequency of Public Bikes in Seoul Considering Intermediate Stops (경유지를 고려한 서울시 공공자전거 통행발생량 추정 모형 개발)

  • Jonghan Park;Joonho Ko
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.22 no.3 / pp.1-19 / 2023
  • Bikes have recently emerged as an alternative means of transport for achieving carbon neutrality. To understand the demand for public bikes, we estimated the travel frequency of public bikes while accounting for intermediate stops. Using GPS trajectory data from 'Ttareungyi', a public bike service in Seoul, we identified stay points and estimated travel frequency reflecting population, land use, and physical characteristics. Applying map matching and a stay point detection algorithm revealed that stay points appeared in about 12.1% of all trips. Compared with trips without a stay point, trips with a stay point have a longer average travel distance and travel time and occur more often during off-peak hours. Visualization analysis showed that stay points are found mainly in parks, leisure facilities, and business facilities. To account for stay points, the unit of analysis was set as a hexagonal grid rather than the existing rental-station basis. Travel frequency considering stay points was analyzed using the Zero-Inflated Negative Binomial (ZINB) model. The results revealed that travel frequency was higher where bike infrastructure secured the safety of bike users, such as 'Bikepath' and 'Bike and pedestrian path'. Public bikes also serve as a first- and last-mile means of access to public transportation. Travel frequency was also observed to increase in living and employment centers. Given these results, securing safety facilities and space for users should be prioritized when planning any further expansion of bike infrastructure. Moreover, a plan is needed to supply bike infrastructure linked to public transportation, especially the subway.
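For reference, the ZINB model mixes a point mass at zero (probability pi) with a negative binomial count distribution. The sketch below assumes the common NB2 parameterization (mean mu, variance mu + alpha * mu**2); in a regression such as the grid-level model above, mu would typically come from a log link on covariates and pi from a logit link, but the paper's exact specification is not given here.

```python
import math

def nb_pmf(k, mu, alpha):
    """NB2 negative binomial with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # size parameter
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(k, mu, alpha, pi):
    """Zero-inflated NB: structural zero with probability pi, otherwise NB2."""
    p = (1.0 - pi) * nb_pmf(k, mu, alpha)
    return p + pi if k == 0 else p

def zinb_mean_var(mu, alpha, pi):
    """Closed-form moments: E[Y] = (1-pi)mu, Var[Y] = (1-pi)mu(1 + alpha*mu + pi*mu)."""
    mean = (1.0 - pi) * mu
    var = (1.0 - pi) * mu * (1.0 + alpha * mu + pi * mu)
    return mean, var
```

The variance formula shows why ZINB suits such data: both the dispersion term `alpha * mu` and the zero-inflation term `pi * mu` push the variance above the mean.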

An Analysis of Spatial Determinants of Inventor Networks in Korea (발명자 네트워크의 공간적 결정요인 분석)

  • Jeong, Jun Ho
    • Journal of the Economic Geographical Society of Korea / v.19 no.1 / pp.1-17 / 2016
  • This paper explores the spatial structure of inventor networks and their determinants among 230 shi-gun-gu regions in Korea by investigating the residences of co-inventors on Korean patent applications filed with the Korean Intellectual Property Office, using a zero-inflated negative binomial model to account for the count nature of the dependent variable and its excess zeros. Several variables are found to affect the spatial linkage of inventor networks. Spatial links extend beyond a region if it has more of its own R&D-related specific assets (private R&D, patent productivity, population, education) and if it is physically close to, and technologically similar to, the other region. Similarly, the other region's assets play a positive role: links increase when the other region has more R&D-related specific assets.
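The "excess of zeros" that motivates a zero-inflated model can be checked informally before fitting anything: compare the observed share of zeros with the share a Poisson or a moment-matched negative binomial would imply at the same mean. This is a hypothetical diagnostic sketch, not the paper's procedure; the actual ZINB fit would be done by maximum likelihood.

```python
import math

def zero_diagnostics(counts):
    """Observed zero share vs. what Poisson and a moment-fitted NB2 would imply."""
    n = len(counts)
    mean = sum(counts) / n
    if mean <= 0:
        raise ValueError("need at least one positive count")
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)
    observed_zero = sum(1 for y in counts if y == 0) / n
    poisson_zero = math.exp(-mean)
    # Method-of-moments NB2 dispersion from var = mean + alpha * mean**2
    alpha = max((var - mean) / mean ** 2, 0.0)
    nb_zero = (1 + alpha * mean) ** (-1 / alpha) if alpha > 0 else poisson_zero
    return {"mean": mean, "var": var, "observed_zero": observed_zero,
            "poisson_zero": poisson_zero, "nb_zero": nb_zero}

# Hypothetical link counts between region pairs: mostly zeros, one large value.
print(zero_diagnostics([0, 0, 0, 0, 0, 0, 0, 0, 1, 9]))
```

If the observed zero share clearly exceeds even the negative binomial value, that is a hint (not a test) that a zero-inflated specification is worth comparing.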


Modeling clustered count data with discrete Weibull regression model

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.413-420 / 2022
  • In this study we adapt the discrete Weibull regression model to clustered count data. The discrete Weibull regression model has the attractive feature that it can handle both under- and over-dispersed data. We analyzed the eighth Korean National Health and Nutrition Examination Survey (KNHANES VIII) from 2019 to assess the factors influencing 1-month outpatient stays in 17 regions. We compared the results of the clustered discrete Weibull regression model with those of the Poisson, negative binomial, generalized Poisson, and Conway-Maxwell-Poisson regression models, which are widely used in count data analysis. The results show that the clustered discrete Weibull regression model with a random intercept gives the best fit. A simulation study is also conducted to investigate the performance of the clustered discrete Weibull model under various dispersion settings and zero-inflation probabilities. The paper shows that using a random effect with discrete Weibull regression can flexibly model count data with various dispersion without the risk of making wrong assumptions about the data dispersion.
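The flexibility claimed for the discrete Weibull distribution can be seen directly from its type I form, P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, .... The sketch below computes moments by truncated summation and shows that beta < 1 can give variance above the mean while a large beta gives variance below it. In regression versions, q is linked to covariates (log(-log q) = x'theta is one common choice), and a clustered model would add a random intercept to that predictor; those details are assumptions here, not taken from the paper.

```python
def dweibull_pmf(y, q, beta):
    """Type I discrete Weibull: P(Y=y) = q**(y**beta) - q**((y+1)**beta), y = 0, 1, ..."""
    return q ** (y ** beta) - q ** ((y + 1) ** beta)

def dweibull_moments(q, beta, y_max=2000):
    """Mean and variance by truncated summation (adequate when the tail beyond y_max is negligible)."""
    probs = [dweibull_pmf(y, q, beta) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# beta < 1: heavier tail, over-dispersion; beta = 3: variance below the mean.
print(dweibull_moments(0.5, 0.8))
print(dweibull_moments(0.5, 3.0))
```

Because a single two-parameter family covers both regimes, the analyst does not have to decide in advance between, say, a negative binomial (over-dispersion only) and an under-dispersed alternative.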

Threshold-asymmetric volatility models for integer-valued time series

  • Kim, Deok Ryun;Yoon, Jae Eun;Hwang, Sun Young
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.295-304 / 2019
  • This article deals with threshold-asymmetric volatility models for over-dispersed and zero-inflated count time series. We introduce various threshold integer-valued autoregressive conditional heteroscedasticity (ARCH) models that incorporate over-dispersion and zero-inflation via conditional Poisson and negative binomial distributions. The EM algorithm is used to estimate the parameters. Cholera data from Kolkata, India, from 2006 to 2011 are analyzed as a real application. To construct the threshold variable, both a time-varying local constant mean and the grand mean are adopted. The data application shows that the threshold model, as an asymmetric version, is useful for modeling count time series volatility.
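A threshold integer-valued ARCH model lets the conditional intensity respond differently depending on the regime of the previous count. The sketch below is a simplified, hypothetical threshold INARCH(1) with a conditional Poisson distribution and a fixed threshold; the paper's models (negative binomial variants, threshold built from a local or grand mean, EM estimation) are richer, and all names and parameter values here are illustrative.

```python
import math
import random

def threshold_intensity(x_prev, omega, alpha_low, alpha_high, threshold):
    """Hypothetical threshold INARCH(1): the ARCH slope switches with the regime of x_{t-1}."""
    alpha = alpha_low if x_prev <= threshold else alpha_high
    return omega + alpha * x_prev

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small intensities used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(n, omega, alpha_low, alpha_high, threshold, seed=0):
    """Simulate n counts, starting from x_0 = 0."""
    rng = random.Random(seed)
    x = [0]
    for _ in range(n - 1):
        lam = threshold_intensity(x[-1], omega, alpha_low, alpha_high, threshold)
        x.append(poisson_draw(lam, rng))
    return x

def cond_loglik(x, omega, alpha_low, alpha_high, threshold):
    """Conditional Poisson log-likelihood of x[1:] given x[0]."""
    ll = 0.0
    for prev, cur in zip(x[:-1], x[1:]):
        lam = threshold_intensity(prev, omega, alpha_low, alpha_high, threshold)
        ll += cur * math.log(lam) - lam - math.lgamma(cur + 1)
    return ll

series = simulate(300, omega=1.0, alpha_low=0.6, alpha_high=0.2, threshold=2)
print(cond_loglik(series, 1.0, 0.6, 0.2, 2))
```

Maximizing `cond_loglik` over the parameters (for a fixed threshold) would give conditional maximum-likelihood estimates; the asymmetry is captured by `alpha_low` and `alpha_high` being allowed to differ.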

Predictors of Blood and Body Fluid Exposure and Mediating Effects of Infection Prevention Behavior in Shift-Working Nurses: Application of Analysis Method for Zero-Inflated Count Data (교대근무 간호사의 혈액과 체액 노출 사고 예측 요인과 감염예방행위의 매개효과: 영과잉 가산 자료 분석방법을 적용하여)

  • Ryu, Jae Geum;Choi-Kwon, Smi
    • Journal of Korean Academy of Nursing / v.50 no.5 / pp.658-670 / 2020
  • Purpose: This study aimed to identify the predictors of blood and body fluid exposure (BBFE) among multifaceted individual (sleep disturbance and fatigue), occupational (occupational stress), and organizational (hospital safety climate) factors, as well as infection prevention behavior. We also aimed to test the mediating effect of infection prevention behavior on the relationships between these multifaceted factors and the frequency of BBFE. Methods: This study was a secondary analysis of data on 246 nurses from the Shift Work Nurses' Health and Turnover study. Because the BBFE frequencies were zero-inflated and over-dispersed count data, they were analyzed using zero-inflated negative binomial regression within the generalized linear model framework, and the mediating effect was tested; analyses used SPSS 25.0, Stata 14.1, and the PROCESS macro. Results: The frequency of BBFE increased in subjects with disturbed sleep (IRR = 1.87, p = .049), and the probability of non-BBFE increased in subjects with higher infection prevention behavior (IRR = 15.05, p = .006) and a higher hospital safety climate (IRR = 28.46, p = .018). Infection prevention behavior also had mediating effects on the occupational stress-BBFE and hospital safety climate-BBFE relationships. Conclusion: Sleep disturbance is an important risk factor for the frequency of BBFE, whereas infection prevention behavior and hospital safety climate are preventive factors. We suggest individual and systemic efforts to improve sleep, reduce occupational stress, and strengthen hospital safety climate to prevent BBFE.
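Ratios such as the IRRs reported above come from exponentiating coefficients on a log or logit scale. In a ZINB model a covariate can enter both parts: in the count part, exp(beta) multiplies the expected count (an incidence rate ratio), while in the inflation part, exp(gamma) multiplies the odds of a structural zero. The sketch below shows both operations with hypothetical coefficients; it does not reproduce the study's estimates.

```python
import math

def ratio_with_ci(coef, se, z=1.959964):
    """Exponentiate a coefficient and its Wald 95% interval (log or logit scale)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

def zinb_expected_count(x, beta0, beta1, gamma0, gamma1):
    """E[Y | x] = (1 - pi(x)) * mu(x) for a single hypothetical covariate x."""
    mu = math.exp(beta0 + beta1 * x)                    # count part, log link
    pi = 1.0 / (1.0 + math.exp(-(gamma0 + gamma1 * x)))  # inflation part, logit link
    return (1.0 - pi) * mu

# Hypothetical count-part coefficient 0.6 with standard error 0.3.
print(ratio_with_ci(0.6, 0.3))
```

Because the overall mean combines both parts, a covariate's net effect on the expected count is not simply exp(beta1); it also depends on how it shifts the probability of a structural zero.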

Motorcycle Accident Model at Roundabout in Korea using ZAM (ZAM을 이용한 국내 회전교차로 오토바이 사고모형)

  • Park, Byung Ho;Lim, Jin Kang;Na, Hee
    • Journal of the Korean Society of Safety / v.29 no.3 / pp.107-113 / 2014
  • The goal of this study is to develop accident models for motorcycles at roundabouts. In pursuing this goal, the study gives particular attention to developing appropriate models using ZAM. The main results are as follows. First, evaluating the developed Poisson, NB, ZIP (zero-inflated Poisson), and ZINB regression models with the Vuong statistic and the over-dispersion parameter shows that the ZINB model is optimal. Second, traffic volume, width of the central island, and width of approach are identified as important variables for accidents. Finally, the common variables affecting accidents are traffic volume and width of approach. This study is expected to offer implications for research on motorcycle accidents at roundabouts.
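The Vuong statistic used above compares two fitted, non-nested models through their per-observation log-likelihoods: with m_i = log f_A(y_i) - log f_B(y_i), V = sqrt(n) * mean(m) / sd(m), and V is referred to a standard normal distribution. The sketch below implements only the unadjusted statistic; it assumes the pointwise log-likelihoods have already been computed from fitted models (for example ZINB versus NB), and the input values in the example are hypothetical.

```python
import math

def vuong_statistic(loglik_a, loglik_b):
    """Unadjusted Vuong statistic from per-observation log-likelihoods.

    Large positive values favour model A, large negative values favour model B;
    values near zero mean the test cannot distinguish the models.
    """
    n = len(loglik_a)
    m = [a - b for a, b in zip(loglik_a, loglik_b)]
    mean = sum(m) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in m) / (n - 1))
    return math.sqrt(n) * mean / sd

# Hypothetical pointwise log-likelihoods for four observations.
print(vuong_statistic([-1.0, -2.0, -1.5, -0.5], [-1.5, -2.2, -1.7, -1.0]))
```

With a two-sided 5% level, |V| > 1.96 is the usual cut-off for preferring one model. Corrected versions that penalize the number of parameters (AIC- or BIC-style) also exist.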

Analysis of Elderly Drivers' Accident Models Considering Operations and Physical Characteristics (고령운전자 운전 및 신체특성을 반영한 교통사고 분석 연구)

  • Lim, Sam Jin;Park, Jun Tae;Kim, Young Il;Kim, Tae Ho
    • Journal of Korean Society of Transportation / v.30 no.6 / pp.37-46 / 2012
  • The number of traffic accidents caused by elderly drivers over the age of 65 has surged over the past ten years from 37,000 to 274,000 cases. The proportion of accidents involving elderly drivers has jumped 3.1 times, from 1.2% to 3.7% of all traffic accidents, and traffic safety organizations are pursuing diverse measures to address the situation. Above all, linking safety measures to in-depth research on the behavioral and physical characteristics of elderly drivers will be vital. This study conducted empirical research linking the driving characteristics of elderly drivers to the traffic accidents they cause, based on Driving Aptitude Test items and traffic accident data, which enabled the measurement of elderly drivers' behavioral characteristics. In developing the Influence Model, we applied the zero-inflated Poisson (ZIP) regression model and selected an accident prediction model based on the Bayesian Influence with regard to the ZIP and zero-inflated negative binomial (ZINB) regression models. According to the results of the AAE analysis, the ZIP regression model was more appropriate, and three variables (prediction of velocity, diversion, and cognitive ability) were found to influence traffic accidents caused by elderly drivers.

Application of discrete Weibull regression model with multiple imputation

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.325-336 / 2019
  • In this article we extend the discrete Weibull regression model to the presence of missing data. Discrete Weibull regression models can be adapted to data with various types of dispersion; however, they are not widely used. Recently, Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) adapted the discrete Weibull regression model using single imputation. We extend that study by using multiple imputation under several settings and comparing the results. The purpose of this study is to demonstrate the merit of multiple imputation for discrete count data with missing values. We analyzed the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII) from 2016 to assess the factors influencing the variable '1-month hospital stay', and we compared the results of the discrete Weibull regression model with those of the Poisson, negative binomial, and zero-inflated Poisson regression models, which are widely used in count data analysis. The results showed that the discrete Weibull regression model using multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of discrete Weibull regression using multiple imputation under both under- and over-dispersed distributions, as well as varying missing rates and sample sizes. Sensitivity analysis showed the influence of misspecification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data will increase explanatory power and is widely applicable to data with various types of dispersion within a unified model.
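After fitting a model (discrete Weibull or otherwise) to each of m imputed datasets, the per-dataset estimates are usually combined with Rubin's rules: the pooled estimate is the average, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The abstract does not state how the estimates were pooled, so this is a sketch of the standard rules rather than the author's exact procedure; the example numbers are hypothetical.

```python
import math

def rubins_rules(estimates, variances):
    """Pool one coefficient across m imputations.

    estimates[i] and variances[i] are the point estimate and its squared
    standard error from the model fitted to the i-th imputed dataset.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = sum(estimates) / m                                 # pooled estimate
    u_bar = sum(variances) / m                                 # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)     # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                                # total variance
    return q_bar, u_bar, b, t, math.sqrt(t)

# Hypothetical estimates of one coefficient from five imputed datasets.
print(rubins_rules([0.42, 0.38, 0.45, 0.40, 0.44], [0.010, 0.011, 0.009, 0.010, 0.012]))
```

The between-imputation term is what single imputation ignores: treating one imputed dataset as if it were complete understates the standard error, which is one reason multiple imputation can compare favorably.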