• Title/Summary/Keyword: overdispersed data

Search Result 14, Processing Time 0.02 seconds

Estimating Parameters in Overdispersed Binary Data

  • Lee, Sunho
    • Communications for Statistical Applications and Methods
    • /
    • v.7 no.1
    • /
    • pp.269-276
    • /
    • 2000
  • there are several methods available for estimating parameters in overdispersed binary response data with the litter effect. Simulations are performed to compare methods for estimating an overall mean and an overdispersion parameter using moments a maximum likelihood under a beta-binomial distribution a maximum quasi-likelihood and a maximum extended quasi-likelihood.

  • PDF

A mixed-effects model for overdispersed binomial data (초과변동의 이항자료에 대한 혼합효과 모형)

  • Choi, Jae-Sung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.10 no.1
    • /
    • pp.199-205
    • /
    • 1999
  • This paper discusses the generalized mixed-effects model for the analysis of overdispersed binomial data. Sometimes certain types of sampling designs or genetic characters of experimental units can be regarded as factors of extra binomial variation. For such cases, this paper suggests models with one or two random effects to explain overdispersion caused by those affecting factors and shows how to test for a model adequacy based on deviance.

  • PDF

A new sample selection model for overdispersed count data (과대산포 가산자료의 새로운 표본선택모형)

  • Jo, Sung Eun;Zhao, Jun;Kim, Hyoung-Moon
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.6
    • /
    • pp.733-749
    • /
    • 2018
  • Sample selection arises as a result of the partial observability of the outcome of interest in a study. Heckman introduced a sample selection model to analyze such data and proposed a full maximum likelihood estimation method under the assumption of normality. Recently sample selection models for binomial and Poisson response variables have been proposed. Based on the theory of symmetry-modulated distribution, we extend these to a model for overdispersed count data. This type of data with no sample selection is often modeled using negative binomial distribution. Hence we propose a sample selection model for overdispersed count data using the negative binomial distribution. A real data application is employed. Simulation studies reveal that our estimation method based on profile log-likelihood is stable.

Modelling Count Responses with Overdispersion

  • Jeong, Kwang Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.6
    • /
    • pp.761-770
    • /
    • 2012
  • We frequently encounter outcomes of count that have extra variation. This paper considers several alternative models for overdispersed count responses such as a quasi-Poisson model, zero-inflated Poisson model and a negative binomial model with a special focus on a generalized linear mixed model. We also explain various goodness-of-fit criteria by discussing their appropriateness of applicability and cautions on misuses according to the patterns of response categories. The overdispersion models for counts data have been explained through two examples with different response patterns.

A Ppoisson Regression Aanlysis of Physician Visits (외래이용빈도 분석의 모형과 기법)

  • 이영조;한달선;배상수
    • Health Policy and Management
    • /
    • v.3 no.2
    • /
    • pp.159-176
    • /
    • 1993
  • The utilization of outpatient care services involves two steps of sequential decisions. The first step decision is about whether to initiate the utilization and the second one is about how many more visits to make after the initiation. Presumably, the initiation decision is largely made by the patient and his or her family, while the number of additional visits is decided under a strong influence of the physician. Implication is that the analysis of the outpatient care utilization requires to specify each of the two decisions underlying the utilization as a distinct stochastic process. This paper is concerned with the number of physician visits, which is, by definition, a discrete variable that can take only non-negative integer values. Since the initial visit is considered in the analysis of whether or not having made any physician visit, the focus on the number of visits made in addition to the initial one must be enough. The number of additional visits, being a kind of count data, could be assumed to exhibit a Poisson distribution. However, it is likely that the distribution is over dispersed since the number of physician visits tends to cluster around a few values but still vary widely. A recently reported study of outpatient care utilization employed an analysis based upon the assumption of a negative binomial distribution which is a type of overdispersed Poisson distribution. But there is an indication that the use of Poisson distribution making adjustments for over-dispersion results in less loss of efficiency in parameter estimation compared to the use of a certain type of distribution like a negative binomial distribution. An analysis of the data for outpatient care utilization was performed focusing on an assessment of appropriateness of available techniques. The data used in the analysis were collected by a community survey in Hwachon Gun, Kangwon Do in 1990. It was observed that a Poisson regression with adjustments for over-dispersion is superior to either an ordinary regression or a Poisson regression without adjustments oor over-dispersion. In conclusion, it seems the most approprite to assume that the number of physician visits made in addition to the initial visist exhibits an overdispersed Poisson distribution when outpatient care utilization is studied based upon a model which embodies the two-part character of the decision process uderlying the utilization.

  • PDF

Score Tests for Overdispersion

  • Kim, Choong-Rak;Jeong, Mee-Seon;Yang, Mee-Yeong
    • Journal of the Korean Statistical Society
    • /
    • v.23 no.1
    • /
    • pp.207-216
    • /
    • 1994
  • Count data are often overdispersed, and an appropriate test for the existence of the overdispersion is necessary. In this paper we derive a score test based on the extended quasi-likelihood and the pseudolikelihood after adjusting to the Bartlett factor. Also, we compare it with Levene (1960)'s F-type test suggested by Ganio and Schafer (1992).

  • PDF

Sample size calculations for clustered count data based on zero-inflated discrete Weibull regression models

  • Hanna Yoo
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.1
    • /
    • pp.55-64
    • /
    • 2024
  • In this study, we consider the sample size determination problem for clustered count data with many zeros. In general, zero-inflated Poisson and binomial models are commonly used for zero-inflated data; however, in real data the assumptions that should be satisfied when using each model might be violated. We calculate the required sample size based on a discrete Weibull regression model that can handle both underdispersed and overdispersed data types. We use the Monte Carlo simulation to compute the required sample size. With our proposed method, a unified model with a low failure risk can be used to cope with the dispersed data type and handle data with many zeros, which appear in groups or clusters sharing a common variation source. A simulation study shows that our proposed method provides accurate results, revealing that the sample size is affected by the distribution skewness, covariance structure of covariates, and amount of zeros. We apply our method to the pancreas disorder length of the stay data collected from Western Australia.

Effects of Overdispersion on Testing for Serial Dependence in the Time Series of Counts Data

  • Kim, Hee-Young;Park, You-Sung
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.6
    • /
    • pp.829-843
    • /
    • 2010
  • To test for the serial dependence in time series of counts data, Jung and Tremayne (2003) evaluated the size and power of several tests under the class of INARMA models based on binomial thinning operations for Poisson marginal distributions. The overdispersion phenomenon(i.e., a variance greater than the expectation) is common in the real world. Overdispersed count data can be modeled by using alternative thinning operations such as random coefficient thinning, iterated thinning, and quasi-binomial thinning. Such thinning operations can lead to time series models of counts with negative binomial or generalized Poisson marginal distributions. This paper examines whether the test statistics used by Jung and Tremayne (2003) on serial dependence in time series of counts data are affected by overdispersion.

Negative Binomial Varying Coefficient Partially Linear Models

  • Kim, Young-Ju
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.6
    • /
    • pp.809-817
    • /
    • 2012
  • We propose a semiparametric inference for a generalized varying coefficient partially linear model(VCPLM) for negative binomial data. The VCPLM is useful to model real data in that varying coefficients are a special type of interaction between explanatory variables and partially linear models fit both parametric and nonparametric terms. The negative binomial distribution often arise in modelling count data which usually are overdispersed. The varying coefficient function estimators and regression parameters in generalized VCPLM are obtained by formulating a penalized likelihood through smoothing splines for negative binomial data when the shape parameter is known. The performance of the proposed method is then evaluated by simulations.

A Study for Recent Development of Generalized Linear Mixed Model (일반화된 선형 혼합 모형(GENERALIZED LINEAR MIXED MODEL: GLMM)에 관한 최근의 연구 동향)

  • 이준영
    • The Korean Journal of Applied Statistics
    • /
    • v.13 no.2
    • /
    • pp.541-562
    • /
    • 2000
  • The generalized linear mixed model framework is for handling count-type categorical data as well as for clustered or overdispersed non-Gaussian data, or for non-linear model data. In this study, we review its general formulation and estimation methods, based on quasi-likelihood and Monte-Carlo techniques. The current research areas and topics for further development are also mentioned.

  • PDF