• Title/Summary/Keyword: Count data Model

Negative binomial loglinear mixed models with general random effects covariance matrix

  • Sung, Youkyung;Lee, Keunbaik
    • Communications for Statistical Applications and Methods / v.25 no.1 / pp.61-70 / 2018
  • Modeling of the random effects covariance matrix in generalized linear mixed models (GLMMs) is an issue in analysis of longitudinal categorical data because the covariance matrix can be high-dimensional and its estimate must satisfy positive-definiteness. To satisfy these constraints, we consider the autoregressive and moving average Cholesky decomposition (ARMACD) to model the covariance matrix. The ARMACD creates a more flexible decomposition of the covariance matrix that provides generalized autoregressive parameters, generalized moving average parameters, and innovation variances. In this paper, we analyze longitudinal count data with overdispersion using GLMMs. We propose negative binomial loglinear mixed models to analyze longitudinal count data and we also present modeling of the random effects covariance matrix using the ARMACD. Epilepsy data are analyzed using our proposed model.

A Bayesian zero-inflated negative binomial regression model based on Pólya-Gamma latent variables with an application to pharmaceutical data (폴랴-감마 잠재변수에 기반한 베이지안 영과잉 음이항 회귀모형: 약학 자료에의 응용)

  • Seo, Gi Tae;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.311-325 / 2022
  • For count responses, excess zeros often occur in various research fields. The zero-inflated model is a common choice for modeling such count data. Bayesian inference for the zero-inflated model has long been recognized as a hard problem because the conditional posterior distribution does not have a closed form. Recently, however, Pillow and Scott (2012) and Polson et al. (2013) proposed a Pólya-Gamma data-augmentation strategy for logistic and negative binomial models, facilitating Bayesian inference for the zero-inflated model. We apply a Bayesian zero-inflated negative binomial regression model to longitudinal pharmaceutical data that were previously analyzed by Min and Agresti (2005). To facilitate posterior sampling for the longitudinal zero-inflated model, we use the Pólya-Gamma data-augmentation strategy.

Bayesian Analysis of a Zero-inflated Poisson Regression Model: An Application to Korean Oral Hygienic Data (영과잉 포아송 회귀모형에 대한 베이지안 추론: 구강위생 자료에의 적용)

  • Lim, Ah-Kyoung;Oh, Man-Suk
    • The Korean Journal of Applied Statistics / v.19 no.3 / pp.505-519 / 2006
  • We consider zero-inflated count data, which are discrete counts with more zeros than the Poisson distribution allows. Zero-inflated data can be found in various areas. Despite their increasing importance in practice, appropriate statistical inference for zero-inflated data is limited. Classical inference based on large-sample theory does not fit unless the sample size is very large, and the regular Poisson model shows lack of fit due to the many zeros. To handle these difficulties, a mixture of distributions is considered for the zero-inflated data. Specifically, a mixture of a point mass at zero and a Poisson distribution is employed. In addition, when there are meaningful covariates related to the response variable, a loglinear link is used between the mean of the response and the covariates in the Poisson part. We propose Bayesian inference for the zero-inflated Poisson regression model using a Markov chain Monte Carlo method. We applied the proposed method to Korean oral hygiene data and compared the inference results with those of other models. We found that the proposed method is superior in that it gives smaller parameter estimation error and more accurate predictions.
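The mixture described in this abstract (a point mass at zero combined with a Poisson distribution whose mean follows a loglinear link) can be sketched as follows. This is only an illustration of the probability model, not the authors' MCMC procedure; the function and parameter names (`zip_pmf`, `pi_zero`, `lam`, `poisson_mean`) are assumptions introduced here.

```python
import math


def zip_pmf(y, pi_zero, lam):
    """P(Y = y) for a zero-inflated Poisson variable.

    pi_zero: assumed probability of a structural zero (the point mass at 0).
    lam:     mean of the Poisson component.
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can come from the point mass or from the Poisson part.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def poisson_mean(x, beta):
    """Loglinear link: log(lam) = x . beta, so lam = exp(x . beta)."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
```

With `pi_zero = 0` the function reduces to the ordinary Poisson probability mass function, which is one way to see that the zero-inflated model nests the regular Poisson model.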

Analysis of Marginal Count Failure Data by using Covariates

  • Karim, Md.Rezaul;Suzuki, Kazuyuki
    • International Journal of Reliability and Applications / v.4 no.2 / pp.79-95 / 2003
  • Manufacturers collect and analyze field reliability data to enhance the quality and reliability of their products and to improve customer satisfaction. To reduce the data collecting and maintenance costs, the amount of data maintained for evaluating product quality and reliability should be minimized. With this in mind, some industrial companies assemble warranty databases by gathering data from different sources for a particular time period. This “marginal count failure data” does not provide (i) the number of failures by when the product entered service, (ii) the number of failures by product age, or (iii) information about the effects of the operating season or environment. This article describes a method for estimating age-based claim rates from marginal count failure data. It uses covariates to identify variations in claims relative to variables such as manufacturing characteristics, time of manufacture, operating season or environment. A Poisson model is presented, and the method is illustrated using warranty claims data for two electrical products.

Overdispersion in count data - a review (가산자료(count data)의 과산포 검색: 일반화 과정)

  • 김병수;오경주;박철용
    • The Korean Journal of Applied Statistics / v.8 no.2 / pp.147-161 / 1995
  • The primary objective of this paper is to review parametric models and test statistics related to overdispersion of count data. The Poisson or binomial assumption often fails to explain overdispersion. We reviewed real examples of overdispersion in count data that occurred in toxicological or teratological experiments. We also reviewed several models that were suggested for implementing the extra-binomial variation or hyper-Poisson variability, and we noted how these models were generalized and further developed. The approaches that have been suggested for overdispersion fall into two broad categories. One is to develop a parametric model for it, and the other is to assume a particular relationship between the variance and the mean of the response variable and to derive a score test statistic for detecting overdispersion. Recently, Dean (1992) derived a general score test statistic for detecting overdispersion from the exponential family.
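As a rough illustration of what overdispersion means for count data, the sketch below computes the variance-to-mean ratio, which is close to 1 under a Poisson assumption and larger than 1 when the data are overdispersed. This is a simple descriptive check, not the score test of Dean (1992) or any other statistic reviewed in the paper; the name `dispersion_index` is an assumption introduced here.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean of a list of counts.

    Values well above 1 suggest more variability than a Poisson
    model with the same mean would produce.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m
```

For example, `dispersion_index([0, 0, 0, 10, 0, 0, 0, 10])` is well above 1, whereas `dispersion_index([2, 3, 2, 3])` is below 1.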

Analysis of Traffic Accident by Circular Intersection Type in Korea Using Count Data Model (가산자료 모형을 이용한 국내 원형교차로 유형별 교통사고 분석)

  • Kim, Tae Yang;Lee, Min Yeong;Park, Byung Ho
    • Journal of the Korean Society of Safety / v.32 no.5 / pp.129-134 / 2017
  • This study aims to develop traffic accident models by circular intersection type using count data models. The number of accidents, the number of fatal and injured persons (FSI), and EPDO are calculated from the traffic accident data of TAAS. The circular intersection accident models are developed through Poisson and negative binomial regression analysis. The main results of this study are as follows. First, the null hypotheses that there are differences in the number of traffic accidents, FSI and EPDO by type of circular intersection are rejected. Second, the scale of intersection (median, large), number of approach roads, mean width and length of exit road, and area of the circulating roadway and central island are selected as factors influencing the number of traffic accidents, FSI and EPDO at rotaries. Third, the scale of intersection (median), guide signs (speed limit, direction, roundabout), number of approach roads, entry angle, and area of the intersection and central island are adopted as factors influencing the number of traffic accidents, FSI and EPDO at roundabouts. Finally, converting rotaries to roundabouts could be expected to reduce accidents.

A new sample selection model for overdispersed count data (과대산포 가산자료의 새로운 표본선택모형)

  • Jo, Sung Eun;Zhao, Jun;Kim, Hyoung-Moon
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.733-749 / 2018
  • Sample selection arises as a result of the partial observability of the outcome of interest in a study. Heckman introduced a sample selection model to analyze such data and proposed a full maximum likelihood estimation method under the assumption of normality. Recently, sample selection models for binomial and Poisson response variables have been proposed. Based on the theory of symmetry-modulated distributions, we extend these to a model for overdispersed count data. This type of data with no sample selection is often modeled using the negative binomial distribution. Hence we propose a sample selection model for overdispersed count data using the negative binomial distribution. A real data application is presented. Simulation studies reveal that our estimation method based on profile log-likelihood is stable.

A Study on Phone Call Big Data Analytics (전화통화 빅데이터 분석에 관한 연구)

  • Kim, Jeongrae;Jeong, Chanki
    • Journal of Information Technology and Architecture / v.10 no.3 / pp.387-397 / 2013
  • This paper proposes an approach to big data analytics for phone call data. The analytical model for phone call data is composed of the PVPF (Parallel Variable-length Phrase Finding) algorithm for identifying verbal phrases of natural language and the word count algorithm for measuring the usage frequency of keywords. In the proposed model, we identify words using the PVPF algorithm and measure the usage frequency of the identified words using the word count algorithm in MapReduce. The results can be interpreted from various viewpoints. We design and implement the model based on HDFS (Hadoop Distributed File System) and verify the proposed approach through a case study of phone call data. We extract useful results through analysis of keyword correlation and usage frequency.
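The word count step mentioned in this abstract follows the usual map and reduce pattern, which can be sketched in plain Python as below. This is a single-process illustration only: it does not use Hadoop or HDFS, it does not implement the PVPF algorithm, and whitespace tokenization plus the function names (`map_phase`, `reduce_phase`, `word_count`) are assumptions introduced here.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step on every line, then reduce all emitted pairs."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(pairs)
```

In a real MapReduce job the map outputs would be shuffled by key across worker nodes before the reduce step; here that grouping happens implicitly inside `reduce_phase`.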

Likelihood-Based Inference on Genetic Variance Component with a Hierarchical Poisson Generalized Linear Mixed Model

  • Lee, C.
    • Asian-Australasian Journal of Animal Sciences / v.13 no.8 / pp.1035-1039 / 2000
  • This study developed a Poisson generalized linear mixed model and a procedure to estimate genetic parameters for count traits. The method, derived from a frequentist perspective, was based on hierarchical likelihood, and the maximum adjusted profile hierarchical likelihood was employed to estimate dispersion parameters of genetic random effects. The current approach is a generalization of Henderson's method to non-normal data and was applied to simulated data. Underestimation was observed in the genetic variance component estimates for the data simulated with large heritability by using the Poisson generalized linear mixed model and the corresponding maximum adjusted profile hierarchical likelihood. However, the current method fitted the data generated with small heritability better than those generated with large heritability.

A Comparative Study on Estimation Models for the Value of Access to a Natural Recreation Site: Focusing on the Estuary Area of Yeongsan River (자연휴양지 방문편익 추정모형의 비교 연구 - 영산강 하구를 대상으로)

  • Shin, Youngchul
    • Environmental and Resource Economics Review / v.21 no.4 / pp.981-998 / 2012
  • In this paper, several count data models of travel cost recreation demand with Poisson and negative binomial specifications are applied to estimate the value of access to the estuary area of the Yeongsan River from visitor survey data. The results show that the negative binomial model that accounts for truncation and overdispersion provides the better goodness-of-fit, and therefore the value per visit (i.e., consumer surplus) is 89,350 won for residents of Jeolla province and 432,526 won for residents of other provinces. If overdispersion is not corrected, that is, if Poisson estimates are relied on, the consumer surplus will be underestimated, whereas if truncation is not corrected, that is, if estimates from untruncated models are used, the consumer surplus will be overestimated. As a result, the truncated negative binomial model should be applied to estimate the travel demand and the consumer surplus per visit when using survey data from single-site visitors.
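To illustrate why truncation matters for on-site visitor surveys, the sketch below shows a zero-truncated Poisson distribution, assuming that truncation here means only people with at least one visit are observed. The paper recommends a truncated negative binomial model; the simpler Poisson version is swapped in here purely for illustration, and the function names are assumptions introduced here.

```python
import math


def zero_truncated_poisson_pmf(y, lam):
    """P(Y = y | Y >= 1) for a Poisson(lam) count; zero for y < 1."""
    if y < 1:
        return 0.0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    # Renormalize by the probability of observing at least one visit.
    return poisson / (1.0 - math.exp(-lam))


def zero_truncated_poisson_mean(lam):
    """Mean number of visits among those observed (Y >= 1)."""
    return lam / (1.0 - math.exp(-lam))
```

Because the observed mean `zero_truncated_poisson_mean(lam)` always exceeds `lam`, fitting an untruncated model to on-site data overstates visit demand, which is in line with the overestimation of consumer surplus reported in the abstract.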
