• Title/Summary/Keyword: overdispersed data (과대산포 자료)

A new sample selection model for overdispersed count data (과대산포 가산자료의 새로운 표본선택모형)

  • Jo, Sung Eun;Zhao, Jun;Kim, Hyoung-Moon
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.733-749 / 2018
  • Sample selection arises as a result of the partial observability of the outcome of interest in a study. Heckman introduced a sample selection model to analyze such data and proposed a full maximum likelihood estimation method under the assumption of normality. Recently, sample selection models for binomial and Poisson response variables have been proposed. Based on the theory of symmetry-modulated distributions, we extend these to a model for overdispersed count data. Such data, in the absence of sample selection, are often modeled using the negative binomial distribution. Hence we propose a sample selection model for overdispersed count data using the negative binomial distribution. The model is applied to real data. Simulation studies reveal that our estimation method based on the profile log-likelihood is stable.

A Zero-Inflated Model for Insurance Data (제로팽창 모형을 이용한 보험데이터 분석)

  • Choi, Jong-Hoo;Ko, In-Mi;Cheon, Soo-Young
    • The Korean Journal of Applied Statistics / v.24 no.3 / pp.485-494 / 2011
  • Data whose observations take only non-negative integer values, such as the numbers of car accidents, earthquakes, or insurance coverages, are called count data. In general, the Poisson regression model has been used to model count data; however, this model has the weakness that it restricts the mean and the variance to be equal. In practice, count data are often too dispersed for the Poisson model because the variance is significantly larger than the mean, owing to heterogeneity within groups. When overdispersion is not taken into account, the resulting parameter estimates or standard errors are expected to be inefficient. Since coverage is the main issue for insurance, some accidents may not be covered by insurance, and the number covered by insurance may be zero. This paper considers the zero-inflated model for count data with many zeros. The performance of this model is investigated using real data with overdispersion and many zeros. The results indicate that the Zero-Inflated Negative Binomial Regression Model performs best in model evaluation.
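
The mean-variance restriction and the zero-inflated negative binomial idea described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the mean/size parameterization of the negative binomial (variance = mu + mu^2 / r), and the mixing weight `pi` are assumptions introduced here, and no regression covariates are included.

```python
import math


def dispersion_index(counts):
    """Sample variance divided by sample mean; values well above 1 suggest overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and size r (variance = mu + mu**2 / r)."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, r, pi):
    """Zero-inflated negative binomial pmf.

    With probability pi the observation is a structural zero;
    otherwise it is drawn from the negative binomial above.
    """
    base = (1.0 - pi) * nb_pmf(y, mu, r)
    return pi + base if y == 0 else base
```

Under this parameterization the zero probability is `pi + (1 - pi) * nb_pmf(0, mu, r)`, so the model can place more mass at zero than the plain negative binomial while still allowing the variance to exceed the mean.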

Similarity between the dispersion parameter in zero-altered model and the two goodness-of-fit statistics (영 변환 모형 산포형태모수와 두 적합도 검정통계량 사이의 유사성 비교)

  • Yun, Yujeong;Kim, Honggie
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.493-504 / 2017
  • We often observe count data that exhibit over-dispersion, originating from too many zeros, or under-dispersion, originating from too few zeros. To handle these types of problems, Ghosh and Kim designed the zero-altered distribution model in 2007. Their model can control both over-dispersion and under-dispersion with a single parameter, which had not previously been possible. The dispersion type depends on the sign of the parameter $\delta$ in the zero-altered distribution. In this study, we demonstrate the role of the dispersion type parameter $\delta$ using data on the number of births in Korea. Employing both the chi-square statistic and the Kolmogorov statistic for goodness-of-fit, we also explain the differences between the theoretical distribution and an observed one that exhibits either over-dispersion or under-dispersion. Finally, this study shows whether or not the goodness-of-fit test statistics behave similarly to the dispersion type parameter $\delta$.

Bivariate Zero-Inflated Negative Binomial Regression Model with Heterogeneous Dispersions (서로 다른 산포를 허용하는 이변량 영과잉 음이항 회귀모형)

  • Kim, Dong-Seok;Jeong, Seul-Gi;Lee, Dong-Hee
    • Communications for Statistical Applications and Methods / v.18 no.5 / pp.571-579 / 2011
  • We propose a new bivariate zero-inflated negative binomial regression model that allows heterogeneous dispersions. To show the performance of the proposed model, the health care data in Deb and Trivedi (1997) are used to compare it with the bivariate zero-inflated negative binomial model proposed by Wang (2003), which has a common dispersion between the two response variables. The empirical study shows better results for the proposed model in terms of log-likelihood and AIC.

Using the corrected Akaike's information criterion for model selection (모형 선택에서의 수정된 AIC 사용에 대하여)

  • Song, Eunjung;Won, Sungho;Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.119-133 / 2017
  • The corrected Akaike's information criterion (AICc) is known to have better finite sample properties. However, Akaike's information criterion (AIC) is still widely used to select an optimal prediction model among several candidate models due to a lack of research on the benefits obtained using AICc. In this paper, we compare the performance of AIC and AICc through numerical simulations and confirm the advantage of using AICc. In addition, we consider the performance of the quasi Akaike's information criterion (QAIC) and the corrected quasi Akaike's information criterion (QAICc) for binomial and Poisson data under overdispersion.
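
The four criteria named in this abstract can be written out with their commonly cited textbook definitions. This is a sketch under stated assumptions, not necessarily the exact variants studied in the paper: `loglik` is the maximized log-likelihood, `k` the number of estimated parameters, `n` the sample size, and `c_hat` an estimated overdispersion factor. Some authors add one to `k` in QAIC and QAICc to count `c_hat` itself; that convention is left to the caller here.

```python
def aic(loglik, k):
    """Akaike's information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def small_sample_penalty(k, n):
    """Extra finite-sample term 2k(k + 1) / (n - k - 1) shared by AICc and QAICc."""
    if n - k - 1 <= 0:
        raise ValueError("the correction requires n > k + 1")
    return 2.0 * k * (k + 1) / (n - k - 1)


def aicc(loglik, k, n):
    """Corrected AIC: AIC plus the finite-sample penalty."""
    return aic(loglik, k) + small_sample_penalty(k, n)


def qaic(loglik, k, c_hat):
    """Quasi-AIC: the log-likelihood term is scaled by the overdispersion factor c_hat."""
    return -2.0 * loglik / c_hat + 2.0 * k


def qaicc(loglik, k, n, c_hat):
    """Corrected quasi-AIC: QAIC plus the finite-sample penalty."""
    return qaic(loglik, k, c_hat) + small_sample_penalty(k, n)
```

The penalty term vanishes as `n` grows with `k` fixed, so AICc and AIC agree in large samples and differ most when `n` is small relative to `k`.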

Analysis of Violent Crime Count Data Based on Bivariate Conditional Auto-Regressive Model (이변량 조건부자기회귀모형을 이용한 강력범죄자료 분석)

  • Choi, Jung-Soon;Park, Man-Sik;Won, Yu-Bok;Kim, Hag-Yeol;Heo, Tae-Young
    • Communications for Statistical Applications and Methods / v.17 no.3 / pp.413-421 / 2010
  • In this study, we considered a bivariate conditional auto-regressive model that takes into account spatial association as well as the correlation between the two dependent variables, which are the counts of murder and burglary. We conducted a likelihood ratio test to check for over-dispersion prior to applying spatial Poisson models. For the real application, we used the annual counts of violent crimes in 25 districts of Seoul in 2007. The statistical results are visually illustrated using a geographical information system.

Analysis of scientific military training data using zero-inflated and Hurdle regression (영과잉 및 허들 회귀모형을 이용한 과학화 전투훈련 자료 분석)

  • Kim, Jaeoh;Bang, Sungwan;Kwon, Ojeong
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1511-1520 / 2017
  • The purpose of this study is to analyze military combat training data to improve military operation and training methods and to verify required military doctrine. From the scientific military training data of a reinforced infantry battalion in offensive operations, we set the number of enemies disabled in combat by individual combatants using their weapons as the response variable. Our response variable has more zero observations than would be allowed for by a traditional GLM such as Poisson regression. We used zero-inflated regression and hurdle regression for the data analysis, considering the over-dispersion and excess-zero problems. Our results can be used as a reference for verifying military doctrine for small units and for analyzing various operational and tactical factors.
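
The two model families named in this abstract treat zeros differently, which a pair of probability mass functions makes concrete. The sketch below assumes a Poisson count component with rate `lam` and uses constant probabilities `pi` and `p0` in place of the regression parts; these names and simplifications are illustrative assumptions, not the specification used in the paper.

```python
import math


def poisson_pmf(y, lam):
    """Poisson pmf, computed on the log scale for numerical stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson.

    Zeros arise either from a structural-zero class (probability pi)
    or from the Poisson component itself.
    """
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


def hurdle_pmf(y, lam, p0):
    """Hurdle Poisson.

    All zeros have probability p0; positive counts follow a Poisson
    truncated at zero, rescaled to carry the remaining mass 1 - p0.
    """
    if y == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))
```

In the zero-inflated form the zero probability can never fall below the Poisson value `exp(-lam)`, whereas the hurdle form sets it freely through `p0`, so the hurdle form can also describe data with fewer zeros than the Poisson model predicts.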

Derivation and verification of influence function on parameter δ proposed by Ghosh and Kim (Ghosh와 Kim 모수 δ의 영향함수 유도 및 확인)

  • Kim, Minjeong;Kim, Honggie
    • The Korean Journal of Applied Statistics / v.30 no.4 / pp.529-538 / 2017
  • The Ghosh and Kim zero-altered distribution model is used to analyze count data that have too many or too few zeros. The dispersion type parameter $\delta$ in the zero-altered distribution model is a function of the mean, the variance, and the zero probability, and takes two forms depending on the relation between $\mu$ and $\sigma^2$. We derived the influence function on $\delta$ when $\sigma^2 \geq \mu$. To show the validity of the influence function, we used Census data on the number of births of married women in Korea to compare the changes in $\delta$ estimated using this function with those obtained using the direct deletion method. The results show that the obtained influence function is very accurate in estimating changes in $\delta$ when an observation is deleted.

A Modeling of Daily Temperature in Seoul using GLM Weather Generator (GLM 날씨 발생기를 이용한 서울지역 일일 기온 모형)

  • Kim, Hyeonjeong;Do, Hae Young;Kim, Yongku
    • The Korean Journal of Applied Statistics / v.26 no.3 / pp.413-420 / 2013
  • A stochastic weather generator is a commonly used tool to simulate daily weather time series. Recently, a generalized linear model (GLM) has been proposed as a convenient approach to fitting these weather generators. In the present paper, a stochastic weather generator is considered to model the time series of daily temperatures for Seoul, South Korea. As a covariate, precipitation occurrence is introduced to relate a short-term predictor to short-term predictands. One of the limitations of stochastic weather generators is a marked tendency to underestimate the observed interannual variance of monthly, seasonal, or annual total precipitation. To reduce this phenomenon, we incorporate a time series of seasonal mean temperatures in the GLM weather generator as a covariate.

Fit of the number of insurance solicitor's turnovers using zero-inflated negative binomial regression (영과잉 음이항회귀 모형을 이용한 보험설계사들의 이직횟수 적합)

  • Chun, Heuiju
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1087-1097 / 2017
  • This study aims to find the best model to fit the number of turnovers of insurance solicitors at life insurance companies using count data regression models: Poisson regression, negative binomial regression, zero-inflated Poisson regression, and zero-inflated negative binomial regression. Out of the four models, the zero-inflated negative binomial model was selected based on the AIC and SBC criteria, which is due to over-dispersion and a high proportion of zero counts. The significant factors affecting insurance solicitors' turnover were found to be the work period in the current company, the total work period as a financial planner, the affiliated corporation, and channel management satisfaction. We also found that as job satisfaction or channel management satisfaction gets lower, the number of turnovers increases. In addition, the total work period as a financial planner has a positive relationship with the number of turnovers, but the work period in the current company has a negative relationship with it.