A new sample selection model for overdispersed count data

  • Jo, Sung Eun (Department of Applied Statistics, Konkuk University) ;
  • Zhao, Jun (Department of Applied Statistics, Konkuk University) ;
  • Kim, Hyoung-Moon (Department of Applied Statistics, Konkuk University)
  • Received : 2018.08.31
  • Accepted : 2018.10.31
  • Published : 2018.12.31

Abstract

Sample selection arises when the outcome of interest in a study is only partially observable. Heckman introduced a sample selection model to analyze such data and proposed a full maximum likelihood estimation method under the assumption of bivariate normality. Sample selection models for binomial and Poisson response variables have recently been proposed. Based on the theory of symmetry-modulated distributions, we extend these to a model for overdispersed count data. In the absence of sample selection, such data are often modeled with the negative binomial distribution. Hence we propose a sample selection model for overdispersed count data based on the negative binomial distribution. The model is illustrated with a real data application. Simulation studies show that our estimation method, which is based on the profile log-likelihood, is stable.
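
For orientation, here is one way to write the observed-data likelihood of such a model. The sketch rests on several assumptions that the abstract does not state:

- Each unit is selected (S = 1) when a latent variable T with cdf G, independent of the count Y, satisfies T ≤ h(Y).
- Y is recorded only for selected units.
- Covariates are omitted.
- The negative binomial pmf f uses a mean and size parameterization (μ, k) of our choosing.

Here n_0 denotes the number of unselected units.

```latex
\begin{aligned}
f(y;\mu,k) &= \frac{\Gamma(y+k)}{\Gamma(k)\,y!}\Big(\frac{k}{k+\mu}\Big)^{k}\Big(\frac{\mu}{k+\mu}\Big)^{y},
  \qquad \operatorname{Var}(Y) = \mu + \mu^{2}/k,\\
\Pr(S=1 \mid Y=y) &= \Pr\{T \le h(y)\} = G\{h(y)\},\\
\pi &= \sum_{y=0}^{\infty} f(y;\mu,k)\,G\{h(y)\},\\
\Pr(Y=y \mid S=1) &= f(y;\mu,k)\,G\{h(y)\} \,/\, \pi,\\
L(\mu,k,\tau,\eta) &= (1-\pi)^{n_0} \prod_{i:\,S_i=1} f(y_i;\mu,k)\,G\{h(y_i)\}.
\end{aligned}
```

If h depends on τ and η through τ + ηy and η = 0, then G{h(y)} does not depend on y. In that case the observed counts follow f itself, and there is no selection bias. A profile log-likelihood for a single parameter, for example η, is obtained by fixing that parameter on a grid and maximizing log L over the remaining parameters at each grid value.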

Keywords

Figure 5.1. Profile likelihoods for Poisson models.

Figure 5.2. Profile likelihoods for negative binomial models.

Table 5.1. Poisson model (A), T ~ N(0, 1), h(y) = τ + ηy

Table 5.2. Poisson model (B), T ~ Exp(1), h(y) = exp(τ + ηy)

Table 5.3. Negative binomial model (A), T ~ N(0, 1), h(y) = τ + ηy

Table 5.4. Negative binomial model (B), T ~ Exp(1), h(y) = exp(τ + ηy)

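
The captions of Tables 5.1 to 5.4 pair a latent variable T with a modulating function h(y). The Python sketch below computes the resulting distribution of the observed counts for models (A) and (B) with a negative binomial baseline. It assumes that a count y is observed with probability P(T ≤ h(y)); this is our reading of the captions, not a formula stated here. The mean and size parameterization, the truncation point `y_max`, and all function names are illustrative choices rather than the authors' code.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and size k (assumed parameterization:
    Var(Y) = mu + mu**2 / k, so k -> infinity recovers the Poisson)."""
    return math.exp(math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
                    + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))

def g_model_a(y, tau, eta):
    """Model (A): T ~ N(0, 1), h(y) = tau + eta*y, so P(T <= h(y)) = Phi(tau + eta*y)."""
    return 0.5 * (1.0 + math.erf((tau + eta * y) / math.sqrt(2.0)))

def g_model_b(y, tau, eta):
    """Model (B): T ~ Exp(1), h(y) = exp(tau + eta*y), so P(T <= h(y)) = 1 - exp(-h(y))."""
    # Cap the exponent to avoid overflow; the probability is already 1.0 at the cap.
    h = math.exp(min(tau + eta * y, 50.0))
    return -math.expm1(-h)

def selected_pmf(mu, k, tau, eta, g, y_max=500):
    """Return (P(S = 1), pmf of Y given S = 1 on 0..y_max).
    The infinite sum over y is truncated at y_max (assumed negligible tail)."""
    weights = [nb_pmf(y, mu, k) * g(y, tau, eta) for y in range(y_max + 1)]
    p_sel = sum(weights)
    return p_sel, [w / p_sel for w in weights]

p_sel, pmf = selected_pmf(mu=3.0, k=2.0, tau=-0.5, eta=0.3, g=g_model_a)
mean_selected = sum(y * p for y, p in enumerate(pmf))
```

Setting `eta=0` makes the selection probability constant in y, so `pmf` reproduces the negative binomial pmf. With `eta > 0`, larger counts are over-represented among the observed units, and the mean of the observed counts exceeds μ.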

Table 6.1. Simulation study: negative binomial response with exclusion restriction (α = -0.5)

Table 6.2. Simulation study: negative binomial response without exclusion restriction (α = -0.5)

Table 6.3. Simulation study: negative binomial response with exclusion restriction (α = -0.1)

Table 6.4. Simulation study: negative binomial response without exclusion restriction (α = -0.1)

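
The simulation designs in Tables 6.1 to 6.4 involve covariates, an exclusion restriction and a parameter α that are not described here. The Python sketch below is therefore only a stripped-down, hypothetical data-generating process without covariates, so there is no exclusion restriction and α plays no role. Counts are drawn from a negative binomial distribution (as a gamma-Poisson mixture), and a count is kept only when T ≤ τ + ηY with T ~ N(0, 1), mirroring model (A). The sketch illustrates the sample selection bias that the model is meant to correct; it does not reproduce the paper's settings.

```python
import math
import random

def rpoisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def rnegbin(mu, k, rng):
    """NB(mean mu, size k) as a gamma-Poisson mixture: lam ~ Gamma(shape k, scale mu/k)."""
    return rpoisson(rng.gammavariate(k, mu / k), rng)

def simulate(n, mu, k, tau, eta, seed=1):
    """Hypothetical design: Y ~ NB(mu, k), and Y is observed only if T <= tau + eta*Y,
    with T ~ N(0, 1) drawn independently of Y (model (A), no covariates)."""
    rng = random.Random(seed)
    ys, sel = [], []
    for _ in range(n):
        y = rnegbin(mu, k, rng)
        t = rng.gauss(0.0, 1.0)
        ys.append(y)
        sel.append(t <= tau + eta * y)
    return ys, sel

ys, sel = simulate(20000, mu=3.0, k=2.0, tau=-0.5, eta=0.3)
observed = [y for y, s in zip(ys, sel) if s]
full_mean = sum(ys) / len(ys)
observed_mean = sum(observed) / len(observed)
```

With η > 0, selection favours large counts, so `observed_mean` overstates the population mean μ that `full_mean` estimates. A selection model accounts for this through the selection probability P(T ≤ h(y)) instead of treating the observed counts as a random sample.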

References

  1. Agresti, A. (2013). Categorical Data Analysis (3rd ed), Wiley.
  2. Azzalini, A. and Capitanio, A. (2014). The Skew-Normal and Related Families, IMS Monographs series.
  3. Azzalini, A., Kim, H. M., and Kim, H. J. (2018). Sample selection models for discrete and other non-Gaussian response variables, Statistical Methods & Applications, accepted
  4. Boyes, W., Hoffman, D., and Low, S. (1989). An econometric analysis of the bank credit scoring problem, Journal of Econometrics, 40, 3-14. https://doi.org/10.1016/0304-4076(89)90026-2
  5. Greene, W. H. (1992). A Statistical Model for Credit Scoring, NYU Working Paper, EC-92-29, Available at SSRN: https://ssrn.com/abstract=1867088.
  6. Greene, W. H. (2012). Econometric Analysis (7th ed), Pearson Education Ltd.
  7. Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models, Annals of Economic and Social Measurement, 5, 475-492.
  8. Heckman, J. J. (1979). Sample selection bias as a specification error, Econometrica, 47, 153-161. https://doi.org/10.2307/1912352
  9. Riphahn, R. T., Wambach, A., and Million, A. (2003). Incentive effects in the demand for health care: a bivariate panel count data estimation, Journal of Applied Econometrics, 18, 387-405. https://doi.org/10.1002/jae.680
  10. Rubin, D. B. (1976). Inference and missing data, Biometrika, 63, 581-592. https://doi.org/10.1093/biomet/63.3.581
  11. Terza, J. (1998). Estimating count data models with endogenous switching: sample selection and endogenous treatment effects, Journal of Econometrics, 84, 129-154. https://doi.org/10.1016/S0304-4076(97)00082-1
  12. Vella, F. (1998). Estimating models with sample selection bias: a survey, The Journal of Human Resources, 33, 127-169. https://doi.org/10.2307/146317
  13. Wooldridge, J. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed), MIT Press, Cambridge.