http://dx.doi.org/10.5351/KJAS.2018.31.6.733

A new sample selection model for overdispersed count data  

Jo, Sung Eun (Department of Applied Statistics, Konkuk University)
Zhao, Jun (Department of Applied Statistics, Konkuk University)
Kim, Hyoung-Moon (Department of Applied Statistics, Konkuk University)
Publication Information
The Korean Journal of Applied Statistics / v.31, no.6, 2018, pp. 733-749
Abstract
Sample selection arises when the outcome of interest in a study is only partially observable. Heckman introduced a sample selection model to analyze such data and proposed a full maximum likelihood estimation method under the assumption of normality. Recently, sample selection models for binomial and Poisson response variables have been proposed. Based on the theory of symmetry-modulated distributions, we extend these to a model for overdispersed count data. In the absence of sample selection, this type of data is often modeled using the negative binomial distribution; hence, we propose a sample selection model for overdispersed count data based on the negative binomial distribution. The model is illustrated with a real data application. Simulation studies show that our estimation method based on the profile log-likelihood is stable.
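As a rough illustration of what "overdispersed" means here, the following Python sketch simulates Poisson counts and negative binomial counts with the same mean and compares their variances. It is not the authors' estimation method and involves no sample selection; the function names and the parameter values (`mu`, `k`, `n`, `seed`) are hypothetical choices made only for this example. The negative binomial is generated as a gamma-mixed Poisson, for which the variance is mu + mu^2/k rather than mu.

```python
import math
import random
from statistics import fmean, pvariance


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method.

    Adequate for the small rates used in this illustration.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def negative_binomial_draw(rng, mu, k):
    """Draw one negative binomial variate with mean mu and dispersion k.

    Uses the gamma-Poisson mixture: lam ~ Gamma(shape=k, scale=mu/k),
    then y ~ Poisson(lam).
    """
    lam = rng.gammavariate(k, mu / k)
    return poisson_draw(rng, lam)


def simulate(mu=3.0, k=1.5, n=20000, seed=1):
    """Return (mean, variance) of simulated Poisson and negative binomial counts.

    All default values are illustrative assumptions, not taken from the article.
    """
    rng = random.Random(seed)
    pois = [poisson_draw(rng, mu) for _ in range(n)]
    negbin = [negative_binomial_draw(rng, mu, k) for _ in range(n)]
    return {
        "poisson": (fmean(pois), pvariance(pois)),
        "negbin": (fmean(negbin), pvariance(negbin)),
    }


if __name__ == "__main__":
    for name, (mean, var) in simulate().items():
        print(f"{name}: mean={mean:.2f}, variance={var:.2f}")
```

With these settings the Poisson sample should have variance close to its mean, while the negative binomial sample should have variance well above its mean, which is the feature that motivates replacing the Poisson response with a negative binomial one.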
Keywords
sample selection bias; Heckman's sample selection model; overdispersed data; negative binomial regression; Poisson regression
References
1 Agresti, A. (2013). Categorical Data Analysis (3rd ed), Wiley.
2 Azzalini, A. and Capitanio, A. (2014). The Skew-Normal and Related Families, IMS Monographs series.
3 Azzalini, A., Kim, H. M., and Kim, H. J. (2018). Sample selection models for discrete and other non-Gaussian response variables, Statistical Methods & Applications, accepted.
4 Boyes, W., Hoffman, D., and Low, S. (1989). An econometric analysis of the bank credit scoring problem, Journal of Econometrics, 40, 3-14.
5 Greene, W. H. (1992). A Statistical Model for Credit Scoring, NYU Working Paper, EC-92-29, Available at SSRN: https://ssrn.com/abstract=1867088.
6 Greene, W. H. (2012). Econometric Analysis (7th ed), Pearson Education Ltd.
7 Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models, Annals of Economic and Social Measurement, 5, 475-492.
8 Heckman, J. J. (1979). Sample selection bias as a specification error, Econometrica, 47, 153-161.
9 Riphahn, R. T., Wambach, A., and Million, A. (2003). Incentive effects in the demand for health care: a bivariate panel count data estimation, Journal of Applied Econometrics, 18, 387-405.
10 Rubin, D. B. (1976). Inference and missing data, Biometrika, 63, 581-592.
11 Terza, J. (1998). Estimating count data models with endogenous switching: sample selection and endogenous treatment effects, Journal of Econometrics, 84, 129-154.
12 Vella, F. (1998). Estimating models with sample selection bias: a survey, The Journal of Human Resources, 33, 127-169.
13 Wooldridge, J. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed), MIT Press, Cambridge.