Bayesian analysis of Korean income data using a zero-inflated Tobit model

  • Hwang, Jisu (Department of Economics, Ewha Womans University) ;
  • Kim, Sei-Wan (Department of Economics, Ewha Womans University) ;
  • Oh, Man-Suk (Department of Statistics, Ewha Womans University)
  • Received : 2017.09.05
  • Reviewed : 2017.10.26
  • Published : 2017.12.31


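The 2015 monthly average income distribution of the Korean working-age population, provided by the Korea Labor Panel Survey, shows an excessively high proportion of zero observations, so the Tobit model commonly used for income distributions has limited explanatory power. In this study, we analyze Korean income data with a zero-inflated Tobit model that reflects this zero-inflation. The zero-inflated Tobit model is a two-stage model. In the first stage, the zero-income group is divided into two groups: the first is assumed to consist of individuals whose zero is observed because they have no intention of participating in the labor market and therefore do not participate (genuine zero), and the second of individuals who intend to participate but whose zero is observed through censoring because of low wages (random zero). In the second stage, the random zero group is combined with the continuous non-negative data and the Tobit model is applied. Regression models with explanatory variables of interest are applied in both stages to identify the factors that affect labor market participation and wage level. Parameters are estimated with Markov chain Monte Carlo methods, and a comparison with the conventional Tobit model shows that the zero-inflated Tobit model performs better in estimating the frequency of zeros and in model fit. The analysis shows that the probability of participating in the labor market is very significantly higher for older individuals, for men than for women, and for those with less education, and that the probability of not participating is higher for those with higher socio-economic status and lower reservation wages. As for wage level, men earn very significantly more than women, the more educated more than the less educated, and the married more than the unmarried.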

Korean income data obtained from the Korea Labor Panel Survey show excessive zeros, which may not be properly explained by the Tobit model. In this paper, we analyze the data using a zero-inflated Tobit model to account for the excessive zeros. A zero-inflated Tobit model consists of two stages. In the first stage, individuals with zero income are divided into two groups: a genuine zero group and a random zero group. Individuals in the genuine zero group did not participate in the labor market because they had no intention of doing so. Individuals in the random zero group participated in the labor market, but their incomes are very low and censored at 0. In the second stage, the Tobit model is applied to the subset of data that combines the random zeros and the positive observations. Regression models are employed in both stages to estimate the effects of explanatory variables on labor market participation and on the amount of income. Markov chain Monte Carlo methods are applied for the Bayesian analysis of the data. The proposed zero-inflated Tobit model outperforms the Tobit model in model fit and in prediction of the frequency of zeros. The results show strong evidence that the probability of participating in the labor market increases with age and decreases with education, and that women tend to have stronger intentions to participate in the labor market than men. There is also moderate evidence that the probability of participating in the labor market decreases with socio-economic status and reservation wage. In contrast, the monthly wage increases with age and education, and it is higher for married than for unmarried individuals and for men than for women.
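The two-stage structure described above can be illustrated by writing out the likelihood contribution of a single observation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes a logistic link for the probability of a genuine zero (the abstract does not state the link function), a normal latent wage, and hypothetical names (`zi_tobit_loglik`, `gamma`, `beta`, `sigma`, `X`, `Z`) chosen for illustration. It only evaluates the log-likelihood; the MCMC sampling step used in the paper is not shown.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def logistic(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def zi_tobit_loglik(y, X, Z, beta, gamma, sigma):
    """Log-likelihood of a zero-inflated Tobit model (illustrative sketch).

    Stage 1 (assumed logistic link): p_i = P(genuine zero) = logistic(z_i' gamma).
    Stage 2 (Tobit): latent y*_i ~ N(x_i' beta, sigma^2), observed y_i = max(0, y*_i).
    """
    total = 0.0
    for y_i, x_i, z_i in zip(y, X, Z):
        p_genuine = logistic(dot(z_i, gamma))
        mu = dot(x_i, beta)
        if y_i == 0:
            # A zero is either a genuine zero or a random (censored) zero.
            p_random_zero = normal_cdf(-mu / sigma)
            total += math.log(p_genuine + (1.0 - p_genuine) * p_random_zero)
        else:
            # A positive income can only come from the Tobit part.
            total += math.log(1.0 - p_genuine) + normal_logpdf(y_i, mu, sigma)
    return total

def tobit_loglik(y, X, beta, sigma):
    """Log-likelihood of the ordinary Tobit model (no genuine zeros)."""
    total = 0.0
    for y_i, x_i in zip(y, X):
        mu = dot(x_i, beta)
        if y_i == 0:
            total += math.log(normal_cdf(-mu / sigma))
        else:
            total += normal_logpdf(y_i, mu, sigma)
    return total
```

In this sketch the ordinary Tobit model is the limiting case in which the probability of a genuine zero goes to 0, which is one way to see why the zero-inflated version can fit a larger share of zeros.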


