Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake

  • Lee, Dasom (Department of Statistics, North Carolina State University)
  • Lee, Eunji (Department of Statistics, Korea University)
  • Jo, Seogil (Department of Statistics (Institute of Applied Statistics), Jeonbuk National University)
  • Choi, Taeryeon (Department of Statistics, Korea University)
  • Received : 2019.10.22
  • Accepted : 2019.12.10
  • Published : 2020.02.29

Abstract

This paper presents ordinal probit semiparametric regression models based on the Bayesian spectral analysis regression (BSAR) method. Ordinal probit regression models an ordinal response, usually one with more than two categories, by linking the probability of falling into each category to a combination of the available covariates through a probit link (the inverse of the standard normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a normally distributed latent variable; the response category is then determined by where the latent variable falls relative to a set of cut-off points. In this paper, we extend the latent variable approach to a semiparametric Bayesian ordinal probit regression model whose nonparametric functions are modeled with BSAR, a method based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, so that ordinal responses can be modeled and outcomes predicted more flexibly. We illustrate the proposed methods with simulation studies that compare them with existing methods, and with a real data analysis of the 2016 Korea National Health and Nutrition Examination Survey (KNHANES) that investigates the nonparametric relationship between smoking behavior and coffee intake.
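
As a rough illustration of the probit link and cut-off points described above, the following Python sketch computes category probabilities from a latent mean. The function names, the cut-off values, the coefficient, and the stand-in smooth term `f` are all assumptions made for illustration; they are not taken from the paper or from any BSAR implementation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def category_probs(eta, cutpoints):
    """Return P(y = k) for k = 0..K-1 given the latent mean eta.

    cutpoints must be strictly increasing; K = len(cutpoints) + 1.
    """
    # P(y <= k) = Phi(c_k - eta), padded with 0 and 1 at the two ends.
    cum = [0.0] + [STD_NORMAL.cdf(c - eta) for c in cutpoints] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]


def categorize(z, cutpoints):
    """Map a latent value z to its category: the number of cut-offs below z."""
    return sum(z > c for c in cutpoints)


def f(w):
    """Hypothetical placeholder for the nonparametric component."""
    return math.sin(w)


cutpoints = [-1.0, 0.5, 1.5]            # illustrative values only
eta = 0.4 * 1.2 + f(0.8)                # parametric part plus f(w)
probs = category_probs(eta, cutpoints)  # one probability per category
```

Here `eta` plays the role of the latent mean, split into a parametric term and a smooth term in the spirit of the decomposition in the abstract; a shape constraint would restrict the form of `f`, which this sketch does not attempt.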

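This paper studies Bayesian ordinal probit semiparametric regression models based on the Bayesian spectral analysis regression (BSAR) method. Ordinal probit regression is a method for ordered categorical data: it models the probability of the response by linking the probability of each category to the covariates through the probit link, the inverse of the normal cumulative distribution function. The Bayesian probit regression model simplifies derivation of the posterior distribution by introducing a normally distributed latent variable, and the responses are categorized according to where the latent values fall relative to the cut-off points. We extend this latent variable approach and, building on BSAR, study Bayesian binary and ordinal probit semiparametric regression models that can incorporate shape constraints such as monotone increase or decrease. Through simulation studies, we compare the fit of the binary probit semiparametric regression model with that of existing models, and we compare the fit of the ordinal probit semiparametric regression model under different shape constraints. Finally, using data from the first year of the seventh Korea National Health and Nutrition Examination Survey (KNHANES), 2016, we apply the binary and ordinal probit semiparametric regression models to an empirical analysis of the relationship between smoking behavior and coffee intake.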
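
The latent variable step mentioned in the abstract can be sketched as a draw from a normal distribution truncated to the interval between the two cut-off points that bracket the observed category. This is a minimal sketch of the generic data augmentation idea for probit models, not the authors' sampler; the function name `draw_latent`, the seed, and the cut-off values are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def draw_latent(y, eta, cutpoints, rng):
    """Draw z ~ N(eta, 1) restricted to the interval implied by category y."""
    bounds = [float("-inf")] + list(cutpoints) + [float("inf")]
    lower, upper = bounds[y], bounds[y + 1]
    # Probabilities of the standardized bounds; infinite bounds map to 0 or 1.
    p_lo = 0.0 if lower == float("-inf") else STD_NORMAL.cdf(lower - eta)
    p_hi = 1.0 if upper == float("inf") else STD_NORMAL.cdf(upper - eta)
    # Inverse-CDF sampling: uniform between p_lo and p_hi, clamped away from
    # 0 and 1 because inv_cdf is undefined at those endpoints.
    u = p_lo + rng.random() * (p_hi - p_lo)
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return eta + STD_NORMAL.inv_cdf(u)


rng = random.Random(2016)      # fixed seed so the draws are reproducible
cutpoints = [-1.0, 0.5, 1.5]   # illustrative values only
draws = [draw_latent(y, 0.3, cutpoints, rng) for y in (0, 1, 2, 3)]
```

In a full posterior sampler this draw would alternate with updates of the regression coefficients, the nonparametric function, and the cut-off points, none of which are shown here.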
