A study on Bayesian beta regressions for modelling rates and proportions

  • Jeongin Lee (Department of Statistics and Data Science, Inha University) ;
  • Jaeoh Kim (Department of Statistics and Data Science, Inha University) ;
  • Seongil Jo (Department of Statistics and Data Science, Inha University)
  • Received : 2023.12.21
  • Accepted : 2024.01.23
  • Published : 2024.06.30

Abstract

When the response variable is a rate or proportion confined to a bounded interval, regression models that assume normality can give inaccurate results, because such data are typically asymmetric and heteroscedastic. The beta regression model is an alternative in such cases. It assumes that the response follows a beta distribution reparametrized in terms of a mean and a precision parameter, and it specifies a submodel for each of the two, so heteroscedasticity in the data can be modelled directly. In this paper, we fit beta regression models to proportional data in two empirical analyses. Specifically, we investigate the association between smoking rates and coffee consumption using data from the 6th National Health Survey, and we examine the association between regional characteristics in the U.S. and cumulative mortality rates using COVID-19 data. In each analysis, we fit the ordinary least squares regression model, the beta regression model, and the extended beta regression model, select the best of the three, and interpret the results under the selected model. The results support the use of the beta regression model and its extended version for proportional data.
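As a rough illustration of the mean and precision parametrization described in the abstract, the sketch below maps a mean mu and a precision phi to the usual beta shape parameters and evaluates a log-likelihood with one submodel for each. It is not the authors' implementation: the logit link for the mean, the log link for the precision, the single covariate, and all function names are assumptions made for this example only.

```python
import math


def beta_shapes(mu, phi):
    """Map mean mu in (0, 1) and precision phi > 0 to shape parameters (a, b)."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    if phi <= 0.0:
        raise ValueError("phi must be positive")
    return mu * phi, (1.0 - mu) * phi


def beta_variance(mu, phi):
    """Variance implied by (mu, phi): it shrinks as the precision phi grows."""
    return mu * (1.0 - mu) / (1.0 + phi)


def inv_logit(x):
    """Inverse of the logit link (assumed link for the mean submodel)."""
    return 1.0 / (1.0 + math.exp(-x))


def beta_logpdf(y, mu, phi):
    """Log density of a beta distribution written in terms of (mu, phi)."""
    a, b = beta_shapes(mu, phi)
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))


def log_likelihood(ys, xs, beta, gamma):
    """Log-likelihood with one covariate x in both submodels (hypothetical setup).

    beta  = (intercept, slope) for logit(mu)
    gamma = (intercept, slope) for log(phi)
    """
    total = 0.0
    for y, x in zip(ys, xs):
        mu = inv_logit(beta[0] + beta[1] * x)
        phi = math.exp(gamma[0] + gamma[1] * x)
        total += beta_logpdf(y, mu, phi)
    return total
```

Because phi has its own linear predictor, the variance mu(1 - mu)/(1 + phi) can change with the covariate, which is how a model of this kind accommodates heteroscedasticity. In a Bayesian fit, priors on beta and gamma would be combined with this log-likelihood; the priors are not specified here.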

Acknowledgement

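This research was conducted as a basic research project supported by the National Research Foundation of Korea, funded by the Korean government (Ministry of Science and ICT) (NRF-2022R1A5A7033499, RS-2023-00209229). Jaeoh Kim's research was carried out under the 2024 SW-centered University Program of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (2022-0-01127).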
