Workplace panel survey data analysis using Bayesian cumulative probit linear mixed model

  • Minji Kwon (Department of Statistics, Sungkyunkwan University)
  • Keunbaik Lee (Department of Statistics, Sungkyunkwan University)
  • Received: 2024.03.16
  • Accepted: 2024.06.11
  • Published: 2024.12.31

Abstract

Longitudinal data are measured repeatedly over time on the same subject, so the repeated outcomes are correlated, and the effects of covariates on the response must be estimated while accounting for these correlations. In longitudinal ordinal data analysis, covariate effects are typically estimated with generalized linear mixed models in which the conditional cumulative probabilities of a latent variable are linked to the covariates through a logit or a probit link function. In this paper, we review generalized linear mixed models and marginalized models with these two types of link functions for longitudinal ordinal data. We then use a recently proposed Bayesian cumulative probit linear mixed model to analyze the Korean Workplace Panel Survey (WPS) data, which are longitudinal ordinal data. In this model, the conditional correlation matrix of the latent variables is high-dimensional and must be positive definite; it is modeled using the hypersphere decomposition. The response variable is the corporate training participation rate, an ordinal variable. Several models assuming different autoregressive correlation structures are compared, and in the best-fitting model the year effect, profit-sharing scheme status, average annual training hours per person, and labor union status have significant effects on the corporate training participation rate.
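As a minimal sketch of the cumulative probit link described above, the snippet below computes the conditional category probabilities of an ordinal response from a linear predictor and a set of increasing cutpoints. It assumes the common convention P(Y ≤ k | b) = Φ(γ_k - η), where η = x'β + b is the conditional linear predictor; sign conventions differ across papers, and the function name `category_probs` and the example values are illustrative only, not the authors' code.

```python
from statistics import NormalDist


def category_probs(eta, cutpoints):
    """Return P(Y = k) for k = 1..K under a cumulative probit link.

    Assumes P(Y <= k) = Phi(gamma_k - eta), where `cutpoints` holds the
    strictly increasing thresholds gamma_1 < ... < gamma_{K-1} and
    eta = x'beta + b is the conditional (random-effect) linear predictor.
    """
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    phi = NormalDist().cdf  # standard normal CDF
    # Cumulative probabilities, padded with P(Y <= 0) = 0 and P(Y <= K) = 1.
    cum = [0.0] + [phi(g - eta) for g in cutpoints] + [1.0]
    # Category probabilities are successive differences of the cumulative ones.
    return [upper - lower for lower, upper in zip(cum, cum[1:])]


probs = category_probs(eta=0.3, cutpoints=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs])
```

Increasing η shifts probability mass toward the higher categories, which is how a positive covariate effect raises the chance of a higher training participation category under this convention.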
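The hypersphere decomposition mentioned above can be illustrated with a short sketch. It writes a correlation matrix as R = B B^T, where B is lower triangular and each row of B is a unit vector expressed through angles; keeping every angle in (0, π) makes the diagonal of B strictly positive, so R has a unit diagonal and is positive definite for any such angles. This is a generic construction of the decomposition, not the authors' implementation; the function name `hypersphere_correlation`, the angle layout, and the example angles are assumptions for illustration.

```python
import math


def hypersphere_correlation(angles):
    """Build a T x T correlation matrix R = B B^T from hypersphere angles.

    `angles[j - 1]` holds the j angles for row j of B (0-indexed rows;
    row 0 is fixed to (1, 0, ..., 0) and needs no angles). Every angle must
    lie in (0, pi), which keeps the diagonal of B strictly positive and
    therefore makes R positive definite.
    """
    T = len(angles) + 1
    B = [[0.0] * T for _ in range(T)]
    B[0][0] = 1.0
    for j in range(1, T):
        theta = angles[j - 1]
        if len(theta) != j or not all(0.0 < t < math.pi for t in theta):
            raise ValueError(f"row {j} needs {j} angles in (0, pi)")
        sin_prod = 1.0  # running product of sines of the earlier angles in row j
        for l in range(j):
            B[j][l] = math.cos(theta[l]) * sin_prod
            sin_prod *= math.sin(theta[l])
        B[j][j] = sin_prod  # completes a row of unit Euclidean norm
    # R = B B^T; the unit-norm rows of B give R a unit diagonal.
    return [[sum(B[i][k] * B[j][k] for k in range(T)) for j in range(T)]
            for i in range(T)]


R = hypersphere_correlation([[1.2], [0.9, 1.4], [1.0, 1.1, 1.3]])
for row in R:
    print([round(r, 3) for r in row])
```

Because any angles in (0, π) yield a valid correlation matrix, a sampler can work with the angles directly rather than enforcing positive definiteness on R itself.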
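The abstract compares models that assume different correlation structures. As one hypothetical example of an autoregressive candidate, the AR(1) structure sets the correlation between waves i and j to ρ^|i-j|, so the correlation decays as the time lag grows. The helper `ar1_correlation` below is an illustrative sketch; the exact structures compared in the paper are not reproduced here.

```python
def ar1_correlation(T, rho):
    """Return the T x T AR(1) correlation matrix with entries rho ** |i - j|.

    Requires |rho| < 1 so that the matrix is a valid (positive definite)
    correlation matrix.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[rho ** abs(i - j) for j in range(T)] for i in range(T)]


for row in ar1_correlation(4, 0.6):
    print([round(r, 3) for r in row])
```

An AR(1) matrix uses a single parameter for all lags, whereas the hypersphere angles allow a fully general correlation matrix; comparing such alternatives is one way to choose a correlation structure for panel data with several waves.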

Keywords

Acknowledgement

This work was supported by the Basic Research Program of the National Research Foundation of Korea (NRF), funded by the Korean government (NRF-2022R1A2C1002752, RS-2024-00416117).
