Penalized variable selection in mean-variance accelerated failure time models

  • Kwon, Ji Hoon (Statistics Team, APACE Inc.)
  • Ha, Il Do (Department of Statistics, Pukyong National University)
  • Received : 2021.04.05
  • Accepted : 2021.04.14
  • Published : 2021.06.30

Abstract

The accelerated failure time (AFT) model represents a linear relationship between the log-survival time and covariates. We are interested in inferring covariate effects on the variation of survival times in the AFT model, which requires modeling the variance as well as the mean of survival times. We call the resulting model the mean-variance AFT (MV-AFT) model. In this paper, we propose a variable selection procedure for the regression parameters of the mean and variance in the MV-AFT model using a penalized likelihood function. For variable selection, we study four penalty functions: the least absolute shrinkage and selection operator (LASSO), adaptive LASSO (ALASSO), smoothly clipped absolute deviation (SCAD), and hierarchical likelihood (HL). This procedure selects important covariates and estimates the regression parameters at the same time. The performance of the proposed method is evaluated through simulation studies, and the method is illustrated with a clinical example dataset.
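
To make the penalized likelihood idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a log-normal MV-AFT form, log T_i = x_i'beta + sigma_i * eps_i with log sigma_i^2 = z_i'gamma and eps_i ~ N(0, 1), right censoring, unpenalized intercepts, and one shared tuning parameter for both parameter sets; none of these choices are stated in the abstract. All function names are hypothetical. The LASSO, ALASSO and SCAD penalties follow their standard published forms; the HL penalty is omitted.

```python
import math


def lasso_penalty(coef, lam):
    """LASSO penalty: lam * |coef|."""
    return lam * abs(coef)


def alasso_penalty(coef, lam, init_coef):
    """Adaptive LASSO penalty with weight 1/|init_coef| (init_coef must be nonzero)."""
    return lam * abs(coef) / abs(init_coef)


def scad_penalty(coef, lam, a=3.7):
    """SCAD penalty; a = 3.7 is the value suggested by Fan and Li (2001)."""
    b = abs(coef)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mv_aft_loglik(beta, gamma, times, events, X, Z):
    """Log-likelihood under the ASSUMED log-normal MV-AFT model.

    events[i] is 1 for an observed failure and 0 for a right-censored time.
    The Jacobian term -log(t) is dropped because it is free of parameters.
    """
    total = 0.0
    for t, d, x, z in zip(times, events, X, Z):
        mu = sum(b * xi for b, xi in zip(beta, x))                 # mean of log T
        log_sigma = 0.5 * sum(g * zi for g, zi in zip(gamma, z))   # log sigma = z'gamma / 2
        r = (math.log(t) - mu) / math.exp(log_sigma)               # standardized residual
        if d:
            # log density of a N(mu, sigma^2) variate evaluated at log t
            total += -0.5 * r * r - 0.5 * math.log(2 * math.pi) - log_sigma
        else:
            # log survival function; may fail numerically for very large r
            total += math.log(0.5 * math.erfc(r / math.sqrt(2)))
    return total


def penalized_loglik(beta, gamma, times, events, X, Z, lam, penalty=scad_penalty):
    """Log-likelihood minus n times the summed penalties (intercepts excluded)."""
    n = len(times)
    pen = sum(penalty(c, lam) for c in list(beta[1:]) + list(gamma[1:]))
    return mv_aft_loglik(beta, gamma, times, events, X, Z) - n * pen
```

Maximizing `penalized_loglik` over `beta` and `gamma` for a chosen `lam` would shrink small coefficients toward zero in both the mean and the variance parts, which is how selection and estimation can happen in one step. In this sketch the first column of `X` and of `Z` is taken to be the intercept.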

Acknowledgement

This research was supported by the Pukyong National University Autonomous Creative Research Fund (2019).
