
Penalized least distance estimator in the multivariate regression model

  • Jungmin Shin (Department of Mathematics, Korea Military Academy) ;
  • Jongkyeong Kang (Department of Information Statistics, Kangwon University) ;
  • Sungwan Bang (Department of Mathematics, Korea Military Academy)
  • Received : 2023.03.02
  • Accepted : 2023.07.04
  • Published : 2024.02.29

Abstract

In many real-world data sets, multiple response variables depend on the same set of explanatory variables. In particular, when several response variables are correlated with one another, simultaneous estimation that accounts for the correlation among the responses can be more effective than analyzing each response variable separately. In such multivariate regression analysis, the least distance estimator (LDE) estimates the regression coefficients simultaneously by minimizing the distance between each training observation and its fitted value in multidimensional Euclidean space; it is also robust to outliers. In this paper, we examine the least distance estimation method in multivariate linear regression analysis and, furthermore, present a penalized least distance estimator (PLDE) for efficient variable selection. The proposed method, the LDE combined with the adaptive group LASSO penalty term (AGLDE), reflects the correlation among the response variables in the fitted model and efficiently selects variables according to the importance of the explanatory variables. The validity of the proposed method is confirmed through simulations and real data analysis.
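The estimator described above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it uses synthetic data, and it minimizes the non-smooth objective with SciPy's general-purpose optimizer, whereas a real implementation would use a specialized algorithm. All variable names (`lde_loss`, `aglde_loss`, `lam`) are chosen here for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, q = 100, 5, 2                     # samples, predictors, responses
X = rng.normal(size=(n, p))
B_true = np.zeros((p, q))
B_true[:2] = rng.normal(size=(2, q))    # only the first two predictors are active
Y = X @ B_true + rng.normal(scale=0.5, size=(n, q))

def lde_loss(b):
    """LDE objective: sum over observations of the Euclidean
    distance between y_i and its fitted value B^T x_i."""
    B = b.reshape(p, q)
    return np.linalg.norm(Y - X @ B, axis=1).sum()

# Step 1: unpenalized LDE fit, used to build the adaptive weights.
B_init = minimize(lde_loss, np.zeros(p * q)).x.reshape(p, q)
w = 1.0 / np.maximum(np.linalg.norm(B_init, axis=1), 1e-8)

def aglde_loss(b, lam=5.0):
    """AGLDE objective: LDE loss plus an adaptive group LASSO
    penalty. A group is the q coefficients of one predictor, so a
    predictor is kept or dropped for all responses at once."""
    B = b.reshape(p, q)
    penalty = lam * (w * np.linalg.norm(B, axis=1)).sum()
    return lde_loss(b) + penalty

# Step 2: penalized fit, warm-started at the LDE solution.
B_hat = minimize(aglde_loss, B_init.ravel()).x.reshape(p, q)
print(np.round(np.linalg.norm(B_hat, axis=1), 3))  # row norms per predictor
```

Grouping each predictor's coefficients across all q responses is what lets the penalty exploit the multivariate structure: an irrelevant explanatory variable is shrunk toward zero in every response equation simultaneously rather than response by response.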

Acknowledgement

This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (Grant No. NRF-2022R1F1A1061622).
