A comparison study of various robust regression estimators using simulation

  • Jang, Soohee (장수희; Department of Statistics & Information, Dongduk Women's University)
  • Yoon, Jungyeon (윤정연; Korea Banking Institute)
  • Chun, Heuiju (전희주; Department of Statistics & Information, Dongduk Women's University)
  • Received : 2016.02.15
  • Accepted : 2016.04.03
  • Published : 2016.04.30

Abstract

Least squares (LS) regression is a classic regression method that is optimal when the model assumptions hold and the data contain no unusual observations. However, unusual observations can seriously distort LS estimates. Various robust estimation methods have therefore been proposed to overcome this limitation of traditional LS regression. Among these are M-estimators, based on maximum likelihood estimation (MLE); L-estimators, based on linear combinations of order statistics; and R-estimators, based on the ranks of the residuals. In this paper, robust regression estimators with a high breakdown point and/or high efficiency are compared under several simulated situations. The simulation study analyzes and compares the distributions of the estimates as well as relative efficiencies calculated from mean squared errors (MSE). We conclude that MM-estimators or GR-estimators are a good choice for real data applications.

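Least squares, the representative estimation method for regression models, is optimal when the error term follows a normal distribution and there are no outliers. It gives distorted estimates, however, when the data are contaminated, for example when the data do not satisfy the assumptions of the regression model or contain outliers. Various robust estimation methods have therefore been proposed to compensate for the sensitivity of least squares to outliers. In this paper, representative estimators with a high breakdown point or high efficiency are compared through a range of simulations; they are drawn from the families of M-estimators (based on MLE), L-estimators (based on order statistics), and R-estimators (based on the ranks of the residuals). To compare the performance of the estimators, we examined not only efficiency but also the distributions of the estimates, including bias and variance. The results suggest that MM-estimators and GR-estimators perform well for real data applications.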
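The kind of comparison described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design: it compares LS only with a Huber M-estimator (not the MM- or GR-estimators the paper recommends), and the sample size, number of replications, true coefficients, contaminated-normal error model, tuning constant 1.345, and the function names `huber_irls` and `simulate` are all assumptions made for illustration. Relative efficiency is taken here as the ratio of the LS MSE to the robust estimator's MSE, so values above 1 favor the robust estimator.

```python
import numpy as np


def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients via iteratively
    reweighted least squares (illustrative helper, not from the paper)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the LS fit
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale: normalized median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale < 1e-12:
            break
        u = np.abs(r) / scale
        # Huber weights: 1 inside [-c, c], c/|u| outside.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def simulate(n=50, reps=500, contamination=0.1, seed=0):
    """Return (MSE of LS, MSE of Huber M, relative efficiency MSE_LS / MSE_M).

    Errors are N(0, 1), except that each observation is replaced with
    probability `contamination` by a draw from N(0, 10^2) (assumed model).
    """
    rng = np.random.default_rng(seed)
    true_beta = np.array([1.0, 2.0])  # assumed intercept and slope
    sq_err_ls = np.zeros(reps)
    sq_err_m = np.zeros(reps)
    for k in range(reps):
        x = rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        err = rng.normal(size=n)
        outlier = rng.random(n) < contamination
        err[outlier] = rng.normal(scale=10.0, size=int(outlier.sum()))
        y = X @ true_beta + err
        beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
        beta_m = huber_irls(X, y)
        sq_err_ls[k] = np.sum((beta_ls - true_beta) ** 2)
        sq_err_m[k] = np.sum((beta_m - true_beta) ** 2)
    mse_ls = sq_err_ls.mean()
    mse_m = sq_err_m.mean()
    return mse_ls, mse_m, mse_ls / mse_m


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.2):
        mse_ls, mse_m, rel_eff = simulate(contamination=eps)
        print(f"contamination={eps:.1f}  MSE_LS={mse_ls:.4f}  "
              f"MSE_M={mse_m:.4f}  relative efficiency={rel_eff:.2f}")
```

Running the script prints one line per contamination level. Under this assumed setup one would expect the relative efficiency to sit a little below 1 for clean data and to rise well above 1 once outliers are present, which is the trade-off between efficiency and robustness that the study examines.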
