A Criterion for the Selection of Principal Components in the Robust Principal Component Regression


  • Kim, Bu-Yong (Department of Statistics, Sookmyung Women's University)
  • Received : 2011.08
  • Accepted : 2011.10
  • Published : 2011.11.30

Abstract

Robust principal components regression is suggested to deal with both the multicollinearity and the outlier problem. A key aspect of robust principal components regression is the selection of an optimal set of principal components. Instead of the eigenvalues of the sample covariance matrix, a selection criterion is developed based on the condition index of the minimum volume ellipsoid estimator, which is highly robust against leverage points. In addition, least trimmed squares estimation is employed to cope with regression outliers. Monte Carlo simulation results indicate that the proposed criterion is superior to the existing ones.

When highly correlated explanatory variables are included in a regression model, the problem of multicollinearity arises; at the same time, when the data contain regression outliers, statistical inference based on the least squares estimator becomes seriously flawed. These phenomena occur frequently in data mining applications, and this paper proposes robust principal component regression as a way to resolve both problems simultaneously. In particular, a new criterion for selecting the optimal principal components is developed: it is based on the MVE estimator rather than the sample covariance of the explanatory variables, and the selection rests on the magnitude of the condition index rather than the eigenvalues. For estimation in the principal component model, LTS estimation, which is robust against regression outliers, is introduced. Monte Carlo simulation confirms that the proposed criterion handles the problems caused by multicollinearity and outliers better than the existing criteria.
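The overall procedure described in the abstract can be sketched in code. The following is a minimal Python illustration, assuming scikit-learn's MinCovDet (the MCD estimator) as a practical stand-in for the MVE estimator and a crude one-step trimmed refit in place of a full LTS algorithm; the function name robust_pcr, the condition-index cutoff of 10, and the trimming fraction are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch of the robust PCR idea: robust covariance of the
# regressors, component selection by condition index, and a trimmed
# refit on the retained component scores. MCD is used here as a
# stand-in for MVE; the cutoff and trimming fraction are illustrative.
import numpy as np
from sklearn.covariance import MinCovDet
from sklearn.linear_model import LinearRegression

def robust_pcr(X, y, cond_cutoff=10.0, trim_frac=0.25):
    # Robust covariance and location of the explanatory variables.
    mcd = MinCovDet().fit(X)
    eigval, eigvec = np.linalg.eigh(mcd.covariance_)
    order = np.argsort(eigval)[::-1]            # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]

    # Condition index of each component: sqrt(lambda_max / lambda_j).
    cond_index = np.sqrt(eigval[0] / eigval)

    # Retain components whose condition index stays below the cutoff,
    # i.e. discard directions responsible for near-collinearity.
    keep = cond_index < cond_cutoff
    scores = (X - mcd.location_) @ eigvec[:, keep]

    # Crude one-step LTS-style refit: fit, drop the largest squared
    # residuals, and refit on the remaining h observations.
    ols = LinearRegression().fit(scores, y)
    resid2 = (y - ols.predict(scores)) ** 2
    h = int(np.ceil((1.0 - trim_frac) * len(y)))
    subset = np.argsort(resid2)[:h]
    lts = LinearRegression().fit(scores[subset], y[subset])
    return lts, eigvec[:, keep], cond_index
```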

