
Algorithm for the L1-Regression Estimation with High Breakdown Point

  • Kim, Bu-Yong (Department of Statistics, Sookmyung Women's University)
  • Submitted: 2010.03
  • Reviewed: 2010.05
  • Published: 2010.07.31

Abstract

The $L_1$-regression estimator is well known to be highly robust to vertical outliers but not at all robust to leverage points. This article proposes an algorithm for $L_1$-regression estimation that is robust to leverage points as well as to vertical outliers. Leverage points are identified using robust distances based on the MCD (minimum covariance determinant) or MVE (minimum volume ellipsoid) estimator. Weights are then chosen to reduce the influence of the identified leverage points appropriately. The breakdown point of the $L_1$-regression estimator is improved by applying a linear programming algorithm to the weight-transformed data. That algorithm is based on the linear scaling transformation technique, which was chosen for its higher computational efficiency with large data sets. Monte Carlo simulations on data sets of various shapes and sizes indicate that the proposed algorithm substantially improves the breakdown point of the $L_1$-regression estimator.
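The abstract outlines a two-stage procedure. First, robust distances flag leverage points and assign them small weights. Second, an $L_1$ fit is computed on the weighted data as a linear program. The following is a minimal sketch in Python that assumes NumPy, SciPy and scikit-learn are available. It is not the author's implementation, and several details are assumptions:

- The weight rule $w_i = \min(1, \chi^2_{p,0.975} / RD_i^2)$ is a hypothetical choice; the abstract does not say which weights the paper uses.
- Only MCD is shown, because scikit-learn offers no MVE estimator.
- The linear program is solved with SciPy's HiGHS solver rather than the linear scaling transformation algorithm. The sketch therefore illustrates the estimator, not the paper's computational method.
- `leverage_weights`, `l1_regression` and `weighted_l1_regression` are hypothetical helper names.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import chi2
from sklearn.covariance import MinCovDet


def leverage_weights(Z, quantile=0.975, random_state=0):
    """Weights from MCD-based robust distances of the regressors Z (no intercept)."""
    # Robust location/scatter via MCD (scikit-learn has no MVE estimator).
    mcd = MinCovDet(random_state=random_state).fit(Z)
    rd2 = mcd.mahalanobis(Z)  # squared robust distances RD_i^2
    cutoff = chi2.ppf(quantile, df=Z.shape[1])
    # Hypothetical rule: weight 1 inside the chi-square cutoff,
    # shrinking like cutoff / RD_i^2 outside it (guard against RD_i = 0).
    return np.minimum(1.0, cutoff / np.maximum(rd2, 1e-12))


def l1_regression(X, y):
    """LAD fit via the standard LP:
    minimize sum(u + v)  s.t.  X b + u - v = y,  u, v >= 0,  b free.
    Solved with HiGHS, not the linear scaling algorithm of the paper."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def weighted_l1_regression(Z, y):
    """Scale each row (1, z_i, y_i) by w_i, then fit L1 on the transformed data.
    Equivalent to minimizing sum_i w_i * |y_i - x_i^T b| since w_i > 0."""
    w = leverage_weights(Z)
    X = np.column_stack([np.ones(len(y)), Z])
    return l1_regression(w[:, None] * X, w * y)


# Synthetic example: y = 1 + 2 z plus ten bad leverage points near z = 20.
rng = np.random.default_rng(0)
z_good = rng.normal(size=50)
y_good = 1 + 2 * z_good + rng.normal(scale=0.1, size=50)
z_bad = 20 + rng.normal(scale=0.1, size=10)
y_bad = rng.normal(scale=0.1, size=10)
Z = np.concatenate([z_good, z_bad])[:, None]
y = np.concatenate([y_good, y_bad])

w = leverage_weights(Z)
plain = l1_regression(np.column_stack([np.ones(len(y)), Z]), y)
robust = weighted_l1_regression(Z, y)
print("plain L1 (intercept, slope):   ", plain)
print("weighted L1 (intercept, slope):", robust)
```

In this synthetic setup, the bad leverage points pull the plain $L_1$ slope far below 2. This matches the abstract's point that $L_1$ is not robust to leverage points. The weighted fit stays close to the true slope because those points receive very small weights.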
