Robust Response Transformation Using Outlier Detection in Regression Model

회귀모형에서 이상치 검색을 이용한 로버스트 변수변환방법

  • Seo, Han-Son (Department of Applied Statistics, Konkuk University) ;
  • Lee, Ga-Yoen (Strategy & Planning Team, Okcashbag Service) ;
  • Yoon, Min (Department of Statistics, Pukyong National University)
  • 서한손 (건국대학교 응용통계학과) ;
  • 이가연 (오케이캐시백 서비스 전략기획팀) ;
  • 윤민 (부경대학교 통계학과)
  • Received : 2011.10
  • Accepted : 2011.11
  • Published : 2012.02.29

Abstract

Transforming the response variable is a common way to adapt data to a linear regression model. However, it is well known that response transformations in linear regression are very sensitive to one or a few outliers. Many methods have been suggested for developing transformations that are not influenced by potential outliers. Recently, Cheng (2005) suggested using a trimmed likelihood estimator based on the idea of the least trimmed squares (LTS) estimator. However, that method requires the number of outliers to be set in advance and is computationally expensive. A new method is proposed that addresses these problems and improves the robustness of the estimates. The method uses a stepwise procedure, suggested by Hadi and Simonoff (1993), to detect the outliers that influence the choice of response transformation.

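When fitting data to a linear regression model, a transformation of the response variable is commonly attempted, but it is well known that the choice of an appropriate transformation is sensitive to a few outliers. Transformation methods that are not affected by outliers have therefore been studied and developed. However, like the trimmed likelihood estimator based on the least trimmed squares estimator recently proposed by Cheng (2005), they have drawbacks: the number of outliers must be fixed in advance, or a large amount of computation is required. This paper proposes a new method that resolves these problems and improves the robustness of the estimates. The proposed method adapts the stepwise procedure of Hadi and Simonoff (1993) to outlier detection under response transformation.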
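The sensitivity described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, nor Cheng's trimmed likelihood estimator or the Hadi and Simonoff stepwise method. It only assumes the standard Box-Cox family (Box and Cox, 1964), estimates the transformation parameter by maximizing the profile log-likelihood over a grid, and compares the estimate on clean simulated data with the estimate after one response value is grossly contaminated. The function names, the simulated data, the grid, and the contamination factor are all hypothetical choices made for illustration.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox power transformation of a positive response."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / lam

def profile_loglik(X, y, lam):
    """Profile log-likelihood of lam for a linear model fitted to boxcox(y, lam)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    z = boxcox(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)    # ordinary least squares fit
    rss = np.sum((z - X @ beta) ** 2)
    # Gaussian log-likelihood at the fitted values plus the Jacobian term
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

def estimate_lambda(X, y, grid=None):
    """Grid-search maximizer of the profile log-likelihood (non-robust)."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 81)
    loglik = [profile_loglik(X, y, lam) for lam in grid]
    return float(grid[int(np.argmax(loglik))])

# Simulated data (assumption): the log transformation, lam = 0, is correct.
rng = np.random.default_rng(0)
n = 60
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = np.exp(0.5 + 0.3 * x + rng.normal(0.0, 0.1, n))

lam_clean = estimate_lambda(X, y)

# Contaminate a single response value (the factor 50 is arbitrary).
y_out = y.copy()
y_out[0] *= 50.0
lam_contaminated = estimate_lambda(X, y_out)

print("lambda estimate, clean data:       ", lam_clean)
print("lambda estimate, one gross outlier:", lam_contaminated)
```

Comparing the two printed estimates gives a rough sense of how much a single observation can move a non-robust estimate. A robust approach would first identify such observations and exclude or downweight them before choosing the transformation.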

References

  1. Atkinson, A. C. (1985). Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis, Oxford University Press, Oxford.
  2. Atkinson, A. C. (1986). Aspects of diagnostic regression analysis (discussion of influential observations, high leverage points, and outliers in linear regression), Statistical Science, 1, 397-402. https://doi.org/10.1214/ss/1177013624
  3. Atkinson, A. C. (1988). Transformations unmasked, Technometrics, 30, 311-318. https://doi.org/10.2307/1270085
  4. Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations (with discussion), Journal of the Royal Statistical Society. Series B (Methodological), 26, 211-246.
  5. Cheng, T.-C. (2005). Robust regression diagnostics with data transformations, Computational Statistics & Data Analysis, 49, 875-891. https://doi.org/10.1016/j.csda.2004.06.010
  6. Cook, R. D. and Wang, P. C. (1983). Transformations and influential cases in regression, Technometrics, 25, 337-343. https://doi.org/10.2307/1267855
  7. Gentleman, J. F. and Wilk, M. B. (1975). Detecting outliers. II. Supplementing the direct analysis of residuals, Biometrics, 31, 387-410. https://doi.org/10.2307/2529428
  8. Hadi, A. S. and Luceño, A. (1997). Maximum trimmed likelihood estimators: A unified approach, examples, and algorithms, Computational Statistics & Data Analysis, 25, 251-272. https://doi.org/10.1016/S0167-9473(97)00011-X
  9. Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the identification of multiple outliers in linear models, Journal of the American Statistical Association, 88, 1264-1272. https://doi.org/10.2307/2291266
  10. Hinkley, D. V. and Wang, S. (1988). More about transformations and influential cases in regression, Technometrics, 30, 435-440. https://doi.org/10.2307/1269807
  11. Kianifard, F. and Swallow, W. H. (1989). Using recursive residuals, calculated on adaptively-ordered observations, to identify outliers in linear regression, Biometrics, 45, 571-585. https://doi.org/10.2307/2531498
  12. Marasinghe, M. G. (1985). A multistage procedure for detecting several outliers in linear regression, Technometrics, 27, 395-399. https://doi.org/10.2307/1270206
  13. Paul, S. R. and Fung, K. Y. (1991). A generalized extreme studentized residual multiple-outlier-detection procedure in linear regression, Technometrics, 33, 339-348. https://doi.org/10.2307/1268785
  14. Rousseeuw, P. J. (1984). Least median of squares regression, Journal of the American Statistical Association, 79, 871-880. https://doi.org/10.2307/2288718
  15. Rousseeuw, P. J. and Van Driessen, K. (2006). Computing LTS regression for large data sets, Data Mining and Knowledge Discovery, 12, 29-45. https://doi.org/10.1007/s10618-005-0024-4
  16. Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection, John Wiley, New York.
  17. Tsai, C. L. and Wu, X. (1990). Diagnostics in transformation and weighted regression, Technometrics, 32, 315-322. https://doi.org/10.2307/1269108