Outlier Detection Using Dynamic Plots

동적 그림을 이용한 이상치 검색

  • Ahn, Byung-Jin (Department of Applied Statistics, Konkuk University) ;
  • Seo, Han-Son (Department of Applied Statistics, Konkuk University)
  • 안병진 (건국대학교 응용통계학과) ;
  • 서한손 (건국대학교 응용통계학과)
  • Received : 2011.08
  • Accepted : 2011.09
  • Published : 2011.10.31

Abstract

Linear regression is widely used to analyze data because it is simple and broadly applicable. However, data may contain outliers and influential cases that distort a statistical analysis, so detecting and examining such cases is an important part of data analysis. When multiple outliers are present, masking effects usually occur and make the true outliers difficult to identify. We propose dynamic plots as a method that is resistant to the masking effect. The dynamic-plot procedure helps find appropriate basic sets from which a dependent outlier detection method can start and then detect the true outlier set. Examples are given to demonstrate the effectiveness of the proposed approach.

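Linear regression analysis is applied to many kinds of data because the method is simple and widely applicable. It is, however, sensitive to outliers in the data, so it is important to find suspicious observations and examine whether they are outliers. Most outlier detection methods are themselves affected by the outliers, for example through the masking effect, and can therefore fail to identify them correctly. To improve on this, we propose a method based on dynamic residual plots. The proposed method is useful for supplying a variety of basic sets to a dependent outlier detection method, and we verify through examples that it leads to detection of the correct outlier set.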
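
The masking effect and the role of a basic set can be illustrated with a small numerical sketch. This is not the authors' dynamic-plot procedure: the toy data, the function names `deletion_residuals` and `basic_set_residuals`, and the choice of basic set are all assumptions made for illustration. The sketch fits ordinary least squares to ten clean points plus three clustered high-leverage outliers, then compares single-case deletion residuals from the full fit with residuals scaled against a fit on a clean basic set only.

```python
import numpy as np


def deletion_residuals(X, y):
    """Externally studentized (single-case deletion) residuals for OLS."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverage of each case
    s2 = resid @ resid / (n - p)
    r = resid / np.sqrt(s2 * (1.0 - h))  # internally studentized residuals
    return r * np.sqrt((n - p - 1) / (n - p - r**2))


def basic_set_residuals(X, y, basic):
    """Residuals of every case, scaled against a fit on the basic set only.

    `basic` is a boolean mask marking the cases assumed to be clean.
    """
    Xb, yb = X[basic], y[basic]
    XtX_inv = np.linalg.inv(Xb.T @ Xb)
    beta = XtX_inv @ Xb.T @ yb
    s = np.sqrt(np.sum((yb - Xb @ beta) ** 2) / (len(yb) - X.shape[1]))
    q = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # Cases inside the basic set are scaled by 1 - q, cases outside by 1 + q.
    scale = np.where(basic, 1.0 - q, 1.0 + q)
    return (y - X @ beta) / (s * np.sqrt(scale))


# Hypothetical toy data: ten clean points near y = 2 + x, plus three
# clustered outliers at high leverage that pull the full fit toward them.
x_clean = np.arange(10.0)
y_clean = 2.0 + x_clean + 0.1 * np.sin(1.7 * x_clean)  # small fixed noise
x_out = np.array([19.5, 20.0, 20.5])
y_out = np.array([2.0, 2.2, 1.8])

x = np.concatenate([x_clean, x_out])
y = np.concatenate([y_clean, y_out])
X = np.column_stack([np.ones_like(x), x])
is_outlier = np.arange(len(x)) >= len(x_clean)

t_full = deletion_residuals(X, y)
d_basic = basic_set_residuals(X, y, basic=~is_outlier)

print("full fit, |t| of planted outliers:  ", np.abs(t_full[is_outlier]).round(2))
print("full fit, largest |t| is at case:   ", int(np.argmax(np.abs(t_full))))
print("basic set, |d| of planted outliers: ", np.abs(d_basic[is_outlier]).round(1))
```

In the full fit the three planted outliers mask one another: their deletion residuals stay small, and the largest residual belongs to a clean case. Once the fit is restricted to a clean basic set, the same three cases stand far apart from the rest. Here the basic set is chosen using knowledge of which cases were planted; with real data that choice is the hard part, and it is the step the abstract proposes to support with dynamic plots.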
