http://dx.doi.org/10.5351/CKSS.2004.11.3.485

Identification of Regression Outliers Based on Clustering of LMS-residual Plots  

Kim, Bu-Yong (Department of Statistics, Sookmyung Women’s University)
Oh, Mi-Hyun (Department of Statistics, Sookmyung Women’s University)
Publication Information
Communications for Statistical Applications and Methods / v.11, no.3, 2004, pp. 485-494
Abstract
An algorithm is proposed to identify multiple outliers in linear regression. It is based on clustering the residuals from least median of squares (LMS) estimation. A cut-height criterion for the hierarchical cluster tree is suggested, which yields the optimal clustering of the regression outliers. The effectiveness of the procedures is compared on classic and artificial data sets, and the proposed algorithm is shown to be superior to one based on least squares estimation. In particular, the proposed algorithm handles the masking and swamping effects very well, whereas the least-squares-based procedure does not.
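The abstract does not spell out the cut-height criterion, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It approximates the LMS fit by random elemental subsets (each subset of p rows gives an exact fit, and the one with the smallest median squared residual is kept), standardizes the residuals with Rousseeuw's preliminary scale estimate s0 = 1.4826(1 + 5/(n - p))√(med r_i²), builds a single-linkage tree on the one-dimensional residuals, and cuts it with a Mojena-style rule (mean plus k standard deviations of the fusion heights) as an assumed stand-in for the paper's criterion. Treating the largest cluster as the clean data is also an assumption. The function names (`lms_fit`, `lms_scale`, `cluster_1d_single_linkage`, `flag_outliers`) and the defaults k = 2.75 and 3000 subsets are hypothetical.

```python
import numpy as np

def lms_fit(X, y, n_subsets=3000, seed=0):
    """Approximate the LMS fit by random elemental subsets (not an exact solver).

    Each random subset of p rows gives an exact fit; the fit with the smallest
    median squared residual over all n rows is kept.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p, replace=False)
        A = X[idx]
        if np.linalg.matrix_rank(A) < p:
            continue  # degenerate subset: no unique exact fit
        beta = np.linalg.solve(A, y[idx])
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta

def lms_scale(resid, p):
    """Preliminary LMS scale s0 with Rousseeuw's finite-sample correction."""
    n = len(resid)
    return 1.4826 * (1 + 5 / (n - p)) * np.sqrt(np.median(resid ** 2))

def cluster_1d_single_linkage(z, k=2.75):
    """Single-linkage clustering of 1-D values, cut by a Mojena-style rule.

    In 1-D the single-linkage fusion heights are the gaps between adjacent
    sorted values, so cutting at height h splits at every gap larger than h.
    ASSUMPTION: h = mean + k * sd of the fusion heights (not the paper's rule).
    """
    order = np.argsort(z)
    gaps = np.diff(z[order])
    threshold = gaps.mean() + k * gaps.std(ddof=1)
    labels = np.empty(len(z), dtype=int)
    # Cluster id increases by one at each gap above the threshold
    labels[order] = np.concatenate(([0], np.cumsum(gaps > threshold)))
    return labels

def flag_outliers(X, y, k=2.75, n_subsets=3000, seed=0):
    """Return indices of rows outside the largest cluster of standardized LMS residuals."""
    beta = lms_fit(X, y, n_subsets, seed)
    resid = y - X @ beta
    z = resid / lms_scale(resid, X.shape[1])
    labels = cluster_1d_single_linkage(z, k)
    main = np.bincount(labels).argmax()  # ASSUMPTION: largest cluster = clean data
    return np.flatnonzero(labels != main)

# Synthetic demo: a clean line plus a planted group of 5 outliers in rows 0-4
rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 10, n)
y = 1 + 2 * x + rng.normal(0, 0.5, n)
y[:5] += 15
X = np.column_stack([np.ones(n), x])  # intercept column plus one regressor
flagged = flag_outliers(X, y)
print(flagged)
```

Because the residuals are one-dimensional here, the single-linkage tree reduces to the sorted gaps and no general clustering routine is needed; clustering points of a two-dimensional residual plot would call for a full hierarchical clustering implementation instead.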
Keywords
regression outlier; robust residual; clustering; masking; swamping
Citations & Related Records
Times Cited By KSCI: 1