• Title/Summary/Keyword: Outliers

Identifying Multiple Leverage Points and Outliers in Multivariate Linear Models

  • Yoo, Jong-Young
    • Communications for Statistical Applications and Methods
    • /
    • v.7 no.3
    • /
    • pp.667-676
    • /
    • 2000
  • This paper focuses on the problem of detecting multiple leverage points and outliers in multivariate linear models. It is well known that the identification of these points is affected by masking and swamping effects. To identify them, Rousseeuw (1985) used the robust minimum volume ellipsoid (MVE) estimators, which have a breakdown point of approximately 50%. Rousseeuw and van Zomeren (1990) suggested a robust distance based on the MVE, but its computation is extremely difficult when the number of observations n is large. In this study, we propose a new algorithm to reduce the computational difficulty of the MVE. The proposed method is powerful in identifying multiple leverage points and outliers and is also effective in reducing the computational difficulty of the MVE.

Modified Multivariate $T^2$-Chart based on Robust Estimation (로버스트 추정에 근거한 수정된 다변량 $T^2$- 관리도)

  • 성웅현;박동련
    • Journal of Korean Society for Quality Management
    • /
    • v.29 no.1
    • /
    • pp.1-10
    • /
    • 2001
  • We consider the problem of detecting special variations in the multivariate $T^2$-control chart when two or more multivariate outliers are present. Since a multivariate outlier may reflect slippage in mean, variance, or correlation, it can distort the sample mean vector and sample covariance matrix. A distorted sample mean vector and sample covariance matrix make it difficult to identify special variations clearly. An alternative for detecting outliers or special variations is to use robust estimators of the mean vector and covariance matrix that are less sensitive to extreme observations than the standard estimators $\bar{x}$ and $\textbf{S}$. We applied the popular minimum volume ellipsoid (MVE) and minimum covariance determinant (MCD) methods to estimate the mean vector and covariance matrix and compared the results with the standard $T^2$-control chart using simulated multivariate data with outliers. We found that the modified $T^2$-control charts based on these robust methods were more effective in detecting special variations than the standard $T^2$-control chart.
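
The contrast between the standard and the robust $T^2$ statistic can be sketched on a tiny two-dimensional sample. The following is only an illustration under stated assumptions, not the authors' procedure: the data are made up, the MCD is found by exhaustive search over all subsets of size h (feasible only for very small n), no consistency or small-sample correction is applied, and the helper names `mean_cov`, `t2` and `raw_mcd` are hypothetical.

```python
from itertools import combinations

def mean_cov(points):
    """Sample mean and covariance (divisor n - 1) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def t2(point, center, cov):
    """T^2 = (x - m)' S^{-1} (x - m), using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def raw_mcd(points, h):
    """Exhaustive raw MCD: the h-subset with the smallest covariance determinant."""
    best = None
    for subset in combinations(points, h):
        center, cov = mean_cov(subset)
        det = cov[0] * cov[2] - cov[1] ** 2
        if best is None or det < best[0]:
            best = (det, center, cov)
    return best[1], best[2]

# Made-up data: eight in-control points followed by two planted outliers.
data = [(9.8, 19.9), (10.1, 20.3), (10.0, 19.8), (10.3, 20.4),
        (9.7, 19.6), (10.2, 20.1), (9.9, 20.2), (10.4, 20.6),
        (13.0, 16.0), (13.5, 15.5)]

h = (len(data) + 2 + 1) // 2           # common choice floor((n + p + 1) / 2), p = 2
classical_center, classical_cov = mean_cov(data)
robust_center, robust_cov = raw_mcd(data, h)

classical = [t2(p, classical_center, classical_cov) for p in data]
robust = [t2(p, robust_center, robust_cov) for p in data]
```

With the classical estimates, each $T^2$ value is bounded by $(n-1)^2/n$, so the two planted outliers cannot stand out much. With the MCD-based estimates, their values are far larger than those of the in-control points.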

Outlier rejection in automobile-mounted NFOV camera (지능화 차량을 위한 오정합점 제거 방법)

  • Suhr, Jae-Kyu;Bea, Kwang-Hyuk;Jung, Ho-Gi;Kim, Jai-Hie
    • Proceedings of the IEEK Conference
    • /
    • 2007.07a
    • /
    • pp.375-376
    • /
    • 2007
  • This paper proposes an algorithm for rejecting mismatched points (known as outliers). The proposed algorithm identifies and rejects outliers in image pairs obtained under automobile-like motions, which consist of two translations and one rotation. The camera rotation is approximated by an image shift under the assumption that a narrow field of view (NFOV) lens is used. The voting method estimates the focus of expansion (FOE) while shifting one of the images. Using the properties of the FOE, the outliers are rejected while most of the inliers are retained.

Identification of Regression Outliers Based on Clustering of LMS-residual Plots

  • Kim, Bu-Yong;Oh, Mi-Hyun
    • Communications for Statistical Applications and Methods
    • /
    • v.11 no.3
    • /
    • pp.485-494
    • /
    • 2004
  • An algorithm is proposed to identify multiple outliers in linear regression. It is based on the clustering of residuals from the least median of squares (LMS) estimation. A cut-height criterion for the hierarchical cluster tree is suggested, which yields the optimal clustering of the regression outliers. The effectiveness of the procedures is compared on classic and artificial data sets, and it is shown that the proposed algorithm is superior to the one based on least squares estimation. In particular, the proposed algorithm deals very well with the masking and swamping effects, while the other does not.
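
The general idea of clustering LMS residuals can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the LMS fit is approximated by searching all lines through pairs of points (simple regression only), the residuals are grouped by single linkage in one dimension, and the cut height of 1.0 is an arbitrary value chosen for the made-up data rather than the criterion the authors propose. The function names are hypothetical.

```python
from itertools import combinations
from statistics import median

def lms_line(xs, ys):
    """Approximate least-median-of-squares line via all two-point (elemental) fits."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        crit = median((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    return best[1], best[2]

def cut_clusters(values, cut_height):
    """Single-linkage clusters of 1-D values: split wherever a gap exceeds cut_height."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > cut_height:
            clusters.append([])
        clusters[-1].append(cur)
    return clusters

# Made-up data: seven points near y = 2 + 0.5 x and three outliers at the high-x end.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.6, 2.9, 3.55, 3.95, 4.6, 4.9, 5.5, 12.0, 13.0, 14.0]

intercept, slope = lms_line(xs, ys)
residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
clusters = cut_clusters(residuals, cut_height=1.0)
main_cluster = max(clusters, key=len)             # largest cluster is treated as the clean data
outliers = sorted(i for i in range(len(xs)) if i not in main_cluster)
```

Because the LMS line follows the majority of the points, the three high-leverage outliers keep large residuals and fall into their own cluster. A least squares line would be pulled toward them, which is the masking effect the abstract refers to.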

Density-based Outlier Detection in Multi-dimensional Datasets

  • Wang, Xite;Cao, Zhixin;Zhan, Rongjuan;Bai, Mei;Ma, Qian;Li, Guanyu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.12
    • /
    • pp.3815-3835
    • /
    • 2022
  • Density-based outlier detection is one of the hot issues in data mining. A point is determined to be an outlier on the basis of the density of the points near it. Existing density-based detection algorithms have high time complexity. To reduce it, a new outlier detection algorithm, DODMD (Density-based Outlier Detection in Multidimensional Datasets), is proposed. First, on the basis of the ZH-tree, the concept of a micro-cluster is introduced. Each leaf node is regarded as a micro-cluster, and the micro-cluster is calculated to achieve the purpose of batch filtering. To obtain n sets of approximate outliers quickly, a greedy method is used to calculate the boundary of LOF, and the minimum value is marked as LOFmin. Second, the outliers can be filtered out by LOFmin, the real outliers are calculated, and the result set is then updated to tighten the boundary. Finally, the accuracy and efficiency of the DODMD algorithm are verified on real and synthetic datasets.
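
For reference, the local outlier factor (LOF) that the abstract bounds with LOFmin can be computed by brute force as sketched below. This is the plain quadratic-time LOF, not DODMD: no ZH-tree, micro-clusters or pruning are involved. The function name `lof_scores`, the choice k = 3 and the sample points are assumptions made for illustration, and the sketch assumes there are no duplicate points.

```python
import math

def lof_scores(points, k):
    """Brute-force local outlier factor for every point (assumes no duplicates)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neighbours = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]                       # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neighbours.append([j for d, j in others if d <= kd])  # ties are included

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]

# Made-up data: a 3 x 3 grid of points plus one isolated point (index 9).
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
points = grid + [(6.0, 6.0)]
scores = lof_scores(points, k=3)
```

Points inside the grid receive scores close to 1, while the isolated point receives a score several times larger, so a threshold on the score separates it.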

Efficient Outlier Detection of the Water Temperature Monitoring Data (수온 관측 자료의 효율적인 이상 자료 탐지)

  • Cho, Hongyeon;Jeong, Shin Taek;Ko, Dong Hui;Son, Kyeong-Pyo
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.26 no.5
    • /
    • pp.285-291
    • /
    • 2014
  • The statistical information of coastal water temperature monitoring data can be biased by outliers and missing intervals. Although a number of outlier detection methods have been developed, their application to in-situ monitoring data is very limited because they assume a priori information about the outliers and no missing data, and because some methods require excessive computational time. In this study, a practical robust method is developed that can efficiently and effectively detect outliers in big data. The model is composed of two parts: one constructs the approximate components of the monitoring data using robust smoothing and data re-sampling, and the other is the main iterative outlier detection part, which uses the detailed components of the data estimated from the approximate components. The model is tested using two years of 5-minute interval water temperature data from Lake Saemangeum. The outlier proportion of the data is estimated to be about 1.6-3.7%. The results show that most of the outliers in the data are satisfactorily detected and removed by the model. To detect and remove the outliers effectively, outlier detection using long-span smoothing should be applied before that using short-span smoothing.
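
A much reduced version of the smooth-then-screen idea is sketched below. It is not the authors' model: a running median stands in for their robust smoother, there is no re-sampling or iteration, and only one span is used. The window size, the threshold of 3.5 scaled MADs, the function names and the series itself are all assumptions made for illustration.

```python
from statistics import median

def running_median(series, half_window):
    """Robust smooth: median over a centred window, truncated at the ends."""
    return [median(series[max(0, i - half_window): i + half_window + 1])
            for i in range(len(series))]

def flag_outliers(series, half_window, threshold=3.5):
    """Indices whose residual from the smooth exceeds threshold * scaled MAD."""
    smooth = running_median(series, half_window)
    resid = [x - s for x, s in zip(series, smooth)]
    center = median(resid)
    mad = median(abs(r - center) for r in resid)
    scale = 1.4826 * mad          # makes the MAD comparable to a normal standard deviation
    return [i for i, r in enumerate(resid) if abs(r - center) > threshold * scale]

# Made-up temperature series with a slow trend, small jitter and two spikes.
temps = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6, 10.5, 18.0, 10.8, 10.7,
         11.0, 10.9, 11.2, 3.0, 11.3, 11.5, 11.4, 11.7, 11.6, 11.9]

flagged = flag_outliers(temps, half_window=2)
```

For this series the two spikes, at indices 7 and 13, are the only points flagged. Missing intervals, which the abstract stresses, are not handled here.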

More informative sequential probability ratio test (정보 전달이 보다 효과적인 변형된 축차확률비검정)

  • 박노진
    • The Korean Journal of Applied Statistics
    • /
    • v.9 no.2
    • /
    • pp.109-117
    • /
    • 1996
  • We introduce a sequential probability ratio test (SPRT) that is more informative than the currently used SPRT. Although the proposed SPRT shares similar mathematical properties with the ordinary SPRT, it is less affected by outliers and even indicates the possible existence of such outliers. Furthermore, it responds to changes among observations more quickly than the ordinary SPRT.
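
As background, the ordinary (Wald) SPRT that the abstract takes as its baseline can be sketched for the mean of a normal distribution with known standard deviation. The modified test itself is not described in the abstract and is not reproduced here. The function name, the hypotheses, the error rates and the observations are assumptions made for illustration.

```python
import math

def sprt_normal_mean(observations, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald SPRT of H0: mean = mu0 against H1: mean = mu1 with known sigma."""
    upper = math.log((1 - beta) / alpha)    # crossing it accepts H1
    lower = math.log(beta / (1 - alpha))    # crossing it accepts H0
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        # Log-likelihood ratio increment for one normal observation.
        llr += (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", len(observations)

# Observations near mu1 = 1: the test stops once enough evidence accumulates.
decision_h1 = sprt_normal_mean([1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0], 0.0, 1.0, 1.0)

# One gross outlier after an observation near mu0 = 0 forces an immediate decision.
decision_outlier = sprt_normal_mean([0.1, 8.0], 0.0, 1.0, 1.0)
```

The second call shows the sensitivity the abstract mentions: a single extreme value pushes the cumulative log-likelihood ratio past the upper boundary at the second observation.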

Robust inference for linear regression model based on weighted least squares

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.13 no.2
    • /
    • pp.271-284
    • /
    • 2002
  • In this paper we consider robust inference for the parameters of the linear regression model based on weighted least squares. First, we consider the sequential test of multiple outliers. Next, we suggest a way to assign a weight to each observation $(x_i,\;y_i)$ and recommend a robust inference procedure for the linear model. Finally, to check the performance of the confidence interval for the slope under the proposed method, we conducted a Monte Carlo simulation and present some numerical results and examples.
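
The weighted least squares fit underlying this approach can be sketched for a single predictor. The abstract does not state how the weights are assigned, so the weights below are hypothetical: the suspected outlier simply receives weight 0 and every other observation weight 1. The function name and the data are also assumptions.

```python
def wls_line(xs, ys, weights):
    """Weighted least squares fit of y = intercept + slope * x."""
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: five points exactly on y = 1 + 2 x and one outlier at x = 6.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0, 30.0]

ols_intercept, ols_slope = wls_line(xs, ys, [1.0] * 6)            # equal weights = ordinary LS
wls_intercept, wls_slope = wls_line(xs, ys, [1.0] * 5 + [0.0])    # outlier weighted out
```

With equal weights the outlier more than doubles the estimated slope, whereas the weighted fit recovers the line through the remaining points. In practice the weights would come from an outlier test or a weight function rather than being set by hand.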

Rejecting Outliers by Maximum Modified Normed Residual

  • Kim, Soon Kwi
    • Journal of Korean Society for Quality Management
    • /
    • v.13 no.2
    • /
    • pp.56-60
    • /
    • 1985
  • One may be particularly interested in identifying which observations are genuinely exceptional, in order to gain new insight into the phenomena under study. To detect outliers, many statistics have been proposed, such as the maximum normed residual (MNR), an equivalent statistic proposed by C. Daniel, the studentized residual, the standardized residual, and so on. This paper gives a procedure for calculating critical values of the maximum modified normed residual and the distribution of the modified normed residual.
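
For a single sample, the ordinary maximum normed residual can be computed as sketched below. The modified statistic studied in the paper is not defined in the abstract and is not reproduced. The function name and the sample are assumptions, and residuals are taken as deviations from the sample mean.

```python
import math

def maximum_normed_residual(sample):
    """Largest |residual| divided by the Euclidean norm of all residuals."""
    mean = sum(sample) / len(sample)
    resid = [x - mean for x in sample]
    norm = math.sqrt(sum(r * r for r in resid))
    index = max(range(len(sample)), key=lambda i: abs(resid[i]))
    return abs(resid[index]) / norm, index

sample = [1.0, 2.0, 3.0, 4.0, 20.0]
mnr, suspect = maximum_normed_residual(sample)

# For a single sample, Grubbs' statistic is a rescaling of the MNR.
grubbs = mnr * math.sqrt(len(sample) - 1)
```

The statistic lies between 0 and 1, and values near 1 mean that one observation accounts for most of the residual sum of squares. Deciding whether a value is significant requires critical values, which is the subject of the paper.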

Residuals Plots for Repeated Measures Data

  • PARK TAESUNG
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2000.11a
    • /
    • pp.187-191
    • /
    • 2000
  • In the analysis of repeated measurements, multivariate regression models that account for the correlations among the observations from the same subject are widely used. Like the usual univariate regression models, these multivariate regression models also need model diagnostic procedures. In this paper, we propose a simple graphical method to detect outliers and to investigate the goodness of model fit in repeated measures data. The graphical method is based on quantile-quantile (Q-Q) plots of the $\chi^2$ distribution and the standard normal distribution. We also propose diagnostic measures to detect influential observations. The proposed method is illustrated using two examples.
