• Title/Abstract/Keywords: Outliers detection

178 search results, processing time 0.023 s

A robust method for response variable transformations using dynamic plots

  • Seo, Han Son
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 26, No. 5
    • /
    • pp.463-471
    • /
    • 2019
  • Variable transformations are a useful way to secure the assumed functional relationships in a model. However, the presence of outliers may undermine the accuracy of the transformation. This paper deals with response transformations in partial linear models in the presence of outliers. A new procedure for response transformation and outlier detection is proposed. The procedure uses a sequential method for identifying outliers and dynamic graphical methods for choosing an appropriate transformation. The graphical tools make it possible to capture diagnostic information by monitoring the movement of points in the data. The procedure is illustrated with several examples, which show that visual clues regarding the optimal transformation, the fit of the model, and the outlyingness of the observations can be checked from the series of plots.

Least quantile squares method for the detection of outliers

  • Seo, Han Son;Yoon, Min
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 28, No. 1
    • /
    • pp.81-88
    • /
    • 2021
  • k-least quantile of squares (k-LQS) estimates are a generalization of least median of squares (LMS) estimates. They have not been used as much as LMS because their breakdown points become small as k increases. However, if the size of outliers is assumed to be fixed, LQS estimates yield a good fit to the majority of the data, and residuals calculated from LQS estimates can be a reliable tool for detecting outliers. We propose using LQS estimates to separate a clean set from the data in the context of the outlyingness of the cases. Three procedures are suggested for identifying outliers using LQS estimates. Examples are provided to illustrate the methods. A Monte Carlo study shows that the proposed methods are effective.
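As a rough illustration of the k-LQS idea in the abstract above, the sketch below fits a simple regression line by minimizing the k-th smallest squared residual. It makes several assumptions that do not come from the paper: the search is restricted to lines through pairs of data points (an approximation, not an exact LQS solver), k is taken as n minus an assumed number of outliers, and the cutoff of 5 times the square root of the criterion is an arbitrary illustrative rule, not one of the three procedures the authors propose. The names `lqs_line`, `xs`, `ys`, and `cutoff` are hypothetical.

```python
from itertools import combinations
import math

def lqs_line(xs, ys, k):
    """Approximate k-LQS fit of y = a + b*x.

    Searches only lines through pairs of data points and keeps the one whose
    k-th smallest squared residual (k is 1-based) is minimal.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        squared = sorted((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
        crit = squared[k - 1]
        if best is None or crit < best[0]:
            best = (crit, a, b)
    crit, a, b = best
    return a, b, crit

# Hypothetical data: y = 1 + 2x with small noise, plus two gross outliers.
xs = list(range(10))
ys = [1.0, 3.1, 4.9, 7.0, 9.1, 10.9, 13.0, 15.1, 40.0, -20.0]

# Assumption: at most 2 of the 10 cases are outliers, so k = 8.
a, b, crit = lqs_line(xs, ys, k=8)

# Illustrative cutoff (an assumption, not the paper's rule).
cutoff = 5 * math.sqrt(crit)
flagged = [i for i, (x, y) in enumerate(zip(xs, ys))
           if abs(y - (a + b * x)) > cutoff]
print(flagged)  # [8, 9]
```

Because the criterion ignores the n - k largest squared residuals, the two gross outliers do not pull the fitted line, and their residuals from the fit stand out clearly.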

A Score test for Detection of Outliers in Nonlinear Regression

  • Kahng, Myung-Wook
    • Journal of the Korean Statistical Society
    • /
    • Vol. 22, No. 2
    • /
    • pp.201-208
    • /
    • 1993
  • Given the specific mean shift outlier model, the score test for multiple outliers in nonlinear regression is discussed as an alternative to the likelihood ratio test. The geometric interpretation of the score statistic is also presented.

OUTLIER DETECTION BASED ON A CHANGE OF LIKELIHOOD

  • Kim, Myung-Geun
    • Journal of applied mathematics & informatics
    • /
    • Vol. 26, No. 5_6
    • /
    • pp.1133-1138
    • /
    • 2008
  • A general method of detecting outliers based on a change of likelihood by using the influence function is suggested. It can be applied to all kinds of distributions that are specified by parameters. For the multivariate normal case, specific computations are made to get the corresponding conditional influence function. A numerical example is provided for illustration.

Variable Selection and Outlier Detection for Automated K-means Clustering

  • Kim, Sung-Soo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 22, No. 1
    • /
    • pp.55-67
    • /
    • 2015
  • An important problem in cluster analysis is selecting the variables that define cluster structure while eliminating noisy variables that mask it; in addition, outlier detection is a fundamental task in cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the cluster number and initial cluster center whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/~sskim/SVOKmeans.r.

Outlier Detection Based on Discrete Wavelet Transform with Application to Saudi Stock Market Closed Price Series

  • RASHEDI, Khudhayr A.;ISMAIL, Mohd T.;WADI, S. Al;SERROUKH, Abdeslam
    • The Journal of Asian Finance, Economics and Business
    • /
    • Vol. 7, No. 12
    • /
    • pp.1-10
    • /
    • 2020
  • This study investigates the problem of outlier detection based on the discrete wavelet transform in the context of time series data, where the identification and treatment of outliers constitute an important component. An outlier is defined as a data point that deviates markedly from the rest of the observations in a data sample. In this work we focus on the application of the traditional method suggested by Tukey (1977) for detecting outliers in the closed price series of the Saudi Arabia stock market (Tadawul) between Oct. 2011 and Dec. 2019. The method is applied to the details obtained from the MODWT (Maximal-Overlap Discrete Wavelet Transform) of the original series. The results show that the suggested methodology was successful in detecting all of the outliers in the series. The findings of this study suggest that we can model and forecast the volatility of returns from the reconstructed series without outliers using GARCH models. The estimated GARCH volatility model was compared to other asymmetric GARCH models using standard forecast error metrics. It is found that the performance of the standard GARCH model was as good as that of the gjrGARCH model over the out-of-sample forecasts for returns among other GARCH specifications.
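The Tukey (1977) rule mentioned in the abstract above is commonly stated as flagging values outside the fences Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. The sketch below applies that rule to a short list of numbers. It does not compute a MODWT: the list `details` is a hypothetical stand-in for one level of wavelet detail coefficients, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from Tukey's hinges.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical detail coefficients (not data from the study).
details = [0.1, -0.2, 0.05, 0.3, -0.15, 4.0, 0.2, -0.1, 0.0, -3.5, 0.12, -0.05]
flagged = tukey_outliers(details)
print(flagged)  # [5, 9]
```

In a workflow like the one the abstract describes, the flagged positions would then be mapped back to dates in the original series.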

Comparison of parameter estimation methods for time series models in the presence of outliers

  • 조신섭;이재준;김수화
    • 응용통계연구 (The Korean Journal of Applied Statistics)
    • /
    • Vol. 5, No. 2
    • /
    • pp.255-268
    • /
    • 1992
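  • This paper proposes an iterative interpolation estimation method for estimating the parameters of time series data that contain outliers. The proposed approach alternates between a parameter estimation step and an outlier detection step until no further outliers are detected. For outlier detection, an interpolation diagnostic technique is applied that replaces aberrant observations with interpolated estimates. A modified GM-estimation method is also proposed that, instead of downweighting aberrant observations during estimation, replaces them with one-step-ahead predictions obtained from the structure of the time series model. A simulation study compares the properties of the proposed estimators with those of existing robust estimators. The results show that the iterative interpolation estimator has better properties than the other estimators, and that it performs best when there is only a single additive outlier (AO) and when the absolute value of the parameter is large.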

Outlier-Object Detection Using an Image Pair Based on Regression Analysis: Noise Variance Estimation and Performance Analysis

  • 김동식
    • 대한전자공학회논문지SP (Journal of the Institute of Electronics Engineers of Korea SP)
    • /
    • Vol. 45, No. 5
    • /
    • pp.25-34
    • /
    • 2008
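  • By comparing two images of the same scene taken from the same position at different times, a set of outliers, such as occlusions caused by moving cars, can be detected. To reduce the effect of the differing brightness characteristics of the images, a brightness correction based on a polynomial regression model is applied. To detect outliers accurately while weakening the influence of the outlier set, an algorithm that simply repeats the regression analysis is introduced. This paper analyzes the performance of the iterated regression algorithm by observing the convergence of the noise variance estimate, and uses a correction constant in the noise variance estimation to make the detection robust. Experiments on synthetic and real images demonstrate the robustness of the detection algorithm.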
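The iterated regression idea in the abstract above can be sketched roughly as follows: fit a brightness regression on the current inliers, estimate the noise standard deviation from their residuals, flag every pixel whose residual exceeds a multiple of it, and refit until the flagged set stops changing. This is a minimal sketch under stated assumptions, not the paper's algorithm: it uses a first-degree polynomial, a threshold multiplier `c = 2.0` chosen for illustration, and it does not reproduce the correction constant the abstract mentions. The names `fit_line`, `iterated_regression_outliers`, `xs`, and `ys` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b

def iterated_regression_outliers(xs, ys, c=2.0, max_iter=20):
    """Refit on the current inliers until the flagged set stops changing."""
    inliers = list(range(len(xs)))
    for _ in range(max_iter):
        a, b = fit_line([xs[i] for i in inliers], [ys[i] for i in inliers])
        # Noise variance is estimated from the inlier residuals only.
        sse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in inliers)
        sigma = math.sqrt(sse / (len(inliers) - 2))
        new_inliers = [i for i in range(len(xs))
                       if abs(ys[i] - (a + b * xs[i])) <= c * sigma]
        if new_inliers == inliers or len(new_inliers) < 3:
            break  # converged, or too few points left to refit
        inliers = new_inliers
    return [i for i in range(len(xs)) if i not in inliers]

# Hypothetical pixel brightness pairs: y = 1 + 2x with small noise,
# one gross outlier (index 4) and one moderate outlier (index 6).
xs = list(range(10))
ys = [1.1, 2.9, 5.1, 6.9, 39.1, 10.9, 19.1, 14.9, 17.1, 18.9]
outliers = iterated_regression_outliers(xs, ys)
print(outliers)  # [4, 6]
```

In this example the first pass flags only the gross outlier, because it inflates the noise estimate and masks the moderate one. The second pass, fitted without it, exposes the moderate outlier, which is the motivation for repeating the regression.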

An Outlier Data Analysis using Support Vector Regression

  • 전성해
    • 한국지능시스템학회논문지 (Journal of Korean Institute of Intelligent Systems)
    • /
    • Vol. 18, No. 6
    • /
    • pp.876-880
    • /
    • 2008
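  • An observation that is excessively large or small compared with most other observations in a given data set is called an outlier. Outliers arise from several causes. The results of analyzing data that include outliers can differ greatly from the results obtained without them. In general, outliers are detected and removed before the data analysis is carried out. In data mining areas such as fraud detection and network intrusion detection, however, outliers carry important information and must be included in the analysis. For the regression models considered in this paper, conventional simple and multiple regression have difficulty producing a stable model in the presence of outliers, so the usual recommendation is to identify outliers using standardized or studentized residuals and to analyze the data after removing them. As one way to analyze data effectively in a regression model with the outliers included, this paper uses Support Vector Regression (SVR), which is based on the statistical learning theory proposed by Vapnik. A simulation study on artificially generated data confirmed that SVR gave improved results compared with conventional regression models.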
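The conventional residual-based screening that the abstract above contrasts with SVR can be sketched for simple linear regression as follows. The code computes internally studentized residuals, e_i / (s · sqrt(1 - h_ii)), with leverage h_ii = 1/n + (x_i - x̄)² / S_xx. The cutoff |r_i| > 2 is a common rule of thumb rather than a value taken from the paper, and the data and the name `studentized_residuals` are hypothetical.

```python
import math

def studentized_residuals(xs, ys):
    """Internally studentized residuals for the OLS fit of y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    # Leverage of each case in simple linear regression.
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]

# Hypothetical data: y = 1 + 2x with small noise and one outlier at index 5.
xs = list(range(10))
ys = [1.0, 3.1, 4.9, 7.0, 9.1, 25.0, 13.0, 15.1, 16.9, 19.0]
r = studentized_residuals(xs, ys)
flagged = [i for i, value in enumerate(r) if abs(value) > 2]
print(flagged)  # [5]
```

Under the conventional workflow, the flagged case would be removed and the model refitted, whereas the approach in the abstract keeps such cases in the data and relies on SVR instead.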

A Novel Battery State of Health Estimation Method Based on Outlier Detection Algorithm

  • Piao, Chang-hao;Hu, Zi-hao;Su, Ling;Zhao, Jian-fei
    • Journal of Electrical Engineering and Technology
    • /
    • Vol. 11, No. 6
    • /
    • pp.1802-1811
    • /
    • 2016
  • A novel battery SOH estimation algorithm based on outlier detection is presented. Battery state of health (SOH) is one of the most important parameters describing the usability state of the power battery system. First, a battery system model with a lifetime fading characteristic was established, and the battery characteristic parameters were acquired from the lifetime fading process. Then, an outlier detection method based on angular distribution was used to identify the outliers among the battery behaviors. Lastly, the functional relationship between battery SOH and the outlier distribution was obtained by polynomial fitting. The experimental results show that the algorithm can identify the outliers accurately, and that the absolute error between the estimated SOH and the true value is less than 3%.