• Title/Abstract/Keywords: Outlier detection methods

87 search results (processing time: 0.025 s)

Identification of Incorrect Data Labels Using Conditional Outlier Detection

  • Hong, Charmgil
    • Journal of Korea Multimedia Society (한국멀티미디어학회논문지) / Vol. 23, No. 8 / pp. 915-926 / 2020
  • Outlier detection methods help identify unusual instances in data that may correspond to erroneous, exceptional, or surprising events or behaviors. This work studies conditional outlier detection, a special case of the outlier detection problem, in the context of identifying incorrect data labels. Unlike conventional (unconditional) outlier detection methods, which seek abnormalities across all data attributes, conditional outlier detection assumes that data are given as pairs of an input (condition) and an output (response or label). Accordingly, the goal of conditional outlier detection is to identify incorrect or unusual output assignments, taking their inputs as the condition. As a solution to conditional outlier detection, this paper proposes the ratio-based outlier scoring (ROS) approach and a variant of it. The proposed solutions build on conventional outlier scores and adapt them to identify conditional outliers in data. Experiments on synthetic and real-world image datasets demonstrate the benefits and advantages of the proposed approaches.

Temporal and spatial outlier detection in wireless sensor networks

  • Nguyen, Hoc Thai;Thai, Nguyen Huu
    • ETRI Journal / Vol. 41, No. 4 / pp. 437-451 / 2019
  • Outlier detection techniques play an important role in enhancing the reliability of data communication in wireless sensor networks (WSNs), and many such techniques have been proposed. Unfortunately, most of them still have potential limitations, namely (a) a high rate of false positives, (b) high time complexity, and (c) failure to detect outliers online. Moreover, these approaches mainly focus on either temporal outliers or spatial outliers. This paper therefore introduces novel algorithms that detect both temporal and spatial outliers. Our contributions are twofold: (i) modifying the Hampel Identifier (HI) algorithm to achieve high identification accuracy in temporal outlier detection, and (ii) combining a Gaussian process (GP) model with a graph-based outlier detection technique to improve spatial outlier detection. The results demonstrate that our techniques outperform state-of-the-art methods in terms of accuracy and work well with various data types.
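For reference, the classical (unmodified) Hampel identifier that this work builds on can be sketched as follows. It assumes a sliding window of `half_window` points on each side, the usual MAD scale factor 1.4826, and a threshold of `k` scaled MADs; the function name and defaults are illustrative, and this is not the authors' modified HI.

```python
from statistics import median


def hampel_identifier(series, half_window=3, k=3.0):
    """Return indices flagged by the classical Hampel identifier.

    A point is flagged when it deviates from the median of its sliding
    window by more than k times the scaled median absolute deviation (MAD).
    """
    scale = 1.4826  # makes the MAD consistent with the std. dev. under normality
    n = len(series)
    flagged = []
    for i in range(n):
        # Window of up to 2 * half_window + 1 points, truncated at the edges
        window = series[max(0, i - half_window):min(n, i + half_window + 1)]
        med = median(window)
        mad = scale * median([abs(x - med) for x in window])
        if abs(series[i] - med) > k * mad:
            flagged.append(i)
    return flagged


readings = [20.1, 20.3, 20.2, 35.0, 20.4, 20.2, 20.3, 20.1, 20.5, 20.2]
print(hampel_identifier(readings))  # [3]
```

Because both the center and the spread are medians, a single spike barely moves the window statistics, which is why the test stays sensitive to it.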

A Comparison of Methods for the Detection of Outliers in Multivariate Data

  • Hadi, Ali-S.;Joo, Hye-Seon;Son, Mun-S.
    • Communications for Statistical Applications and Methods / Vol. 3, No. 2 / pp. 53-67 / 1996
  • Numerous classical and robust methods have been proposed in the literature for detecting multiple outliers in multivariate data, but the effectiveness and power of these methods have not been thoroughly investigated. In this paper, we first reduce the vast number of outlier detection methods to a small number of viable ones, based on previous work by other researchers and on some theoretical arguments. We then design and implement a Monte Carlo experiment to compare these methods. The main goal of our study is to determine which methods are most powerful in detecting multiple outliers and in dealing with the masking and swamping problems. The results of the Monte Carlo study indicate that two of the methods seem to perform better than the others for detecting multiple outliers in multivariate data.
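As a baseline for the classical methods compared in this study, a minimal sketch of a Mahalanobis-distance screen for bivariate data is shown below. It assumes the sample mean and covariance as the location and scatter estimates and a chi-square cutoff with 2 degrees of freedom, whose quantile has the closed form -2 ln(1 - p). The function name and the 0.975 level are illustrative choices, not taken from the paper. Because the mean and covariance are themselves pulled toward outliers, a classical screen like this one is prone to masking, which robust alternatives are designed to avoid.

```python
import math


def mahalanobis_outliers_2d(points, p=0.975):
    """Classical (non-robust) Mahalanobis-distance screen for 2-D data.

    Flags points whose squared distance from the sample mean exceeds the
    chi-square quantile with 2 degrees of freedom at level p.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance entries with the usual n - 1 denominator
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    cutoff = -2.0 * math.log(1.0 - p)  # chi-square(2) quantile in closed form
    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        # d^2 = [dx, dy] S^-1 [dx, dy]^T using the explicit 2x2 inverse
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > cutoff:
            flagged.append(i)
    return flagged


# Strongly correlated toy data plus two points that go against the correlation
base = [(t, t + d) for t in (-4, -3, -2, -1, 1, 2, 3, 4) for d in (0.5, -0.5)]
data = base + [(3, -3), (-3, 3)]
print(mahalanobis_outliers_2d(data))  # [16, 17]
```

The two flagged points lie inside the marginal ranges of x and y, so a one-variable-at-a-time check would miss them; only the joint covariance structure exposes them.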


First Order Difference-Based Error Variance Estimator in Nonparametric Regression with a Single Outlier

  • Park, Chun-Gun
    • Communications for Statistical Applications and Methods / Vol. 19, No. 3 / pp. 333-344 / 2012
  • We consider some statistical properties of the first-order difference-based error variance estimator in nonparametric regression models with a single outlier. So far, such difference-based estimators have rarely been discussed in the presence of outliers. We propose a first-order difference-based estimator that uses the leave-one-out method to detect a single outlier, and we simulate outlier detection in a nonparametric regression model with a single outlier. The outlier detection performs well. The results are promising even for nonparametric regression models with many outliers when some difference-based estimators are used.
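The first-order difference-based (Rice-type) estimator is sigma^2 = sum_{i=1}^{n-1} (y_{i+1} - y_i)^2 / (2(n - 1)), computed on responses ordered by their design points. A minimal sketch of a leave-one-out screen built on it follows. The rule of picking the observation whose removal shrinks the estimate most is an assumption for illustration, and the function names are hypothetical; this is not necessarily the author's exact procedure.

```python
def diff_variance(y):
    """First-order difference-based error variance estimate.

    Assumes y is ordered by the design points x and has at least 2 values.
    """
    n = len(y)
    return sum((y[i + 1] - y[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))


def loo_outlier_candidate(y):
    """Hypothetical leave-one-out screen for a single outlier.

    Recomputes the estimate with each observation removed and returns the
    index whose removal lowers it the most, together with both estimates.
    """
    estimates = [diff_variance(y[:j] + y[j + 1:]) for j in range(len(y))]
    j_star = min(range(len(y)), key=estimates.__getitem__)
    return j_star, diff_variance(y), estimates[j_star]


y = [0.0, 0.1, 0.2, 0.3, 5.0, 0.5, 0.6, 0.7]  # smooth trend with one spike
idx, full, reduced = loo_outlier_candidate(y)
print(idx, round(full, 3), round(reduced, 4))
```

A single spike contributes two large squared differences, so removing it collapses the estimate; here index 4 is returned and the estimate drops from about 3 to below 0.01.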

TIME-VARIANT OUTLIER DETECTION METHOD ON GEOSENSOR NETWORKS

  • Kim, Dong-Phil;I, Gyeong-Min;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Korean Society of Remote Sensing (대한원격탐사학회): Conference Proceedings / 2008 International Symposium on Remote Sensing / pp. 410-413 / 2008
  • Outlier detection has been widely studied in geosensor networks. Recently, machine learning and data mining have been applied to outlier detection to build models that distinguish outliers based on an anchored criterion. However, it is difficult for existing methods to detect outliers in incoming time-variant data, because outlier detection must monitor incoming data and classify irregular attacks. To solve this problem, we propose a time-variant outlier detection method using a two-dimensional grid based on an unanchored criterion. In the paper, the method was applied to geosensor data to classify outliers efficiently. The proposed method can be used in applications such as network intrusion detection, stock market analysis, and detection of erroneous data in bank accounts.


Dam Sensor Outlier Detection using Mixed Prediction Model and Supervised Learning

  • Park, Chang-Mok
    • International Journal of Advanced Smart Convergence / Vol. 7, No. 1 / pp. 24-32 / 2018
  • This paper describes an outlier detection method using a mixed prediction model, which consists of a time-series model and a regression model. The parameters of the prediction model were estimated by supervised learning, with a genetic algorithm adopted as the learning method. Experiments were performed on artificial and real data sets. The prediction performance is compared with that of existing prediction methods using artificial data, and outlier detection is conducted using real sensor measurements from a dam. The experiments demonstrate the validity of the proposed method.

Procedures for Detecting Multiple Outliers in Linear Regression Using R

  • Kwon, Soon-Sun;Lee, Gwi-Hyun;Park, Sung-Hyun
    • Korean Statistical Society (한국통계학회): Conference Proceedings / Proceedings of the 2005 Autumn Conference of the Korean Statistical Society / pp. 13-17 / 2005
  • In recent years, many people have used R as a statistical system, and R is frequently updated by many R project teams. We are interested in methods for detecting multiple outliers and note that R does not provide such a method. In this talk, we review procedures for detecting multiple outliers and present more efficient procedures in R that combine direct and indirect methods.
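A common single-case diagnostic that multiple-outlier procedures start from is the externally studentized residual. A minimal Python sketch for simple linear regression follows (the talk itself works in R); the function name is hypothetical. With several outliers, this one-case-at-a-time statistic can be masked, one reason dedicated multiple-outlier procedures are of interest.

```python
import math


def studentized_residuals(x, y):
    """Externally studentized residuals for simple linear regression y ~ a + b*x."""
    n = len(x)
    p = 2  # number of coefficients: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx  # leverage (hat value) of case i
        # Residual variance re-estimated with case i deleted
        s2_del = (sse - e * e / (1 - h)) / (n - p - 1)
        out.append(e / math.sqrt(s2_del * (1 - h)))
    return out


x = list(range(1, 11))
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 25.0, 17.1, 18.9, 21.0]  # case 6 is off the line
t = studentized_residuals(x, y)
worst = max(range(len(t)), key=lambda i: abs(t[i]))
print(worst)  # 6
```

Deleting the case before estimating the residual variance keeps the outlier from inflating its own yardstick, which is what makes case 6 stand out so sharply.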


고차원 데이터에서 One-class SVM과 Spectral Clustering을 이용한 이진 예측 이상치 탐지 방법 (A Binary Prediction Method for Outlier Detection using One-class SVM and Spectral Clustering in High Dimensional Data)

  • 박정희
    • Journal of Korea Multimedia Society (한국멀티미디어학회논문지) / Vol. 25, No. 6 / pp. 886-893 / 2022
  • Outlier detection refers to the task of detecting data that deviate significantly from the normal data distribution. Most outlier detection methods compute an outlier score that indicates the degree to which a data sample deviates from normal. However, setting a threshold on the outlier score to decide whether a data sample is an outlier or normal is not trivial. In this paper, we propose a binary prediction method for outlier detection based on spectral clustering and an ensemble of one-class SVMs. Given training data consisting of normal data samples, a clustering method is applied to find clusters in the training data, and an ensemble of one-class SVM models, each trained on one cluster, finds the boundaries of the normal data. We show how to obtain a threshold for transforming the outlier scores computed by the one-class SVM ensemble into binary predictions. Experimental results on high-dimensional text data show that the proposed method can be effectively applied to high-dimensional data, especially when the normal training data consist of clusters with different shapes and densities.

서포트벡터 기계를 이용한 이상치 진단 (Outlier Detection Using Support Vector Machines)

  • 서한손;윤민
    • Communications for Statistical Applications and Methods / Vol. 18, No. 2 / pp. 171-177 / 2011
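  • To construct an approximating function from real-world data, outliers must be removed from the measured raw data before modeling. Existing outlier diagnosis methods have relied on visualization or on the largest residuals. However, outlier diagnosis for nonlinear functions with multidimensional input data has often produced poor results. Typical outlier diagnosis methods based on support vector regression for nonlinear functions with multidimensional inputs perform well, but they have practical problems such as computational cost and parameter tuning. In this paper, we propose a practical outlier diagnosis method using support vector regression that reduces computational cost and appropriately defines the outlier threshold. We demonstrate its validity by applying the proposed method to real data.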

재무 시계열 자료의 이상치 탐지 기법 연구 (A Study on Outlier Detection Method for Financial Time Series Data)

  • 하명호;김삼용
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 23, No. 1 / pp. 41-47 / 2010
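  • In this study, we apply an outlier detection technique under heteroscedastic time series models, which are useful for analyzing financial time series data, and show its efficiency. We first introduce the GARCH model and outlier detection techniques under the GARCH model, and then demonstrate, through simulation and by fitting actual KOSPI data, that the applied method outperforms conventional outlier detection methods.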
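The basic idea of outlier screening under a GARCH(1,1) model can be sketched as follows. The conditional variance recursion sigma^2_t = omega + alpha * eps^2_{t-1} + beta * sigma^2_{t-1} is run over demeaned returns, and observations with a large standardized residual are flagged. The parameters are assumed to be already estimated, the threshold k = 4 and the function name are illustrative, and the toy returns are synthetic, not KOSPI data. This is not the specific procedure used in the paper.

```python
import math


def garch_outliers(returns, omega, alpha, beta, k=4.0):
    """Flag returns whose GARCH(1,1)-standardized residual exceeds k in absolute value.

    omega, alpha, beta are assumed to be already estimated (e.g. by maximum likelihood).
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    mu = sum(returns) / len(returns)
    eps = [r - mu for r in returns]  # demeaned returns used as innovations
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    flagged = []
    for t, e in enumerate(eps):
        z = e / math.sqrt(sigma2)  # sigma2 uses information up to t - 1 only
        if abs(z) > k:
            flagged.append(t)
        sigma2 = omega + alpha * e * e + beta * sigma2  # variance for period t + 1
    return flagged


returns = [0.5, -0.5] * 5 + [8.0] + [0.5, -0.5] * 5  # synthetic series with one shock
print(garch_outliers(returns, omega=0.05, alpha=0.1, beta=0.85))  # [10]
```

Standardizing by the conditional rather than the overall variance is the point of working under a heteroscedastic model: a return that is ordinary in a volatile period is not flagged, while the same return in a calm period can be.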