• Title/Summary/Keyword: outlier detection method

Density-based Outlier Detection for Very Large Data (대용량 자료 분석을 위한 밀도기반 이상치 탐지)

  • Kim, Seung;Cho, Nam-Wook;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society / v.35 no.2 / pp.71-88 / 2010
  • A density-based outlier detection method such as LOF (Local Outlier Factor) finds outlying observations by using the density of the surrounding space. Despite the advantages of density-based outlier detection, its computational complexity has been one of the major barriers to its application. In this paper, we present an LOF algorithm that reduces the computation time of density-based outlier detection. The proposed method adopts kd-tree indexing and an approximated k-nearest neighbor search algorithm (ANN). A set of experiments was conducted to examine the performance of the proposed algorithm. The results show that the proposed method can effectively detect local outliers in reduced computation time.
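
The LOF score mentioned in this abstract can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the neighbor search here is an exact brute-force scan, which is the step the paper replaces with a kd-tree index and approximate nearest-neighbor search. The function names (`knn`, `lof_scores`) and the sample points are assumptions made for this sketch, and the points are assumed to be distinct so that the densities stay finite.

```python
import math

def knn(points, i, k):
    """Exact brute-force k nearest neighbors of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    return dists[:k]

def lof_scores(points, k=3):
    """Local Outlier Factor of every point; values well above 1 suggest an outlier."""
    n = len(points)
    neighbors = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = [nb[-1][0] for nb in neighbors]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbors divided by the point's own density
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical data: a tight cluster plus one distant point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(points, k=3)
```

The brute-force `knn` costs O(n) per query, so the whole scoring pass is quadratic in the number of points; that neighbor search is the part an index structure would speed up.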

TIME-VARIANT OUTLIER DETECTION METHOD ON GEOSENSOR NETWORKS

  • Kim, Dong-Phil;I, Gyeong-Min;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2008.10a / pp.410-413 / 2008
  • Outlier detection has been widely studied in geosensor networks. Recently, machine learning and data mining have been applied to outlier detection to build models that distinguish outliers based on an anchored criterion. However, it is difficult for the existing methods to detect outliers in incoming time-variant data, because outlier detection needs to monitor incoming data and classify irregular attacks. To solve this problem, we propose a time-variant outlier detection method using a 2-dimensional grid based on an unanchored criterion. In the paper, outlier detection is performed on geosensor data to classify outliers efficiently. The proposed method can be used in applications such as network intrusion detection, stock market analysis, and detection of erroneous data in bank accounts.


A Distance-based Outlier Detection Method using Landmarks in High Dimensional Data (고차원 데이터에서 랜드마크를 이용한 거리 기반 이상치 탐지 방법)

  • Park, Cheong Hee
    • Journal of Korea Multimedia Society / v.24 no.9 / pp.1242-1250 / 2021
  • Detecting outliers that deviate from the normal data distribution in high dimensional data is an important technique in many application areas. In this paper, a distance-based outlier detection method using landmarks in high dimensional data is proposed. Given normal training data, the k-means clustering method is applied to the training data in order to extract the centers of the clusters as landmarks, which represent the normal data distribution. For a test data sample, the distance to the nearest landmark gives the outlier score. Experiments on high dimensional data such as images and documents showed that the proposed method, with landmarks numbering one-tenth of the training data, gives comparable outlier detection performance while greatly reducing the time complexity in the testing stage.
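
The landmark scheme described in this abstract (k-means centers as landmarks, distance to the nearest landmark as the outlier score) might look like the sketch below. It is an illustration under stated assumptions rather than the paper's code: the k-means loop is a basic Lloyd iteration with a deterministic, evenly spaced initialization chosen only to keep the example reproducible, and the names `kmeans` and `outlier_score` and the toy data are hypothetical.

```python
import math

def kmeans(data, k, iters=20):
    """Basic Lloyd's k-means; returns the k cluster centers (the landmarks)."""
    # Assumption: evenly spaced samples as initial centers, for reproducibility
    step = max(1, len(data) // k)
    centers = list(data[::step][:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: math.dist(x, centers[c]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster; keep it if the cluster is empty
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[c]
            for c, cl in enumerate(clusters)
        ]
    return centers

def outlier_score(x, landmarks):
    """Outlier score of a test sample: distance to its nearest landmark."""
    return min(math.dist(x, m) for m in landmarks)

# Hypothetical normal training data: two compact groups
train = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
landmarks = kmeans(train, k=2)

normal_score = outlier_score((0.4, 0.6), landmarks)   # close to a landmark
unusual_score = outlier_score((5.0, 5.0), landmarks)  # far from both landmarks
```

At test time each sample is compared only against the landmarks rather than against every training sample, which is where the reduction in testing cost comes from.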

A Binary Prediction Method for Outlier Detection using One-class SVM and Spectral Clustering in High Dimensional Data (고차원 데이터에서 One-class SVM과 Spectral Clustering을 이용한 이진 예측 이상치 탐지 방법)

  • Park, Cheong Hee
    • Journal of Korea Multimedia Society / v.25 no.6 / pp.886-893 / 2022
  • Outlier detection refers to the task of detecting data that deviate significantly from the normal data distribution. Most outlier detection methods compute an outlier score that indicates the degree to which a data sample deviates from normal. However, setting a threshold on the outlier score to decide whether a data sample is an outlier or normal is not trivial. In this paper, we propose a binary prediction method for outlier detection based on spectral clustering and a one-class SVM ensemble. Given training data consisting of normal data samples, a clustering method is applied to find clusters in the training data, and the ensemble of one-class SVM models trained on each cluster finds the boundaries of the normal data. We show how to obtain a threshold for transforming outlier scores computed from the ensemble of one-class SVM models into binary predictive values. Experimental results with high dimensional text data show that the proposed method can be effectively applied to high dimensional data, especially when the normal training data consists of clusters with different shapes and densities.

Dam Sensor Outlier Detection using Mixed Prediction Model and Supervised Learning

  • Park, Chang-Mok
    • International journal of advanced smart convergence / v.7 no.1 / pp.24-32 / 2018
  • This paper describes an outlier detection method using a mixed prediction model. The mixed prediction model consists of a time-series model and a regression model. The parameters of the prediction model were estimated using supervised learning, with a genetic algorithm adopted as the learning method. Experiments were performed on artificial and real data sets. The prediction performance is compared with existing prediction methods using the artificial data. Outlier detection is conducted using real sensor measurements from a dam. The experiments showed the validity of the proposed method.

A Comparative Study of a Robust Estimate Method for Abnormal Traffic Detection (이상 트래픽 탐지를 위한 로버스트 추정 방법 비교 연구)

  • Jung, Jae-Yoon;Kim, Sahm
    • Communications for Statistical Applications and Methods / v.18 no.4 / pp.517-525 / 2011
  • This paper evaluates the performance of a robust estimator based on the GARCH model. We first introduce the robust estimation method and the outlier detection method in the GARCH model. Results on real internet traffic data show that the robust estimator outperforms the outlier detection method in the GARCH model. In addition, the robust estimation method is less complex than the outlier detection method in the GARCH model.

First Order Difference-Based Error Variance Estimator in Nonparametric Regression with a Single Outlier

  • Park, Chun-Gun
    • Communications for Statistical Applications and Methods / v.19 no.3 / pp.333-344 / 2012
  • We consider some statistical properties of the first order difference-based error variance estimator in nonparametric regression models with a single outlier. So far, such difference-based estimators have rarely been discussed in the presence of outliers. We propose a first order difference-based estimator that uses the leave-one-out method to detect a single outlier, and we simulate the outlier detection in a nonparametric regression model with a single outlier. The outlier detection works well in the simulation. The results are promising even in nonparametric regression models with many outliers, using some difference-based estimators.
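
As a rough illustration of the idea in this abstract, the sketch below uses the standard first order difference-based (Rice-type) variance estimator, the sum of squared successive differences divided by 2(n - 1), and recomputes it with each observation left out. Flagging the observation whose removal gives the smallest estimate is an assumption made for this example; the abstract does not spell out the paper's exact detection rule. The names `diff_variance` and `loo_candidate` and the sample values are hypothetical.

```python
def diff_variance(y):
    """First order difference-based variance estimate for responses ordered by x."""
    n = len(y)
    return sum((b - a) ** 2 for a, b in zip(y, y[1:])) / (2 * (n - 1))

def loo_candidate(y):
    """Leave-one-out estimates and the index whose removal shrinks the estimate most."""
    estimates = [diff_variance(y[:i] + y[i + 1:]) for i in range(len(y))]
    candidate = min(range(len(y)), key=estimates.__getitem__)
    return candidate, estimates

# Hypothetical responses on an ordered design, with one gross outlier at index 4
y = [0.0, 0.1, 0.2, 0.3, 5.0, 0.5, 0.6, 0.7]
full_estimate = diff_variance(y)
candidate, estimates = loo_candidate(y)
```

A single outlier inflates two adjacent squared differences, so the estimate drops sharply only when that observation is the one left out.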

A Novel Battery State of Health Estimation Method Based on Outlier Detection Algorithm

  • Piao, Chang-hao;Hu, Zi-hao;Su, Ling;Zhao, Jian-fei
    • Journal of Electrical Engineering and Technology / v.11 no.6 / pp.1802-1811 / 2016
  • A novel battery state of health (SOH) estimation algorithm based on outlier detection is presented. The SOH is one of the most important parameters describing the usability state of a power battery system. First, a battery system model with lifetime fading characteristics was established, and the battery characteristic parameters were acquired from the lifetime fading process. Then, an outlier detection method based on angular distribution was used to identify the outliers among the battery behaviors. Lastly, the functional relationship between battery SOH and the outlier distribution was obtained by polynomial fitting. The experimental results show that the algorithm can identify the outliers accurately, and that the absolute error between the estimated SOH and the true value is less than 3%.
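
The abstract does not give the details of its angular-distribution method. One well-known family of angle-based detectors scores a point by the variance of the angles it forms with pairs of other points: a point far from the data sees all other points in a narrow cone, so its variance is small. The sketch below is a simplified, unweighted variant of that idea and is only assumed to be in the same spirit as the paper's method. The name `angle_factor` and the sample points are hypothetical, and the points are assumed to be distinct.

```python
from itertools import combinations
from statistics import pvariance

def angle_factor(points, i):
    """Variance of distance-scaled cosines seen from points[i]; low means outlying."""
    a = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    terms = []
    for b, c in combinations(others, 2):
        ab = [bi - ai for ai, bi in zip(a, b)]
        ac = [ci - ai for ai, ci in zip(a, c)]
        dot = sum(u * v for u, v in zip(ab, ac))
        norm_ab_sq = sum(u * u for u in ab)
        norm_ac_sq = sum(v * v for v in ac)
        # <AB, AC> / (|AB|^2 |AC|^2): the angle term scaled down for distant pairs
        terms.append(dot / (norm_ab_sq * norm_ac_sq))
    return pvariance(terms)

# Hypothetical data: a tight cluster plus one distant point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
factors = [angle_factor(points, i) for i in range(len(points))]
most_outlying = factors.index(min(factors))
```

Unlike distance-based scores, the smallest value marks the most outlying point here. The pairwise loop is cubic in the number of points overall, so this form is only practical for small data sets.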

Fused Navigation of Unmanned Surface Vehicle and Detection of GPS Abnormality (무인 수상정의 융합 항법 및 GPS 이상 검출)

  • Ko, Nak Yong;Jeong, Seokki
    • Journal of Institute of Control, Robotics and Systems / v.22 no.9 / pp.723-732 / 2016
  • This paper proposes an approach to fused navigation of an unmanned surface vehicle (USV) and to detection of outliers or interference in the global positioning system (GPS). The method fuses the available sensor measurements through an extended Kalman filter (EKF) to find the location and attitude of the USV. It uses the error covariance of the EKF to detect GPS outliers or interference. When a GPS outlier or interference is detected, the method excludes the GPS data from the navigation process. The measurements fused for navigation are GPS, acceleration, angular rate, magnetic field, linear velocity, and range and bearing to acoustic beacons. The method is tested on simulated data and on measurement data produced through ground navigation. The results show that the method detects GPS outliers or interference as well as GPS recovery, which frees navigation from the problem of GPS abnormality.
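
A common way to use a filter's error covariance for measurement rejection is innovation gating: the squared innovation is divided by its predicted variance and compared with a threshold. The sketch below shows that idea on a scalar, linear Kalman filter with a random-walk model. It is a simplification under stated assumptions, not the paper's EKF: the noise values `q` and `r`, the gate of 9.0 (about three standard deviations), the name `gated_filter`, and the sample measurements are all hypothetical.

```python
def gated_filter(measurements, q=0.01, r=0.25, gate=9.0):
    """Scalar Kalman filter that skips measurements failing the innovation gate."""
    x, p = measurements[0], r          # initialize from the first measurement
    estimates, rejected = [x], []
    for k, z in enumerate(measurements[1:], start=1):
        p = p + q                      # predict: random-walk state, variance grows
        innovation = z - x
        s = p + r                      # predicted innovation variance
        if innovation * innovation / s > gate:
            rejected.append(k)         # treat as outlier: skip the update step
        else:
            gain = p / s
            x = x + gain * innovation
            p = (1 - gain) * p
        estimates.append(x)
    return estimates, rejected

# Hypothetical position readings with one corrupted fix at index 4
readings = [0.0, 0.1, -0.1, 0.05, 8.0, 0.0, 0.1]
estimates, rejected = gated_filter(readings)
```

Because a rejected measurement leaves the state untouched while the predicted variance keeps growing, the gate widens over time, which lets the filter accept measurements again once they become consistent with the estimate.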

Procedures for Detecting Multiple Outliers in Linear Regression Using R

  • Kwon, Soon-Sun;Lee, Gwi-Hyun;Park, Sung-Hyun
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.13-17 / 2005
  • In recent years, many people have used R as a statistical system. R is frequently updated by many R project teams. We are interested in methods for multiple outlier detection and note that R does not supply such a method. In this talk, we review procedures for detecting multiple outliers and provide more efficient procedures that combine direct and indirect methods using R.
