• Title/Abstract/Keywords: outlier detection method


Density-based Outlier Detection for Very Large Data

  • 김승;조남욱;강석호
    • Journal of the Korean Operations Research and Management Science Society / Vol. 35, No. 2 / pp. 71-88 / 2010
  • A density-based outlier detection method such as LOF (Local Outlier Factor) tries to find outlying observations by using the density of their surrounding space. Despite the several advantages of density-based outlier detection, its computational complexity has been one of the major barriers to its application. In this paper, we present an LOF algorithm that reduces the computation time of density-based outlier detection. The proposed method adopts kd-tree indexing and an approximate k-nearest neighbor (ANN) search algorithm. A set of experiments was conducted to examine the performance of the proposed algorithm. The results show that the proposed method can effectively detect local outliers with reduced computation time.
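
As a rough illustration of what LOF computes, here is a minimal pure-Python sketch that uses brute-force k-nearest-neighbour search. The function names `knn` and `lof_scores` are hypothetical, and the sketch simplifies LOF to exactly k neighbours per point (ties broken by index). The paper's contribution is to replace the brute-force neighbour search with kd-tree indexing and approximate kNN search, which this sketch does not attempt.

```python
import math


def knn(points, i, k):
    """Brute-force k nearest neighbours of points[i]; returns (indices, k-distance)."""
    ranked = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]], ranked[k - 1][0]


def lof_scores(points, k):
    """Simplified LOF: exactly k neighbours per point, ties broken by index."""
    n = len(points)
    neigh, kdist = [], []
    for i in range(n):
        nb, kd = knn(points, i, k)
        neigh.append(nb)
        kdist.append(kd)
    # Local reachability density: inverse mean reachability distance to the k neighbours.
    lrd = []
    for i in range(n):
        reach = [max(kdist[j], math.dist(points[i], points[j])) for j in neigh[i]]
        lrd.append(k / max(sum(reach), 1e-12))  # guard against duplicate points
    # LOF: mean neighbour density relative to the point's own density (about 1 = inlier).
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    for p, s in zip(pts, lof_scores(pts, k=3)):
        print(p, round(s, 2))
```

The neighbour search in `knn` is the O(n²) step that dominates run time; that is the part a kd-tree or approximate kNN index would replace.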

TIME-VARIANT OUTLIER DETECTION METHOD ON GEOSENSOR NETWORKS

  • Kim, Dong-Phil;I, Gyeong-Min;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing 2008 International Symposium on Remote Sensing / pp. 410-413 / 2008
  • Outlier detection has been widely studied in geosensor networks. Recently, machine learning and data mining techniques have been applied to outlier detection to build models that distinguish outliers based on an anchored criterion. However, existing methods have difficulty detecting outliers in incoming time-variant data, because outlier detection must monitor incoming data and classify irregular attacks. To solve this problem, we propose a time-variant outlier detection method using a 2-dimensional grid based on an unanchored criterion. In this paper, outliers in geosensor data were classified efficiently. The proposed method can be used in applications such as network intrusion detection, stock market analysis, and detection of erroneous data in bank accounts.


A Distance-based Outlier Detection Method using Landmarks in High Dimensional Data

  • 박정희
    • Journal of Korea Multimedia Society / Vol. 24, No. 9 / pp. 1242-1250 / 2021
  • Detecting outliers that deviate from the normal data distribution in high dimensional data is an important technique in many application areas. In this paper, a distance-based outlier detection method using landmarks in high dimensional data is proposed. Given normal training data, k-means clustering is applied to the training data to extract the cluster centers as landmarks that represent the normal data distribution. For a test data sample, the distance to the nearest landmark gives the outlier score. Experiments using high dimensional data such as images and documents showed that the proposed method, with a number of landmarks equal to one-tenth of the training data, gives comparable outlier detection performance while greatly reducing the time complexity of the testing stage.
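
The landmark idea described above can be sketched in a few lines, assuming Euclidean distance and plain Lloyd's k-means with random initial centres. The names `kmeans` and `landmark_score` are hypothetical, and this is an illustration of the scoring rule rather than the author's code; the number of landmarks `k` would be set to roughly one-tenth of the training set size to mirror the setting reported in the abstract.

```python
import math
import random


def kmeans(data, k, iters=100, seed=0):
    """Plain Lloyd's k-means with random initial centres; returns the centres."""
    rng = random.Random(seed)
    centers = [tuple(c) for c in rng.sample(data, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: math.dist(x, centers[c]))
            groups[nearest].append(x)
        # Move each centre to the mean of its members (keep it if the cluster is empty).
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for c, g in zip(centers, groups)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def landmark_score(x, landmarks):
    """Outlier score = distance from x to its nearest landmark."""
    return min(math.dist(x, m) for m in landmarks)


if __name__ == "__main__":
    train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    landmarks = kmeans(train, k=2)
    print(landmark_score((0.5, 0.5), landmarks), landmark_score((30, 30), landmarks))
```

At test time only `landmark_score` runs, so its cost grows with the number of landmarks instead of the number of training samples, which is where the reported speed-up comes from.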

A Binary Prediction Method for Outlier Detection using One-class SVM and Spectral Clustering in High Dimensional Data

  • 박정희
    • Journal of Korea Multimedia Society / Vol. 25, No. 6 / pp. 886-893 / 2022
  • Outlier detection refers to the task of detecting data that deviate significantly from the normal data distribution. Most outlier detection methods compute an outlier score which indicates the degree to which a data sample deviates from normal. However, setting a threshold on the outlier score to determine whether a data sample is an outlier or normal is not trivial. In this paper, we propose a binary prediction method for outlier detection based on spectral clustering and a one-class SVM ensemble. Given training data consisting of normal data samples, a clustering method is performed to find clusters in the training data, and an ensemble of one-class SVM models, each trained on one cluster, finds the boundaries of the normal data. We show how to obtain a threshold for transforming the outlier scores computed by the one-class SVM ensemble into binary predicted values. Experimental results with high dimensional text data show that the proposed method can be applied effectively to high dimensional data, especially when the normal training data consists of clusters with different shapes and densities.

Dam Sensor Outlier Detection using Mixed Prediction Model and Supervised Learning

  • Park, Chang-Mok
    • International journal of advanced smart convergence / Vol. 7, No. 1 / pp. 24-32 / 2018
  • This paper describes an outlier detection method using a mixed prediction model. The mixed prediction model consists of a time-series model and a regression model. The parameters of the prediction model were estimated using supervised learning, with a genetic algorithm adopted as the learning method. Experiments were performed on artificial and real data sets. The prediction performance is compared with existing prediction methods using artificial data. Outlier detection is conducted using real sensor measurements from a dam. The experiments showed the validity of the proposed method.

A Comparative Study of a Robust Estimate Method for Abnormal Traffic Detection

  • 정재윤;김삼용
    • Communications for Statistical Applications and Methods / Vol. 18, No. 4 / pp. 517-525 / 2011
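  • This study compares methods that can be applied to data containing outliers, with the aim of showing the usefulness of robust estimation under heteroscedastic time-series models. We compared the performance of outlier detection techniques under the GARCH model with that of a robust estimation method based on the GARCH model. When both methods were applied to real Internet traffic data, the robust estimation method proved less complex and performed better than the outlier detection techniques.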

First Order Difference-Based Error Variance Estimator in Nonparametric Regression with a Single Outlier

  • Park, Chun-Gun
    • Communications for Statistical Applications and Methods / Vol. 19, No. 3 / pp. 333-344 / 2012
  • We consider some statistical properties of the first order difference-based error variance estimator in nonparametric regression models with a single outlier. So far, such difference-based estimators have rarely been discussed in the presence of outliers. We propose a first order difference-based estimator using the leave-one-out method to detect a single outlier, and we simulate outlier detection in a nonparametric regression model with a single outlier. The outlier detection works well. The results are also promising for nonparametric regression models with many outliers when some difference-based estimators are used.
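
The leave-one-out idea can be illustrated with the classical first order difference-based (Rice-type) estimator σ̂² = Σ(y_{i+1} − y_i)² / (2(n − 1)), computed on responses sorted by the covariate. The sketch below flags the observation whose deletion most reduces this estimate. That decision rule, and the names `diff_variance` and `loo_outlier`, are assumptions for illustration and not necessarily the procedure used in the paper.

```python
def diff_variance(y):
    """First order difference-based error variance estimate for responses ordered by x."""
    n = len(y)
    return sum((y[i + 1] - y[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))


def loo_outlier(y):
    """Hypothetical rule: index whose removal most reduces the difference-based estimate."""
    y = list(y)
    full = diff_variance(y)
    # Drop in the estimate when observation i is left out.
    drops = [full - diff_variance(y[:i] + y[i + 1:]) for i in range(len(y))]
    best = max(range(len(y)), key=lambda i: drops[i])
    return best, drops[best]


if __name__ == "__main__":
    y = [0.1 * i for i in range(30)]  # smooth (linear) regression function
    y[12] += 5.0                      # inject a single outlier
    print(loo_outlier(y))
```

An outlier inflates two adjacent squared differences, so leaving it out removes both and replaces them with one small difference; that is why this deletion produces the largest drop.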

A Novel Battery State of Health Estimation Method Based on Outlier Detection Algorithm

  • Piao, Chang-hao;Hu, Zi-hao;Su, Ling;Zhao, Jian-fei
    • Journal of Electrical Engineering and Technology / Vol. 11, No. 6 / pp. 1802-1811 / 2016
  • Battery state of health (SOH) is one of the most important parameters describing the usability of a power battery system. A novel battery SOH estimation algorithm based on outlier detection is presented. First, a battery system model with lifetime fading characteristics was established, and the battery characteristic parameters were acquired from the lifetime fading process. Then, an outlier detection method based on angular distribution was used to identify outliers among the battery behaviors. Finally, the functional relationship between battery SOH and the outlier distribution was obtained by polynomial fitting. The experimental results show that the algorithm identifies the outliers accurately, and the absolute error between the estimated and true SOH values is less than 3%.

Fused Navigation of Unmanned Surface Vehicle and Detection of GPS Abnormality

  • 고낙용;정석기
    • Journal of Institute of Control, Robotics and Systems / Vol. 22, No. 9 / pp. 723-732 / 2016
  • This paper proposes an approach to fused navigation of an unmanned surface vehicle (USV) and to detection of outliers or interference in the global positioning system (GPS). The method fuses available sensor measurements through an extended Kalman filter (EKF) to estimate the location and attitude of the USV. It uses the error covariance of the EKF to detect GPS outliers or interference. When a GPS outlier or interference is detected, the method excludes GPS data from the navigation process. The measurements fused for navigation are GPS, acceleration, angular rate, magnetic field, linear velocity, and range and bearing to acoustic beacons. The method is tested with simulated data and with measurement data produced through ground navigation. The results show that the method detects GPS outliers or interference as well as GPS recovery, which frees navigation from the problem of GPS abnormality.
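
To make covariance-based gating concrete, here is a hedged sketch that replaces the paper's multi-sensor EKF with a one-dimensional linear Kalman filter on a random-walk state, assumed purely for simplicity. A GPS reading is excluded when its normalised innovation squared ν²/S exceeds a gate (here 9, i.e. a 3-sigma test for one degree of freedom). The function `kalman_gated` and its parameters `q`, `r`, and `gate` are hypothetical.

```python
def kalman_gated(zs, q=0.01, r=1.0, gate=9.0, x0=0.0, p0=1.0):
    """1-D Kalman filter that skips measurements failing an innovation gate."""
    x, p = x0, p0
    estimates, rejected = [], []
    for k, z in enumerate(zs):
        # Predict: random-walk model, so only the variance grows.
        p = p + q
        # Innovation and its covariance S = P + R.
        nu = z - x
        s = p + r
        if nu * nu / s > gate:
            rejected.append(k)  # treat as GPS outlier/interference: no update
        else:
            gain = p / s
            x = x + gain * nu
            p = (1.0 - gain) * p
        estimates.append(x)
    return estimates, rejected


if __name__ == "__main__":
    zs = [0.0] * 20
    zs[10] = 50.0  # simulated GPS jump
    est, rej = kalman_gated(zs)
    print(rej)
```

Because the gate is scaled by the predicted covariance, the same innovation is tolerated when the filter is uncertain and rejected when it is confident; once readings return to normal, their innovations pass the gate again and updates resume.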

Procedures for Detecting Multiple Outliers in Linear Regression Using R

  • Kwon, Soon-Sun;Lee, Gwi-Hyun;Park, Sung-Hyun
    • Korean Statistical Society: Conference Proceedings / Proceedings of the Korean Statistical Society 2005 Fall Conference / pp. 13-17 / 2005
  • In recent years, many people have used R as a statistical system. R is frequently updated by many R project teams. We are interested in methods for multiple outlier detection and note that R does not provide one. In this talk, we review procedures for detecting multiple outliers and provide more efficient procedures, implemented in R, that combine direct and indirect methods.