• Title/Summary/Keyword: Outlier

TIME-VARIANT OUTLIER DETECTION METHOD ON GEOSENSOR NETWORKS

  • Kim, Dong-Phil;I, Gyeong-Min;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2008.10a / pp.410-413 / 2008
  • Outlier detection has been widely studied in geosensor networks. Recently, machine learning and data mining techniques have been applied to outlier detection to build models that distinguish outliers based on an anchored criterion. However, existing methods have difficulty detecting outliers in incoming time-variant data, because outlier detection must continuously monitor incoming data and classify irregular attacks. To solve this problem, we propose a time-variant outlier detection method that uses a 2-dimensional grid based on an unanchored criterion. In this paper, the method was applied to geosensor data to classify outliers efficiently. The proposed method can be used in applications such as network intrusion detection, stock market analysis, and detection of erroneous data in bank accounts.
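The abstract does not spell out how the 2-dimensional grid is built, so the following is only a minimal sketch of generic grid-based density checking over a sliding window, not the authors' algorithm. It assumes each geosensor reading is a 2-D point. The function names `cell_of` and `stream_grid_outliers` and the parameters `cell_size`, `window` and `min_neighbors` are hypothetical. Judging each reading only against the most recent readings is one possible reading of an "unanchored criterion": what counts as normal moves with the data.

```python
from collections import Counter, deque
from typing import Deque, Iterable, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Point, cell_size: float) -> Cell:
    """Map a 2-D reading to the integer grid cell that contains it."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))


def stream_grid_outliers(
    stream: Iterable[Point],
    cell_size: float = 1.0,
    window: int = 50,
    min_neighbors: int = 3,
) -> List[int]:
    """Return indices of readings whose 3x3 cell neighbourhood holds fewer than
    `min_neighbors` readings from the last `window` readings (hypothetical rule)."""
    recent: Deque[Cell] = deque()  # cells of the readings currently in the window
    counts: Counter = Counter()    # number of windowed readings per cell
    outliers: List[int] = []
    for i, p in enumerate(stream):
        cx, cy = cell_of(p, cell_size)
        # Windowed readings in this cell and its 8 neighbouring cells.
        support = sum(counts[(cx + dx, cy + dy)] for dx in (-1, 0, 1) for dy in (-1, 0, 1))
        # Only judge once the window holds enough history to compare against.
        if len(recent) >= min_neighbors and support < min_neighbors:
            outliers.append(i)
        # Slide the window: add the new reading, evict the oldest one if full.
        recent.append((cx, cy))
        counts[(cx, cy)] += 1
        if len(recent) > window:
            counts[recent.popleft()] -= 1
    return outliers
```

Because flagged readings still enter the window, a region that keeps receiving data stops being flagged after `min_neighbors` readings, which is how this sketch adapts to time-variant data; an isolated spike, by contrast, stays flagged.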

Outlier correction from uncalibrated image sequence (영상 시퀀스의 특징점에 대한 Outlier 보정)

  • 김재학;박종승;황지운;한준희
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.706-708 / 2004
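  • This paper presents a method for removing and correcting outliers among the feature points obtained from an image sequence. Given an image sequence, we track feature points across the images and use them as key information about the images. Feature-tracking data obtained automatically in this way inevitably contain some incorrectly tracked points. To remove such incorrect data, i.e., outliers, existing methods have mainly used the trifocal tensor. However, the trifocal tensor is limited to three images. In addition, once outliers are found, they are simply removed, which has the drawback of reducing the amount of input data. We therefore propose a method based on triangulation that can remove and correct outliers at the same time, even with more than three images.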
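As a rough illustration of correcting, rather than deleting, a tracked feature point by triangulation, the sketch below triangulates the point from every pair of views with the linear DLT method, keeps the pair whose reprojection agrees with the most views, and replaces the disagreeing observations with reprojections of the point re-triangulated from all agreeing views. It assumes known 3x4 projection matrices (the paper works from uncalibrated sequences, which this sketch does not handle); the names `triangulate`, `project`, `correct_track`, the pixel threshold `thresh`, and the exhaustive pair search are illustrative choices, not the authors' method.

```python
from itertools import combinations

import numpy as np


def triangulate(Ps, xs):
    """Linear (DLT) triangulation of one 3-D point from two or more views."""
    rows = []
    for P, (u, v) in zip(Ps, xs):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]          # right singular vector of the smallest singular value
    return X / X[3]     # homogeneous point scaled so that w = 1


def project(P, X):
    """Project a homogeneous 3-D point with a 3x4 camera matrix."""
    x = P @ X
    return x[:2] / x[2]


def correct_track(Ps, xs, thresh=2.0):
    """Return (corrected observations, boolean outlier mask) for one feature track."""
    xs = np.asarray(xs, dtype=float)
    best = np.zeros(len(Ps), dtype=bool)
    # Each pair of views proposes a point; keep the proposal with the most inliers.
    for i, j in combinations(range(len(Ps)), 2):
        X = triangulate([Ps[i], Ps[j]], xs[[i, j]])
        err = np.array([np.linalg.norm(project(P, X) - x) for P, x in zip(Ps, xs)])
        if (err <= thresh).sum() > best.sum():
            best = err <= thresh
    if best.sum() < 2:
        return xs, np.zeros(len(Ps), dtype=bool)  # no consistent pair: leave unchanged
    # Re-triangulate from all inliers and substitute reprojections for outliers.
    X = triangulate([P for P, keep in zip(Ps, best) if keep], xs[best])
    corrected = np.array([x if keep else project(P, X) for P, x, keep in zip(Ps, xs, best)])
    return corrected, ~best
```

With four cameras shifted along the x axis and one corrupted observation in the third view, the corrupted point is flagged and replaced by the reprojection computed from the other three views, so the track keeps all of its observations. Unlike the trifocal tensor, nothing here limits the number of views.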

A Distance-based Outlier Detection Method using Landmarks in High Dimensional Data (고차원 데이터에서 랜드마크를 이용한 거리 기반 이상치 탐지 방법)

  • Park, Cheong Hee
    • Journal of Korea Multimedia Society / v.24 no.9 / pp.1242-1250 / 2021
  • Detecting outliers that deviate from the normal data distribution in high-dimensional data is an important technique in many application areas. In this paper, a distance-based outlier detection method using landmarks in high-dimensional data is proposed. Given normal training data, k-means clustering is applied to the training data, and the cluster centers are extracted as landmarks that represent the normal data distribution. For a test sample, the distance to the nearest landmark gives its outlier score. Experiments on high-dimensional data such as images and documents showed that the proposed method, using a number of landmarks equal to one-tenth of the training data, achieves comparable outlier detection performance while greatly reducing the time complexity of the testing stage.
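The landmark scheme described above is simple enough to sketch directly. The version below uses a plain NumPy implementation of Lloyd's k-means and Euclidean distance; the paper's k-means variant, initialisation and distance measure are not stated in the abstract, so those choices, along with the function names `kmeans_landmarks` and `outlier_scores`, are assumptions.

```python
import numpy as np


def kmeans_landmarks(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on normal training data; the k centres act as landmarks."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each training sample to its nearest centre.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous centre if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers


def outlier_scores(landmarks, Y):
    """Outlier score of each test sample = distance to its nearest landmark."""
    d = np.linalg.norm(Y[:, None, :] - landmarks[None, :, :], axis=2)
    return d.min(axis=1)
```

Following the one-tenth ratio reported above, one would set `k` to about `len(X_train) // 10`; scoring a test sample then needs only `k` distance computations instead of one per training sample, which is where the test-time saving comes from.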

Density-based Outlier Detection in Multi-dimensional Datasets

  • Wang, Xite;Cao, Zhixin;Zhan, Rongjuan;Bai, Mei;Ma, Qian;Li, Guanyu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.3815-3835 / 2022
  • Density-based outlier detection is an active research topic in data mining: a point is identified as an outlier based on the density of the points near it. Existing density-based detection algorithms have high time complexity, so to reduce it, a new outlier detection algorithm, DODMD (Density-based Outlier Detection in Multidimensional Datasets), is proposed. First, the concept of a micro-cluster is introduced on the basis of the ZH-tree: each leaf node is regarded as a micro-cluster, and computations are performed per micro-cluster to filter points in batches. To obtain n sets of approximate outliers quickly, a greedy method is used to compute a bound on the LOF (local outlier factor), and its minimum value is recorded as LOFmin. Second, points are filtered using LOFmin, the true outliers are computed, and the result set is then updated to tighten the bound. Finally, the accuracy and efficiency of the DODMD algorithm are verified on real and synthetic datasets.
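DODMD is a faster way of finding the points with high LOF (local outlier factor). For reference, the quantity it accelerates can be computed by brute force as in the sketch below. This is the standard LOF definition evaluated with O(n^2) pairwise distances and exactly `k` neighbours per point (ties broken arbitrarily), not the ZH-tree and micro-cluster pruning the paper proposes. It assumes there are no duplicate points, since duplicates can make the local reachability density infinite.

```python
import numpy as np


def lof(X, k=5):
    """Brute-force Local Outlier Factor for every row of X."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)               # a point is not its own neighbour
    knn = np.argsort(D, axis=1)[:, :k]        # indices of the k nearest neighbours
    k_dist = D[np.arange(n), knn[:, -1]]      # k-distance: distance to k-th neighbour
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbour o of p
    reach = np.maximum(k_dist[knn], D[np.arange(n)[:, None], knn])
    lrd = 1.0 / reach.mean(axis=1)            # local reachability density
    return lrd[knn].mean(axis=1) / lrd        # LOF = mean neighbour lrd / own lrd
```

Points inside a uniformly dense region get an LOF close to 1, while an isolated point gets a value well above 1; DODMD's LOFmin bound serves to skip most of these distance computations when only the top outliers are needed.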

A Study on Efficient Position Management of Light Buoys through Outlier Analysis (Outlier(이상치) 분석을 통한 등부표 효율적 위치 관리 방안 연구)

  • 최광영;송재욱
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.05a / pp.290-291 / 2023
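  • This study on providing turning safety radius information for light buoys through outlier analysis aims to strengthen safety measures, such as recognizing the risk that buoys equipped with AIS or an RTU drift off station and preventing navigation accidents. Because of external forces such as tidal currents and wind, a light buoy drifts some distance from its position and swings within a turning radius that follows a regular pattern; however, these forces can also push it outside the normal range, causing it to be lost or displaced, which can in turn lead to navigation accidents such as collisions with ships. Such buoy accidents can incur property damage costs as well as additional administrative costs, including the psychological burden on users regarding safe navigation and the risks they must accept. Here, an outlier is an extreme position value that, owing to external forces or similar causes, falls outside the normal range within the maximum drift distance or could not physically occur. Analyzing the 2021 buoy position data by fixed bearing intervals identified such outliers. Therefore, to monitor the safe positioning of light buoys systematically, we studied the provision of turning safety radius information through outlier analysis.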
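A minimal sketch of the kind of check described above, assuming each position fix is a (latitude, longitude) pair in degrees, that the charted position of the buoy and its maximum drift distance are known, and that fixes are grouped into fixed-width bearing sectors. The sector width, the function names `distance_bearing` and `flag_buoy_outliers`, and the use of the haversine distance on a spherical Earth of mean radius 6,371 km are assumptions for illustration, not details from the paper.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical-Earth assumption)


def distance_bearing(lat0, lon0, lat, lon):
    """Haversine distance in metres and initial bearing in degrees from (lat0, lon0)."""
    p0, p1 = math.radians(lat0), math.radians(lat)
    dphi, dlmb = p1 - p0, math.radians(lon - lon0)
    a = math.sin(dphi / 2) ** 2 + math.cos(p0) * math.cos(p1) * math.sin(dlmb / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    y = math.sin(dlmb) * math.cos(p1)
    x = math.cos(p0) * math.sin(p1) - math.sin(p0) * math.cos(p1) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return dist, bearing


def flag_buoy_outliers(charted, fixes, max_drift_m, sector_deg=30):
    """Group fixes by bearing sector from the charted position.

    Returns {sector start in degrees: [(fix index, distance in m, is_outlier), ...]},
    where a fix is an outlier if it lies beyond the maximum drift distance.
    """
    by_sector = defaultdict(list)
    for i, (lat, lon) in enumerate(fixes):
        dist, brg = distance_bearing(charted[0], charted[1], lat, lon)
        sector = int(brg // sector_deg) * sector_deg
        by_sector[sector].append((i, dist, dist > max_drift_m))
    return dict(by_sector)
```

Grouping by sector mirrors the per-bearing analysis in the abstract: since currents and wind push a buoy further in some directions than others, a per-sector limit could replace the single `max_drift_m` if the data support it.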

Outlier Detection of Real-Time Reservoir Water Level Data Using Threshold Model and Artificial Neural Network Model (임계치 모형과 인공신경망 모형을 이용한 실시간 저수지 수위자료의 이상치 탐지)

  • Kim, Maga;Choi, Jin-Yong;Bang, Jehong;Lee, Jaeju
    • Journal of The Korean Society of Agricultural Engineers / v.61 no.1 / pp.107-120 / 2019
  • Reservoir water level data indicate the current water storage of a reservoir and are used as primary data for the management and study of agricultural water. For reservoir storage management, the Korea Rural Community Corporation (KRC) has installed water level stations at around 1,600 agricultural reservoirs and collects water level data every 10 minutes. However, various kinds of outliers caused by noise and errors frequently appear owing to environmental and physical causes. It is therefore necessary to detect outliers and improve the quality of reservoir water level data so that the data can be used for their intended purposes. This study detected and classified outliers and normal data using two models: a threshold model and an artificial neural network (ANN) model. The results were compared to evaluate the performance of the models. The threshold model identifies outliers by setting upper and lower bounds on the water level data and on their variation, and by setting a bandwidth of the water level data as the threshold for erroneous water levels. The ANN model was trained on a prepared training dataset labeled as normal data (T) and outliers (F), and was then used to identify outliers. The models were evaluated against reference data consisting of daily reservoir water level data collected by KRC. The threshold model outperformed the ANN model in outlier detection, but the ANN model performed better at avoiding misclassifying normal data as outliers.
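The threshold model described above can be sketched as three rules applied to each 10-minute reading: an absolute range check, a maximum change from the last accepted reading, and a band around the median of recent accepted readings. The function name `threshold_outliers`, its parameters, and the rolling-median form of the band are assumptions; the abstract does not give the paper's actual bounds or its definition of the bandwidth.

```python
from statistics import median
from typing import List


def threshold_outliers(
    levels: List[float],
    lower: float,
    upper: float,
    max_step: float,
    band: float,
    window: int = 6,
) -> List[bool]:
    """Flag readings that break a range, variation, or bandwidth rule (hypothetical)."""
    flags: List[bool] = []
    accepted: List[float] = []  # readings judged normal so far
    for h in levels:
        out_of_range = not (lower <= h <= upper)
        # Variation rule: change from the last accepted reading.
        jump = bool(accepted) and abs(h - accepted[-1]) > max_step
        # Bandwidth rule: distance from the median of recent accepted readings.
        ref = accepted[-window:]
        off_band = bool(ref) and abs(h - median(ref)) > band
        is_outlier = out_of_range or jump or off_band
        flags.append(is_outlier)
        if not is_outlier:
            accepted.append(h)
    return flags
```

Comparing against the last accepted reading, rather than the raw previous one, keeps a single spike from also flagging the normal reading that follows it; the trade-off is that a genuine abrupt level change would keep being flagged until the thresholds are revisited.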

Multiple Imputation Reducing Outlier Effect using Weight Adjustment Methods (가중치 보정을 이용한 다중대체법)

  • Kim, Jin-Young;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.26 no.4 / pp.635-647 / 2013
  • Imputation is a commonly used method for handling missing survey data. The performance of an imputation method is influenced by various factors, especially outliers. Removing outliers from a data set is a simple and effective way to reduce their effect. In this paper, to improve the precision of multiple imputation, we study an imputation method that reduces the effect of outliers using various weight adjustment methods, including outlier removal. The regression method of PROC MI in SAS is used for multiple imputation, and the final adjusted weight is used as a weight variable to obtain the imputed values. Simulation studies compare the performance of the various weight adjustment methods, and Monthly Labor Statistics data are used for a real data analysis.

Big Data Smoothing and Outlier Removal for Patent Big Data Analysis

  • Choi, JunHyeog;Jun, Sunghae
    • Journal of the Korea Society of Computer and Information / v.21 no.8 / pp.77-84 / 2016
  • In general statistical analysis, a normality assumption is required; if it is not satisfied, good results cannot be expected from the analysis. Most statistical methods for handling outliers and noise also require this assumption. However, the assumption is not satisfied in big data because of its large volume and heterogeneity. We therefore propose a methodology based on box plots and data smoothing for controlling outliers and noise in big data analysis. The proposed methodology does not depend on the normality assumption. We select patent documents as the target domain of big data because patent big data analysis is an important issue in the management of technology. We analyze patent documents using big data learning methods for technology analysis: patent data collected from patent databases around the world are preprocessed and analyzed using text mining and statistics. However, most research on patent big data analysis has not considered the problem of outliers and noise, which decreases prediction accuracy and increases the variance of parameter estimates. In this paper, we check for the existence of outliers and noise in patent big data, using box plots and smoothing visualization to determine whether outliers are present. We use patent documents related to three-dimensional printing technology to illustrate how the proposed methodology can be used to find noise in retrieved patent big data.
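The box-plot rule and a smoothing step can be sketched as follows. The fences at 1.5 times the interquartile range are the usual box-plot convention, and the centred moving average is an assumed choice, since the abstract does not say which smoothing method is used. Quartile conventions also differ between tools; this sketch relies on Python's `statistics.quantiles` with its default "exclusive" method. The function names are hypothetical.

```python
import statistics
from typing import List, Tuple


def tukey_fences(xs: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper box-plot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_outliers(xs: List[float], k: float = 1.5) -> List[int]:
    """Indices of values outside the fences; no normality assumption is needed."""
    lo, hi = tukey_fences(xs, k)
    return [i for i, x in enumerate(xs) if x < lo or x > hi]


def moving_average(xs: List[float], width: int = 3) -> List[float]:
    """Centred moving average; the window shrinks at the ends of the series."""
    half = width // 2
    out = []
    for i in range(len(xs)):
        win = xs[max(0, i - half): i + half + 1]
        out.append(sum(win) / len(win))
    return out
```

Because the fences depend only on quartiles, a few extreme values (such as unusually large counts for a single patent) cannot drag the thresholds along with them, unlike a mean plus or minus a multiple of the standard deviation.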

Estimations for a Uniform Scale Parameter in the Presence of a Half-Triangle Outlier

  • Lee, Chang-Soo;Kim, Kee-Hwan;Park, Yang-Woo
    • Journal of the Korean Data and Information Science Society / v.19 no.3 / pp.959-965 / 2008
  • We propose several estimators for the scale parameter of a uniform distribution in the presence of a half-triangle outlier and derive the mean squared errors (MSEs) of the proposed estimators. We then numerically compare the efficiencies of these estimators for small sample sizes.

Accuracy of Multiple Outlier Tests in Nonlinear Regression

  • Kahng, Myung-Wook
    • Communications for Statistical Applications and Methods / v.18 no.1 / pp.131-136 / 2011
  • The original Bates-Watts framework applies only to the complete parameter vector. Thus, guidelines developed in that framework can be misleading when the adequacy of the linear approximation differs greatly across parameter subsets. Subset curvature measures appear to be reliable indicators of the adequacy of the linear approximation for an arbitrary subset of parameters in nonlinear models. Given a specific mean-shift outlier model, the standard approaches to obtaining test statistics for outliers are discussed, and the accuracy of the outlier tests is investigated using subset curvatures.