• Title/Summary/Keyword: outlier detection

Outlier detection in dental research (치의학 연구에서 이상치의 처리)

  • Kim, Ki-Yeol
    • The Journal of the Korean dental association / v.55 no.9 / pp.604-616 / 2017
  • In clinical dental research, errors occur despite careful study design and conduct. Data cleaning procedures aim to identify and correct these errors, or at least to minimize their influence on the study. Outliers are one such error. Outlier detection is the first step of the data analysis process and has a serious effect on results in dental research. Hence, this paper introduces methods for detecting outliers and examines their influence on statistical data analysis.

New Blind Steganalysis Framework Combining Image Retrieval and Outlier Detection

  • Wu, Yunda;Zhang, Tao;Hou, Xiaodan;Xu, Chen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.12 / pp.5643-5656 / 2016
  • The detection accuracy of steganalysis depends on many factors, including the embedding algorithm, the payload size, the steganalysis feature space, and the properties of the cover source. In practice, the cover source mismatch (CSM) problem has been recognized as the single most important factor degrading performance. To address this problem, we propose a new framework for blind, universal steganalysis that uses traditional steganalysis features. First, cover images with the same statistical properties as the test image are retrieved from a reference image database as aided samples. The test image and its aided samples together form the test set. Then, assuming that most of the aided samples are innocent, we run outlier detection on the test set to judge whether the test image is cover or stego. The framework therefore needs no training and does not suffer from cover source mismatch. Because it performs anomaly detection rather than classification, the method is fully unsupervised. Our results show that this framework outperforms both a one-class support vector machine and an outlier detector that omits the image retrieval step.

Outlier Detection of Autoregressive Models Using Robust Regression Estimators (로버스트 추정법을 이용한 자기상관회귀모형에서의 특이치 검출)

  • Lee Dong-Hee;Park You-Sung;Kim Kee-Whan
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.305-317 / 2006
  • Outliers adversely affect model identification, parameter estimation, and forecasting in time series data. In particular, when the outliers form a patch of additive outliers, current outlier detection procedures suffer from masking and swamping effects that make them inefficient. In this paper, we propose a new outlier detection procedure based on high-breakdown estimators, called dual robust filtering. Empirical and simulation studies with autoregressive models of order p show that the proposed procedure is effective.

Outlier Detection Based on MapReduce for Analyzing Big Data (대용량 데이터 분석을 위한 맵리듀스 기반의 이상치 탐지)

  • Hong, Yejin;Na, Eunhee;Jung, Yonghwan;Kim, Yangwoo
    • Journal of Internet Computing and Services / v.18 no.1 / pp.27-35 / 2017
  • In the near future, IoT data is expected to make up a major portion of Big Data. Sensor data, in turn, is expected to make up a major portion of IoT data, and research on it is actively under way. However, processed results cannot be trusted or used if outliers are included when the sensor data is processed. This paper therefore studies a method for detecting and deleting such outliers before processing. We use Spark, a memory-based distributed processing environment, for fast processing of large volumes of sensor data. The detection and deletion of outliers consists of four stages, each implemented with Mapper and Reducer operations. The proposed method is compared across three different processing environments, and outlier detection and deletion performance is expected to be best in the distributed Spark environment as the data volume increases.
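
The abstract does not say which statistic each of the four stages computes, so the following is only a minimal sketch of the general map/reduce pattern for outlier deletion, written in plain Python rather than Spark. The function names (`map_stats`, `reduce_stats`, `remove_outliers`), the per-sensor keying, and the k-standard-deviations rule are all assumptions made for illustration, not the authors' method.

```python
from math import sqrt


def map_stats(record):
    """Mapper: emit (sensor_id, (count, sum, sum of squares)) for one reading."""
    sensor_id, value = record
    return sensor_id, (1, value, value * value)


def reduce_stats(a, b):
    """Reducer: combine two partial (count, sum, sum of squares) tuples."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def remove_outliers(records, k=3.0):
    """Drop readings more than k standard deviations from their sensor's mean.

    Assumed rule for illustration; the paper's actual criterion is not
    given in the abstract.
    """
    # Pass 1: map every record, then reduce the partial sums per sensor.
    totals = {}
    for key, stats in map(map_stats, records):
        totals[key] = reduce_stats(totals[key], stats) if key in totals else stats

    # Turn the reduced sums into a (mean, population std) pair per sensor.
    limits = {}
    for key, (n, s, ss) in totals.items():
        mean = s / n
        variance = max(ss / n - mean * mean, 0.0)  # clamp rounding error
        limits[key] = (mean, sqrt(variance))

    # Pass 2: keep readings inside the band; keep everything if std is zero.
    cleaned = []
    for key, value in records:
        mean, std = limits[key]
        if std == 0.0 or abs(value - mean) <= k * std:
            cleaned.append((key, value))
    return cleaned


readings = [("a", 9.9), ("a", 10.1)] * 10 + [("a", 50.0)]
readings += [("b", v) for v in (100.0, 101.0, 99.0, 100.0, 100.0)]
cleaned = remove_outliers(readings)
```

Because `reduce_stats` is associative and commutative, the same pair of functions could in principle be handed to a distributed reduce-by-key step; that mapping onto Spark is left as an assumption here.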

A Score test for Detection of Outliers in Nonlinear Regression

  • Kahng, Myung-Wook
    • Journal of the Korean Statistical Society / v.22 no.2 / pp.201-208 / 1993
  • Given the specific mean shift outlier model, the score test for multiple outliers in nonlinear regression is discussed as an alternative to the likelihood ratio test. The geometric interpretation of the score statistic is also presented.

Outlier detection of GPS monitoring data using relational analysis and negative selection algorithm

  • Yi, Ting-Hua;Ye, X.W.;Li, Hong-Nan;Guo, Qing
    • Smart Structures and Systems / v.20 no.2 / pp.219-229 / 2017
  • Outlier detection is essential for identifying abnormal events before structures suffer sudden failure during their service lives. This paper proposes a two-phase method for outlier detection in Global Positioning System (GPS) monitoring data. First, relational analysis is used to judge promptly whether abnormal data have occurred, exploiting the fact that data obtained from adjacent locations follow a certain rule. Then, a negative selection algorithm (NSA) is adopted to localize the abnormal data more accurately. To reduce the computational cost of the NSA, an improved scheme that integrates an adjustable radius into the training stage is designed and implemented. Numerical simulations and experimental verification show that the proposed method compares favorably with the original method in efficiency and reliability. The method relies only on the monitoring data and requires no engineering expertise on the operational characteristics of the structure, so it can easily be embedded in a software system for continuous and reliable monitoring of civil infrastructure.

Outlier Detection Method for Time Synchronization

  • Lee, Young Kyu;Yang, Sung-hoon;Lee, Ho Seong;Lee, Jong Koo;Lee, Joon Hyo;Hwang, Sang-wook
    • Journal of Positioning, Navigation, and Timing / v.9 no.4 / pp.397-403 / 2020
  • To synchronize a remote system time to a reference time such as Coordinated Universal Time (UTC), the time difference between the two clocks must be compared. The time comparison data may contain outliers, and synchronization performance can be significantly degraded if they are not removed. An effective outlier detection algorithm is therefore required to keep the system time highly accurate. In this paper, an outlier detection method is presented for the time difference data of GNSS time transfer receivers. The time difference data between the system time and GNSS usually have a slope, because the remote system clock runs freely until it is synchronized to the reference clock. To investigate the outlier detection performance of the proposed algorithm, simulations are performed using the time difference data of a GNSS time transfer receiver corrected to a free running Cesium clock, with intentionally inserted outliers. The simulations show that the proposed algorithm effectively detects the inserted outliers, whereas conventional methods such as the modified Z-score and the adjusted boxplot cannot. Furthermore, synchronization performance is observed to degrade by more than 15% with 20 outliers compared with the original data without outliers.
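
To illustrate why a slope matters for the modified Z-score mentioned above, the sketch below applies the standard modified Z-score (0.6745 times the deviation from the median, divided by the median absolute deviation, with the commonly used cutoff of 3.5) to a sloped series, once directly and once after subtracting a least-squares line. The detrending step, the synthetic data, and the function names are assumptions for illustration only; the abstract does not describe the authors' algorithm, and this is not it.

```python
from statistics import median


def modified_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]  # scores are undefined when MAD is zero
    return [0.6745 * (v - med) / mad for v in values]


def linear_residuals(values):
    """Residuals after removing an ordinary least-squares line over the index."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(values)]


def flag(values, threshold=3.5):
    """Indices whose absolute modified Z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Synthetic time-difference series (hypothetical units): steady drift of
# 2 per sample, plus one inserted outlier of +15 at index 25.
series = [2.0 * t for t in range(50)]
series[25] += 15.0

raw_flags = flag(series)                          # slope left in place
detrended_flags = flag(linear_residuals(series))  # slope removed first
```

On the raw series the drift inflates the median absolute deviation, so the inserted point does not stand out; once the line is removed, the spread shrinks and the same point dominates. An ordinary least-squares fit is itself sensitive to outliers, so a robust fit may be preferable in practice.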

Outlier detection for multivariate long memory processes (다변량 장기 종속 시계열에서의 이상점 탐지)

  • Kim, Kyunghee;Yu, Seungyeon;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.395-406 / 2022
  • This paper studies outlier detection for multivariate long memory time series. Existing outlier detection methods are based on a short memory VARMA model, so they are not suitable for multivariate long memory time series. A higher-order autoregressive model is needed to account for long memory, but it can also induce estimation instability as the number of parameters increases. To resolve this issue, we propose outlier detection methods based on the VHAR structure. We also adapt a robust estimation method to estimate the VHAR coefficients more efficiently. Our simulation results show that the proposed method performs well in detecting outliers in multivariate long memory time series. An empirical analysis of stock indices shows that the RVHAR model finds additional outliers that the existing model does not detect.

Outlier Detection in Random Effects Model Using Fractional Bayes Factor

  • Chung, Younshik
    • Communications for Statistical Applications and Methods / v.7 no.1 / pp.141-150 / 2000
  • In this paper, we propose a method of computing the Bayes factor to detect an outlier in a random effects model. When no prior information is available, and improper noninformative priors must therefore be used, the Bayes factor includes unspecified constants and is computationally burdensome. To solve this problem, we use the fractional Bayes factor (FBF) of O'Hagan (1995) and the generalized Savage-Dickey density ratio of Verdinelli and Wasserman (1995). The proposed method is applied to the outlier detection problem. We run a simulation of the proposed approach on a simulated data set that includes an outlier, and we also analyze a real data set.

Variable Selection and Outlier Detection for Automated K-means Clustering

  • Kim, Sung-Soo
    • Communications for Statistical Applications and Methods / v.22 no.1 / pp.55-67 / 2015
  • An important problem in cluster analysis is selecting the variables that define the cluster structure while eliminating noisy variables that mask it; in addition, outlier detection is a fundamental task in cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the number of clusters and the initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting the variables that define the cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/~sskim/SVOKmeans.r.
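
The idea of combining clustering with a distance-based check can be sketched roughly as follows. This is not the VS-KM procedure or the authors' R program: the plain Lloyd iteration with caller-supplied starting centers, the cutoff of `factor` times the median within-cluster distance, and every function name here are assumptions chosen to keep the example small.

```python
from math import dist
from statistics import median


def nearest_center(point, centers):
    """Index of the center closest to the point (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: dist(point, centers[j]))


def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the supplied initial centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Move each center to the mean of its members; keep it if empty.
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else center
            for members, center in zip(clusters, centers)
        ]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


def cluster_outliers(points, centers, labels, factor=3.0):
    """Flag points farther from their center than factor * median distance.

    The cutoff rule is an assumption for illustration only.
    """
    distances = [dist(p, centers[label]) for p, label in zip(points, labels)]
    flagged = []
    for j in range(len(centers)):
        members = [d for d, label in zip(distances, labels) if label == j]
        if not members:
            continue
        cutoff = factor * median(members)
        flagged += [
            i for i, (d, label) in enumerate(zip(distances, labels))
            if label == j and d > cutoff
        ]
    return sorted(flagged)


points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5),  # group A
    (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0),      # group B
    (4.0, 4.0),                                                   # stray point
]
centers, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
outlier_indices = cluster_outliers(points, centers, labels)
```

A median-based cutoff is used because a mean-based one would be pulled upward by the very points it is meant to catch.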