• Title/Summary/Keyword: Outlier model

Temporal and spatial outlier detection in wireless sensor networks

  • Nguyen, Hoc Thai;Thai, Nguyen Huu
    • ETRI Journal / v.41 no.4 / pp.437-451 / 2019
  • Outlier detection techniques play an important role in enhancing the reliability of data communication in wireless sensor networks (WSNs), and many such techniques have been proposed. Unfortunately, most of them still have limitations, namely (a) a high rate of false positives, (b) high time complexity, and (c) failure to detect outliers online. Moreover, these approaches mainly focus on either temporal outliers or spatial outliers. Therefore, this paper introduces novel algorithms that detect both temporal and spatial outliers. Our contributions are twofold: (i) modifying the Hampel Identifier (HI) algorithm to achieve high accuracy in temporal outlier detection, and (ii) combining the Gaussian process (GP) model with a graph-based outlier detection technique to improve performance in spatial outlier detection. The results demonstrate that our techniques outperform the state-of-the-art methods in terms of accuracy and work well with various data types.
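
For orientation, the following is a minimal sketch of the standard, unmodified Hampel identifier on a one-dimensional series; it does not reproduce the modification proposed in the paper. The function name `hampel_outliers`, the parameters `half_window` and `n_sigmas`, and the sample readings are assumptions made for illustration.

```python
from statistics import median


def hampel_outliers(values, half_window=3, n_sigmas=3.0):
    """Return indices flagged by a standard (unmodified) Hampel identifier.

    A point is flagged when it deviates from the median of its sliding
    window by more than n_sigmas scaled median absolute deviations (MAD).
    """
    k = 1.4826  # scales the MAD to estimate the std. dev. of Gaussian data
    flagged = []
    for i in range(len(values)):
        # The window is truncated at the ends of the series.
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = median(abs(v - med) for v in window)
        # Note: if mad is 0, any nonzero deviation is flagged.
        if abs(values[i] - med) > n_sigmas * k * mad:
            flagged.append(i)
    return flagged


# Hypothetical sensor readings with one obvious spike at index 4.
readings = [20.1, 20.3, 20.2, 20.4, 35.0, 20.3, 20.2, 20.1, 20.4, 20.2]
print(hampel_outliers(readings))
```

Because the median and the MAD are both robust statistics, a single spike barely shifts the threshold of the window that contains it, which is why the identifier suits temporal outlier detection.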

A Comparative Study of a Robust Estimate Method for Abnormal Traffic Detection (이상 트래픽 탐지를 위한 로버스트 추정 방법 비교 연구)

  • Jung, Jae-Yoon;Kim, Sahm
    • Communications for Statistical Applications and Methods / v.18 no.4 / pp.517-525 / 2011
  • This paper evaluates the performance of a robust estimator based on the GARCH model. We first introduce a robust estimation method and an outlier detection method for the GARCH model. Results on real internet traffic data show that the robust estimator outperforms the outlier detection method. In addition, the robust estimation method is less complex than the outlier detection method.
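
As a reminder of the model class involved, the widely used GARCH(1,1) special case can be written as below. The abstract does not state which order the authors use, so the choice of (1,1) and the symbols $r_t$, $\sigma_t^2$, $\omega$, $\alpha$, $\beta$ are assumptions for illustration.

```latex
\begin{aligned}
r_t &= \sigma_t \varepsilon_t, \qquad \varepsilon_t \overset{\text{i.i.d.}}{\sim} (0, 1), \\
\sigma_t^2 &= \omega + \alpha\, r_{t-1}^2 + \beta\, \sigma_{t-1}^2, \qquad \omega > 0,\ \alpha \ge 0,\ \beta \ge 0 .
\end{aligned}
```

Since $r_{t-1}^2$ enters the variance recursion directly, a single extreme observation inflates the next conditional variance, which is why outliers call for either robust estimation or explicit detection.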

A Study on Outlier Detection Method for Financial Time Series Data (재무 시계열 자료의 이상치 탐지 기법 연구)

  • Ha, M.H.;Kim, S.
    • The Korean Journal of Applied Statistics / v.23 no.1 / pp.41-47 / 2010
  • In this paper, we evaluate the performance of outlier detection methods based on the GARCH model. We first introduce the GARCH model and the methods of outlier detection in the GARCH model. Results from a small simulation and from real KOSPI data show that the outlier detection method outperforms the traditional method in the GARCH model.

Outlier detection in time series data (시계열 자료에서의 특이치 발견)

  • Choi, Jeong In;Um, In Ok;Choa, Hyung Jun
    • The Korean Journal of Applied Statistics / v.29 no.5 / pp.907-920 / 2016
  • This study proposes an outlier detection algorithm for time series data that uses a quantile autoregressive model, compares its performance with existing methods, and applies it to actual stock manipulation cases. Studies on outlier detection have traditionally focused on general data, and studies on time series data remain insufficient. They have also been limited to parametric models, which are inconvenient because the analysis is complicated and time-consuming. We therefore propose a new outlier detection algorithm for time series data and compare it with existing algorithms through various simulations. In particular, outlier detection in time series data can be useful for finding stock manipulation: if a stock price that had followed a certain pattern departs from it and generates an outlier, this may be due to intentional intervention and manipulation. We examine how quickly the model can detect stock manipulation by applying it to actual stock manipulation cases.

Bayesian Estimation for the Left Truncated Exponential Lifetime Distribution with Inclusion and Exclusion of an Outlier

  • PARK, Man-Gon
    • Journal of Korean Society for Quality Management / v.16 no.2 / pp.56-67 / 1988
  • It is well known that the left truncated exponential distribution with a positivity constraint on the location parameter is appropriate as a lifetime distribution model. In this paper, some Bayes estimators of the parameters and reliability for the left truncated exponential lifetime distribution are proposed under the exchangeable outlier model, both when an unidentified-failure outlier is included and when it is excluded, and the performance of these Bayes estimators is discussed.

Bayesian Outlier Detection in Regression Model

  • Younshik Chung;Kim, Hyungsoon
    • Journal of the Korean Statistical Society / v.28 no.3 / pp.311-324 / 1999
  • The problem of 'outliers', observations which look suspicious in some way, has long been a major concern for experimenters and data analysts. We propose a model for the outlier problem and analyze it in the linear regression model using a Bayesian approach. We use the mean-shift model and the SSVS idea of George and McCulloch (1993), which is based on the data augmentation method. The advantage of the proposed method is that it finds the subset of data that is most suspicious under the given model by means of the posterior probability. The MCMC method (Gibbs sampler) can be used to overcome the complicated Bayesian computation. Finally, the proposed method is applied to simulated data and real data.

Accuracy of Multiple Outlier Tests in Nonlinear Regression

  • Kahng, Myung-Wook
    • Communications for Statistical Applications and Methods / v.18 no.1 / pp.131-136 / 2011
  • The original Bates-Watts framework applies only to the complete parameter vector. Thus, guidelines developed in that framework can be misleading when the adequacy of the linear approximation is very different for different subsets. The subset curvature measures appear to be reliable indicators of the adequacy of linear approximation for an arbitrary subset of parameters in nonlinear models. Given the specific mean shift outlier model, the standard approaches to obtaining test statistics for outliers are discussed. The accuracy of outlier tests is investigated using subset curvatures.

Outlier Detection of Autoregressive Models Using Robust Regression Estimators (로버스트 추정법을 이용한 자기상관회귀모형에서의 특이치 검출)

  • Lee Dong-Hee;Park You-Sung;Kim Kee-Whan
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.305-317 / 2006
  • Outliers adversely affect model identification, parameter estimation, and forecasting in time series data. In particular, when outliers consist of a patch of additive outliers, current outlier detection procedures suffer from masking and swamping effects, which make them inefficient. In this paper, we propose a new outlier detection procedure based on high breakdown estimators, called dual robust filtering. Empirical and simulation studies in autoregressive models of order p show that the proposed procedure is effective.

Application of deterministic models for obtaining groundwater level distributions through outlier analysis

  • Dae-Hong Min;Saheed Mayowa Taiwo;Junghee Park;Sewon Kim;Hyung-Koo Yoon
    • Geomechanics and Engineering / v.35 no.5 / pp.499-509 / 2023
  • The objective of this study is to perform outlier analysis to obtain the distribution of groundwater levels through the best model. The groundwater levels are measured in 10, 25 and 30 piezometers in Seoul, Daejeon and Suncheon in South Korea. Fifty-eight empirical distribution functions were applied to determine a suitable fit for the measured groundwater levels. The best-fitting models for the measured values are the Generalized Pareto distribution, the Johnson SB distribution and the Normal distribution for Seoul, Daejeon and Suncheon, respectively; the reliability is estimated through the Anderson-Darling method. To choose an appropriate confidence interval, the relationship between the amount of outlier data and the confidence level is demonstrated, and 95% is then selected as a reasonable confidence level. The best model shows a smaller error ratio than the GEV, and the results of the Mahalanobis distance and outlier labelling methods are compared and validated. The median-based outlier labelling and Mahalanobis distance methods show higher validated error ratios than their mean-based equivalents, suggesting that the methods are sensitive to data structure.

Outlier Detection By Clustering-Based Ensemble Model Construction (클러스터링 기반 앙상블 모델 구성을 이용한 이상치 탐지)

  • Park, Cheong Hee;Kim, Taegong;Kim, Jiil;Choi, Semok;Lee, Gyeong-Hoon
    • KIPS Transactions on Software and Data Engineering / v.7 no.11 / pp.435-442 / 2018
  • Outlier detection is the task of detecting data samples that deviate significantly from the distribution of normal data. Most outlier detection methods calculate an outlier score that indicates how far a data sample is from the normal state and classify the sample as an outlier when its score exceeds a given threshold. However, since the range of the outlier score differs from one data set to another and outliers occur at a much smaller ratio than normal data, it is very difficult to determine the threshold value for an outlier score. Further, in practice it is not easy to acquire data that include enough outliers for learning. In this paper, we propose a clustering-based outlier detection method that constructs a model representing the normal data region using only normal data and performs binary classification of new data samples as outliers or normal data. Then, by dividing the given normal data into chunks and constructing a clustering model for each chunk, we extend the method to an ensemble that combines the decisions of the models, and we apply it to streaming data with dynamic changes. Experimental results on real and artificial data show the high performance of the proposed method.
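
The general idea described in this abstract (fit a clustering model per chunk of normal data, then combine the per-chunk decisions) can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: plain k-means, the radius rule with the `slack` factor, majority voting, and all function names and data are hypothetical choices, and the streaming extension is not covered.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples (assumed clusterer)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def fit_chunk_model(chunk, k=2, slack=1.5):
    """Model the normal region of one chunk as balls around its centroids.

    Assumed rule: the radius is the largest distance from a training point
    to its nearest centroid, multiplied by a slack factor.
    """
    centroids = kmeans(chunk, k)
    radius = max(min(math.dist(p, c) for c in centroids) for p in chunk)
    return centroids, slack * radius


def is_outlier(models, x):
    """Majority vote: outlier if most chunk models place x outside their region."""
    votes = sum(
        min(math.dist(x, c) for c in centroids) > radius
        for centroids, radius in models
    )
    return votes > len(models) / 2


# Hypothetical normal data: two Gaussian blobs, shuffled and split into 5 chunks.
rng = random.Random(42)
normal = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
normal += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
rng.shuffle(normal)
chunks = [normal[i:i + 40] for i in range(0, len(normal), 40)]
models = [fit_chunk_model(chunk) for chunk in chunks]

print(is_outlier(models, (0.0, 0.0)), is_outlier(models, (20.0, 20.0)))
```

Training on normal data only sidesteps the threshold-selection problem raised in the abstract: the decision boundary comes from the extent of the normal data itself rather than from a score cutoff tuned on labelled outliers.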