• Title/Summary/Keyword: outliers


A Bayesian Approach to Detecting Outliers Using Variance-Inflation Model

  • Lee, Sangjeen;Chung, Younshik
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.805-814 / 2001
  • The problem of 'outliers', observations that look suspicious in some way, has long been one of the main concerns of experimenters and data analysts. We propose a model for the outlier problem and analyze it in the linear regression setting using a Bayesian approach with the variance-inflation model. We use the ideas of Geweke (1996), which are based on the data augmentation method, to detect outliers in the linear regression model. The advantage of the proposed method is that it finds the subset of the data that is most suspicious under the given model by means of the posterior probability. A sampling-based approach is used to handle the complicated Bayesian computation. Finally, the proposed methodology is applied to simulated data and to a real data set.


Outward Testing Procedure for the Identification of Multiple Outliers (다수 이상치 인식(認識)을 위한 외향성 검정 절차)

  • Yum, Joon-Keun;Kim, Jong-Woo
    • Journal of Korean Society for Quality Management / v.24 no.3 / pp.50-64 / 1996
  • This article is concerned with procedures for detecting multiple y outliers in linear regression. A two-phase outward-testing procedure, controlled by the initial subset and the minimum residuals, is suggested. Its performance is compared with that of other procedures by Monte Carlo techniques and found to be superior. The procedure, however, fails to detect y outliers that occur at high-leverage cases in Phase 1. We therefore propose the ELMS algorithm for obtaining a set of suspect observations in Phase 1. In Phase 2, the proposed test is conducted using the studentized residuals to determine which of the suspect cases are outliers. Several examples are analyzed.


MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society / v.36 no.4 / pp.457-469 / 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. The detection of multiple outliers or influential points is more difficult, however, owing to masking and swamping problems. Multiple outlier detection methods for logistic regression have not yet been studied from the standpoint of direct procedures. In this paper we consider direct methods for logistic regression by extending the influence matrix algorithm of Peña and Yohai (1995). We define the influence matrix in logistic regression by using Cook's distance for logistic regression, and we test for multiple outliers by using the mean shift model. To show the accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data that include multiple outliers with masking and swamping.

A Robust Vector Quantization Method against Distortion Outlier and Source Mismatch (이상 신호왜곡과 소스 불일치에 강인한 벡터 양자화 방법)

  • Noh, Myung-Hoon;Kim, Moo-Young
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.3 / pp.74-80 / 2012
  • In resolution-constrained quantization, the size of each Voronoi cell varies with the probability density function of the input data, which causes a large number of distortion outliers. We propose a vector quantization method that reduces distortion outliers by combining the generalized Lloyd algorithm (GLA) and the cell-size constrained vector quantization (CCVQ) scheme. The training data are divided into inside and outside regions according to the size of the Voronoi cell, and CCVQ and GLA are then applied to the inside and outside regions, respectively. Because CCVQ rather than GLA is applied to the densely populated region of the source, the number of centroids for the outside region can be increased so that distortion outliers are reduced. In real-world environments, source mismatch between training and test data is inevitable. In the source mismatch case, the proposed algorithm improves performance in terms of both average distortion and distortion outliers.
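
Illustrative sketch (not from the paper): the abstract above builds on the generalized Lloyd algorithm. The following is a minimal one-dimensional version of the plain Lloyd iteration only; it does not implement CCVQ or the proposed region split. The function name `lloyd_1d`, the sample data, and the initial codebook are assumptions chosen for illustration.

```python
def lloyd_1d(samples, centroids, iterations=20):
    """Plain Lloyd iteration on scalar data (illustrative, hypothetical helper).

    Returns the final codebook and the mean squared distortion measured
    at the start of each iteration.
    """
    centroids = sorted(centroids)
    history = []
    for _ in range(iterations):
        # Nearest-neighbour step: assign every sample to its closest centroid.
        cells = [[] for _ in centroids]
        for s in samples:
            k = min(range(len(centroids)), key=lambda j: (s - centroids[j]) ** 2)
            cells[k].append(s)
        # Mean squared distortion under the current codebook.
        total = sum((s - centroids[k]) ** 2
                    for k, cell in enumerate(cells) for s in cell)
        history.append(total / len(samples))
        # Centroid step: move each centroid to the mean of its cell.
        # An empty cell keeps its previous centroid.
        centroids = [sum(cell) / len(cell) if cell else c
                     for cell, c in zip(cells, centroids)]
    return centroids, history


# Synthetic source: a dense region near zero plus a sparse tail (assumed data).
samples = [i * 0.01 for i in range(100)] + [5.0, 7.0, 9.0]
codebook, history = lloyd_1d(samples, [0.0, 0.5, 1.0, 8.0])
```

With only one centroid covering the sparse tail, the tail samples keep a large per-sample error even though the average distortion is small, which is the kind of distortion outlier the abstract refers to.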

An Outlier Data Analysis using Support Vector Regression (Support Vector Regression을 이용한 이상치 데이터분석)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.6 / pp.876-880 / 2008
  • Outliers are observations that are much larger or smaller than most observations in a given data set. They can arise from a variety of sources, and the results of an analysis may depend heavily on them. In general, data analysis is carried out after removing outliers. In data mining applications such as fraud detection and intrusion detection, however, outliers are kept in the training data because they carry crucial information. Simple and multiple regression models need outliers to be eliminated from the training data, using standardized and studentized residuals, in order to construct a good model. In this paper, we use support vector regression (SVR), which is based on statistical learning theory, to analyze regression data that contain outliers. We verify the improved performance of our approach through experiments on synthetic data sets.
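
Illustrative sketch (not from the paper): the abstract above mentions screening outliers with studentized residuals before fitting an ordinary regression model. The following shows that conventional screening step for simple linear regression using internally studentized residuals; it is not the SVR approach of the paper. The function name, the synthetic data, and the cutoff of 3 are assumptions.

```python
import math


def studentized_residuals(x, y):
    """Internally studentized residuals for the fit y ~ a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    # Leverage of each point in simple linear regression.
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, leverage)]


# Synthetic data (assumed): a straight line with small deterministic noise
# and one shifted observation at index 7.
x = [float(i) for i in range(20)]
y = [1.0 + 0.5 * i + 0.1 * ((i * 7) % 5 - 2) for i in range(20)]
y[7] += 6.0

r = studentized_residuals(x, y)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 3.0]  # rule-of-thumb cutoff
```

On this data only index 7 exceeds the cutoff. Note that an internally studentized residual cannot exceed sqrt(n - 2) in absolute value, so a fixed cutoff is only meaningful when the sample is large enough.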

Outlier Detection Method for Time Synchronization

  • Lee, Young Kyu;Yang, Sung-hoon;Lee, Ho Seong;Lee, Jong Koo;Lee, Joon Hyo;Hwang, Sang-wook
    • Journal of Positioning, Navigation, and Timing / v.9 no.4 / pp.397-403 / 2020
  • To synchronize a remote system time to a reference time such as Coordinated Universal Time (UTC), the time difference between the two clocks must be compared. The time comparison data may contain outliers, and the time synchronization performance can be significantly degraded if they are not removed. An effective outlier detection algorithm is therefore required to keep the system time highly accurate. In this paper, an outlier detection method is presented for the time difference data of GNSS time transfer receivers. The time difference data between the system time and the GNSS usually have slopes because the remote system clock is free running until it is synchronized to the reference clock time. To investigate the outlier detection performance of the proposed algorithm, simulations are performed using the time difference data of a GNSS time transfer receiver corrected to a free running Cesium clock, with intentionally inserted outliers. The simulations show that the proposed algorithm can effectively detect the inserted outliers, while conventional methods such as the modified Z-score and the adjusted boxplot cannot. Furthermore, it is also observed that the synchronization performance can be degraded by more than 15% with 20 outliers compared with that of the original data without outliers.
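
Illustrative sketch (not from the paper): the abstract above notes that the time difference data have a slope and that the modified Z-score fails on them. The following shows the modified Z-score applied to sloped synthetic data, with and without removal of a least-squares line. The detrending step is an assumption added for illustration and is not the algorithm proposed in the paper; the function names, the data, and the threshold of 3.5 are also assumptions.

```python
from statistics import median


def modified_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]


def fit_line(t, y):
    """Ordinary least-squares intercept and slope for y ~ a + b*t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def flag_outliers(t, y, threshold=3.5, detrend=True):
    """Return the indices whose |modified Z-score| exceeds the threshold."""
    if detrend:
        a, b = fit_line(t, y)
        series = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    else:
        series = list(y)
    scores = modified_z_scores(series)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Synthetic sloped series (assumed): a drift of 2 units per step, small
# deterministic noise, and one inserted outlier at index 10.
t = [float(i) for i in range(50)]
y = [2.0 * i + 0.1 * ((i * 7) % 5 - 2) for i in range(50)]
y[10] += 15.0

raw_flags = flag_outliers(t, y, detrend=False)
detrended_flags = flag_outliers(t, y, detrend=True)
```

On this data the raw scores flag nothing, because the drift inflates the median absolute deviation, while the detrended scores flag only index 10.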

A Score test for Detection of Outliers in Nonlinear Regression

  • Kahng, Myung-Wook
    • Journal of the Korean Statistical Society / v.22 no.2 / pp.201-208 / 1993
  • Given the specific mean shift outlier model, the score test for multiple outliers in nonlinear regression is discussed as an alternative to the likelihood ratio test. The geometric interpretation of the score statistic is also presented.


On Sensitivity Analysis in Principal Component Regression

  • Kim, Soon-Kwi;Park, Sung H.
    • Journal of the Korean Statistical Society / v.20 no.2 / pp.177-190 / 1991
  • In this paper, we discuss and review various measures that have been presented for studying outliers, high-leverage points, and influential observations when principal component regression is adopted. We suggest several diagnostic measures for use with principal component regression. A numerical example is given in which some individual data points are flagged as outliers, high-leverage points, or influential points.


The System for Checking Multivariate Normality and Outliers

  • 강명래;최용석
    • Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.253-255 / 2000
  • To use multivariate analysis techniques, the data must satisfy the normality assumption. In this study, we introduce a system, built in a GUI (graphical user interface) environment, that supports normality tests for univariate and multivariate data, outlier removal, and variable transformation, so that users can carry out these tasks more conveniently.
