• Title/Summary/Keyword: Outliers test

Outlier Detection in Growth Curve Model

  • Shim, Kyu-Bark
    • Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.313-323 / 2003
  • This paper discusses the detection of outliers in the growth curve model with an arbitrary (unstructured) covariance matrix. To detect outliers in this model, a test statistic based on the U-distribution is established. After detecting outliers, we test for homogeneous and/or heterogeneous covariance matrices using the PSR Quasi-Bayes Criterion. For illustration, a numerical example compares the results before and after outlier deletion.

A Study on Applications of Regression Diagnostic Method to Technometrics, and the Statistical Quality Control

  • Kim, Soon-Kwi
    • Journal of Korean Society for Quality Management / v.21 no.1 / pp.55-64 / 1993
  • This article is concerned with procedures for detecting one or more outliers or influential observations in a linear regression model. A test procedure based on recursive residuals is proposed and developed. The power of the test procedure to identify one or more outliers is investigated through simulation, along with its relationship to the number and configuration of the outliers.

Test for an Outlier in Multivariate Regression with Linear Constraints

  • Kim, Myung-Geun
    • Communications for Statistical Applications and Methods / v.9 no.2 / pp.473-478 / 2002
  • A test for a single outlier in multivariate regression with linear constraints on the regression coefficients is derived using a mean shift model. It is shown that influential observations based on case deletions in testing linear hypotheses are determined by two types of outliers, namely mean shift outliers with or without linear constraints. An illustrative example is given.
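
The mean shift idea in this abstract can be illustrated in a much simpler setting. The sketch below is not the paper's multivariate, constrained test: it assumes ordinary simple linear regression with one response and computes the externally studentized residual of each case, which is the t statistic for a mean shift in that single case. The function name and the data are hypothetical.

```python
import math

def mean_shift_t_statistics(x, y):
    """Externally studentized residuals for the simple regression y = a + b*x.

    Each value equals the t statistic for testing a mean shift in that
    single case, with n - 3 degrees of freedom under the null hypothesis.
    """
    n = len(x)
    p = 2  # number of coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance

    stats = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        r = e / math.sqrt(s2 * (1.0 - leverage))  # internally studentized
        t = r * math.sqrt((n - p - 1) / (n - p - r * r))  # externally studentized
        stats.append(t)
    return stats

# Hypothetical data: a line with small deterministic noise, case 6 shifted by +3.
x = list(range(10))
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [2.0 + 0.5 * xi + ei for xi, ei in zip(x, noise)]
y[6] += 3.0

t_stats = mean_shift_t_statistics(x, y)
suspect = max(range(len(x)), key=lambda i: abs(t_stats[i]))
print(suspect, round(t_stats[suspect], 2))
```

When the suspect case is chosen by scanning all cases, as here, the reference distribution should be adjusted for multiple testing (for example with a Bonferroni correction) before declaring an outlier.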

Statistical Outliers in Florida Counties at the Presidential Election 2000 (2000년 미국대선 플로리다주의 투표결과 분석)

  • 김현철
    • The Korean Journal of Applied Statistics / v.15 no.1 / pp.21-32 / 2002
  • We searched for outliers in the vote data of the State of Florida from the 2000 presidential election, using a multivariate regression analysis. We found several outliers, including Palm Beach County. This means that, to argue that the "Butterfly Ballot" made Palm Beach County an outlier, we should analyze the number of disqualified double-punched ballots as well as the votes.

A Robust Vector Quantization Method against Distortion Outlier and Source Mismatch (이상 신호왜곡과 소스 불일치에 강인한 벡터 양자화 방법)

  • Noh, Myung-Hoon;Kim, Moo-Young
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.3 / pp.74-80 / 2012
  • In resolution-constrained quantization, the size of a Voronoi cell varies with the probability density function of the input data, which causes a large number of distortion outliers. We propose a vector quantization method that reduces distortion outliers by combining the generalized Lloyd algorithm (GLA) and the cell-size constrained vector quantization (CCVQ) scheme. The training data are divided into inside and outside regions according to the size of the Voronoi cell, and CCVQ and GLA are then applied to the respective regions. Because CCVQ is applied to the densely populated region of the source instead of GLA, the number of centroids for the outside region can be increased so that distortion outliers are reduced. In real-world environments, source mismatch between training and test data is inevitable. For the source mismatch case, the proposed algorithm improves performance in terms of both average distortion and distortion outliers.

Impact of Outliers on the Statistical Measures of the Environmental Monitoring Data in Busan Coastal Sea (이상자료가 연안 환경자료의 통계 척도에 미치는 영향)

  • Cho, Hong-Yeon;Lee, Ki-Seop;Ahn, Soon-Mo
    • Ocean and Polar Research / v.38 no.2 / pp.149-159 / 2016
  • The statistical measures of coastal environmental data are used in a variety of statistical inferences, hypothesis tests, and data-driven models. If the measures are biased, the resulting estimates and models may also be biased, and the potential for bias is large when the data contain outliers, defined as extraordinarily large or small values. This study aims to suggest more robust statistical measures as alternatives to the commonly used ones and to assess their performance through a quantitative comparison with the typical measures of location, spread, and shape, using environmental monitoring data from the Busan coastal sea. Outliers were detected with Rosner's test. About 5-10% of the nutrient data were found to contain outliers. After removal (zero-weighting) of the outliers, the relative changes in the mean and standard deviation between the before- and after-removal conditions were 13% and 33%, respectively. Skewness and kurtosis decreased by 1.36 and 8.11, respectively. By contrast, the change ratios of the robust counterparts of the mean and standard deviation are 3.7-10.5%, and the variation magnitudes of robust skewness and kurtosis are only about 2-4% of those of the non-robust measures. Given these relatively small changes between the before- and after-removal scenarios, the robust measures can be regarded as outlier-resistant.
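
The before-and-after comparison described in this abstract can be sketched in a few lines. The abstract does not name its robust measures, so this sketch assumes the median and the median absolute deviation (MAD) as stand-ins for robust location and spread. The data values and helper names are hypothetical, and the outlier is removed by hand rather than by Rosner's test.

```python
import statistics

def location_spread(values):
    """Return (mean, standard deviation, median, MAD) for a sample."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return statistics.mean(values), statistics.stdev(values), med, mad

def relative_change(before, after):
    """Relative change in percent, measured against the 'before' value."""
    return abs(after - before) / abs(before) * 100.0

# Hypothetical concentrations with one extreme value (9.8).
sample = [1.1, 1.3, 0.9, 1.2, 1.0, 1.4, 1.1, 0.8, 1.2, 9.8]
cleaned = [v for v in sample if v != 9.8]

before = location_spread(sample)
after = location_spread(cleaned)
changes = {
    name: relative_change(b, a)
    for name, b, a in zip(("mean", "sd", "median", "mad"), before, after)
}
for name, pct in changes.items():
    print(f"{name}: change {pct:.1f}%")
```

On this toy sample the median and MAD move far less than the mean and standard deviation when the extreme value is dropped, which is the qualitative pattern the abstract reports.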

More informative sequential probability ratio test (정보 전달이 보다 효과적인 변형된 축차확률비검정)

  • 박노진
    • The Korean Journal of Applied Statistics / v.9 no.2 / pp.109-117 / 1996
  • We introduce a sequential probability ratio test (SPRT) that is more informative than the currently used SPRT. Although the proposed SPRT shares similar mathematical properties with the ordinary SPRT, it is less affected by outliers and even indicates their possible existence. Furthermore, it responds to changes among observations more quickly than the ordinary SPRT.
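
For context, the ordinary SPRT that this abstract uses as its baseline can be sketched as follows. This is Wald's classical test for a normal mean with known standard deviation, not the modified test proposed in the paper. The function name, parameter names, and data are assumptions made for illustration.

```python
import math

def sprt_normal_mean(observations, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1 (known sigma).

    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)  # reaching it accepts H1
    lower = math.log(beta / (1 - alpha))  # reaching it accepts H0
    llr = 0.0  # cumulative log-likelihood ratio
    for n, x in enumerate(observations, start=1):
        # Log-likelihood ratio contribution of one normal observation.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", len(observations)

# Hypothetical observations scattered around 1 and around 0.
near_one = [1.2, 0.8, 1.1, 0.9, 1.3, 1.0]
near_zero = [-0.2, 0.1, -0.1, 0.2, 0.0, -0.3, 0.1]
print(sprt_normal_mean(near_one, mu0=0.0, mu1=1.0, sigma=1.0))
print(sprt_normal_mean(near_zero, mu0=0.0, mu1=1.0, sigma=1.0))
```

With these settings a single extreme first observation such as 4.0 is enough to cross the upper boundary, which illustrates the outlier sensitivity of the ordinary test that the abstract refers to.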

Robust inference for linear regression model based on weighted least squares

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society / v.13 no.2 / pp.271-284 / 2002
  • In this paper we consider robust inference for the parameters of a linear regression model based on weighted least squares. First, we consider a sequential test for multiple outliers. Next, we suggest a way to assign a weight to each observation $(x_i,\;y_i)$ and recommend a robust inference procedure for the linear model. Finally, to check the performance of the confidence interval for the slope under the proposed method, we conduct a Monte Carlo simulation and present some numerical results and examples.

MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun;Park, Sung-Hyun
    • Journal of the Korean Statistical Society / v.36 no.4 / pp.457-469 / 2007
  • Many procedures are available to identify a single outlier or an isolated influential point in linear regression and logistic regression. However, the detection of influential points or multiple outliers is more difficult, owing to masking and swamping problems. Multiple outlier detection methods for logistic regression have not yet been studied from the standpoint of direct procedures. In this paper we consider direct methods for logistic regression by extending the influence matrix algorithm of Peña and Yohai (1995). We define the influence matrix in logistic regression using Cook's distance, and test for multiple outliers using the mean shift model. To show the accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data that include multiple outliers with masking and swamping.

Comparative Analysis on the Outlier Data of Each Parameter in Automatic Water Quality Monitoring Networks (수질자동측정망 자료의 항목별 이상치 비교 분석)

  • Lim, Byungjin;Hong, Eunyoung;Yeon, Insung
    • Journal of Korean Society on Water Environment / v.26 no.4 / pp.700-706 / 2010
  • Along the four major rivers in Korea, automatic water quality monitoring (AWQM) stations are in place to respond immediately to any pollution incident. Real-time data (temperature, DO, pH, EC and TOC) collected at each station were screened statistically, using Dixon's test and a discordance test, to exclude outliers and retain valid data. The two methods were compared in terms of the number of outliers identified, and there was no significant difference between them. However, more outliers were identified in the EC and TOC data than in the other water quality parameters. At H station, part of the EC data showed no variation for a long time. If a measured signal stays within $\pm 0.001$ mS/cm of the sectional mean, it should be treated as normal data. Therefore, another routine was added to the data screening system, and some data that had been removed as outliers were restored.
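
The two screening ideas mentioned in this abstract can be sketched as follows. The first function computes Dixon's Q ratio in its basic gap-over-range form, which is usually applied to small samples. The second implements the flat-signal rule stated in the abstract, using the 0.001 mS/cm tolerance. The function names and EC readings are hypothetical, and the way the paper's screening system combines the two steps is not described in the abstract.

```python
def dixon_q(values):
    """Dixon's Q ratio (gap / range) for the more extreme end of a sample.

    Returns (q, suspect_value). Compare q with a tabulated critical value
    for the sample size and significance level in use.
    """
    data = sorted(values)
    spread = data[-1] - data[0]
    if len(data) < 3 or spread == 0:
        raise ValueError("need at least 3 values that are not all equal")
    q_low = (data[1] - data[0]) / spread
    q_high = (data[-1] - data[-2]) / spread
    return (q_low, data[0]) if q_low >= q_high else (q_high, data[-1])


def is_flat_section(values, tolerance=0.001):
    """True if every value lies within +/- tolerance of the section mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tolerance for v in values)


# Hypothetical EC readings in mS/cm: one spike, then a nearly constant section.
ec_with_spike = [0.412, 0.415, 0.409, 0.411, 0.414, 0.520, 0.413]
ec_flat = [0.3500, 0.3501, 0.3499, 0.3500]

q, suspect_value = dixon_q(ec_with_spike)
print(round(q, 3), suspect_value)
print(is_flat_section(ec_flat), is_flat_section(ec_with_spike))
```

A screening routine along these lines would keep readings from a flat section as normal data even if a ratio-based test had flagged them, which matches the restoration step the abstract describes.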