• Title/Abstract/Keywords: Outliers

655 search results; processing time 0.028 seconds

Process modeling using artificial neural network in the presence of outliers

  • 고영철;박화규;봉복준;손주찬;왕지남
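    • Korean Operations Research and Management Science Society: Conference Proceedings
    • /
    • Proceedings of the 1997 Fall Conference of the Korean Operations Research and Management Science Society; Hongik University, Seoul; 1 Nov. 1997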
    • /
    • pp.177-180
    • /
    • 1997
  • Outliers, unexpected extraordinary observations that look discordant from most observations in a data set, are commonplace in various kinds of data analysis. Since the effect of outliers on model identification can be serious, the aim of this paper is to present some ways of handling outliers in a given data set and to specify a model in the presence of outliers. A neural-network-based procedure is proposed that identifies outliers, removes their effects, and specifies a model for the underlying process. In contrast with traditional parametric methods, which require the model's structure and parameters to be estimated before outliers can be detected, the proposed procedure is nonparametric: it needs no such estimation before handling outliers and can be applied to real problems in the presence of outliers. The proposed methodology proceeds as follows. First, outliers are detected and replaced with predicted values using an outlier-detection neural network. The neural network is then retrained on the data set from which the effect of outliers has been removed. As a result, the effects of outliers are removed and the modeling precision can be improved. Experimental results show that the proposed method is suitable for prediction on data sets containing outliers.

The Identification Of Multiple Outliers

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 11, No. 2
    • /
    • pp.201-215
    • /
    • 2000
  • The classical method for regression analysis is the least squares method. However, if the data contain significant outliers, the least squares estimator can break down. To remedy this problem, robust methods are an important complement to the least squares method. Robust methods downweight or completely ignore the outliers. This is not always best, because the outliers can contain very important information about the population. If they can be detected, the outliers can be further inspected and appropriate action can be taken based on the results. In this paper, I propose a sequential outlier test to identify outliers. It is based on the nonrobust estimate and the robust estimate of scatter of the robust regression residuals and is applied in a forward procedure, removing the most extreme observation at each step, until the test fails to detect outliers. Unlike other forward procedures, the present one is unaffected by swamping or masking effects because the statistic is based on the robust regression residuals. I show the asymptotic distribution of the test statistic and apply the test to several real and simulated data sets, where it performs fairly well.

Simultaneous Identification of Multiple Outliers and High Leverage Points in Linear Regression

  • Rahmatullah Imon, A.H.M.;Ali, M. Masoom
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 16, No. 2
    • /
    • pp.429-444
    • /
    • 2005
  • The identification of unusual observations such as outliers and high leverage points has drawn a great deal of attention for many years. Most of these identification techniques are based on case deletion, which focuses more on the outliers than on the high leverage points. But residuals together with leverage values may cause masking and swamping, so that a good number of unusual observations remain undetected in the presence of multiple outliers and multiple high leverage points. In this paper we propose a new procedure to identify outliers and high leverage points simultaneously. We suggest an additive form of the residuals and the leverages that gives almost equal focus to outliers and leverages. We analyzed several well-known data sets and discovered a few outliers and high leverage points that were undetected by the existing diagnostic techniques.

Weight Reduction Method for Outlier in Survey Sampling

  • Kim Jin
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 13, No. 1
    • /
    • pp.19-27
    • /
    • 2006
  • Outliers in surveys are a perennial problem for applied survey statisticians estimating the total or mean of a population. The influence of outliers grows when they carry large weights in survey sampling. Many techniques have been studied to lower the impact of outliers on sample survey estimates. Outliers can be downweighted by winsorization or by reducing their weights. Weight reduction is more reasonable than replacing an outlier with a non-outlying value, because the outlier still represents at least one unit. In this paper, we suggest the square root transformation of the weight as the weight reduction method. We show with real data that this method is efficient, and it is also easy to apply in practice.
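
As a rough illustration of the weight-reduction idea in this abstract, the sketch below estimates a population total as a weighted sum and replaces the weight of each flagged outlier with its square root. The function name `weighted_total`, the sample values, the outlier flags, and the floor of 1 on the reduced weight are all assumptions made for illustration; the abstract does not say how outliers are identified or whether the remaining weights are rescaled.

```python
import math


def weighted_total(values, weights, outlier_flags=None):
    """Estimate a population total as the sum of weight * value.

    If outlier_flags is given, the weight of each flagged unit is replaced
    by its square root. The floor of 1.0 is an assumption of this sketch:
    it keeps a flagged unit representing at least itself.
    """
    if outlier_flags is None:
        outlier_flags = [False] * len(values)
    total = 0.0
    for y, w, is_outlier in zip(values, weights, outlier_flags):
        if is_outlier:
            w = max(1.0, math.sqrt(w))  # square-root weight reduction
        total += w * y
    return total


# Made-up sample: four units with equal design weights, one extreme value.
values = [10.0, 12.0, 11.0, 500.0]
weights = [100.0, 100.0, 100.0, 100.0]
flags = [False, False, False, True]

plain = weighted_total(values, weights)
reduced = weighted_total(values, weights, flags)
print(plain, reduced)
```

With these made-up numbers, the single extreme unit dominates the unadjusted total, and reducing its weight from 100 to 10 shrinks its contribution accordingly.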

Outlier tests on potential outliers

  • 서한손
    • Korean Journal of Applied Statistics
    • /
    • Vol. 30, No. 1
    • /
    • pp.159-167
    • /
    • 2017
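  • In general, a group of potential outliers is finally judged to be outliers or not through a testing process, but some outlier detection methods omit the testing procedure or perform the test based on significance values computed by simulation. In this paper, to avoid masking and swamping, we propose a procedure that tests subsets of the outlier candidate set instead of testing the individual observations in it. An example illustrating the use of the proposed method and simulation results comparing its power with that of other methods are presented.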

Joint Estimation of the Outliers Effect and the Model Parameters in ARMA Process

  • Lee, Kwang-Ho;Shin, Hye-Jung
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 6, No. 2
    • /
    • pp.41-50
    • /
    • 1995
  • In this paper, we propose an iterative procedure that detects the locations of the outliers and jointly estimates the outlier effects and the model parameters in the autoregressive moving average model with two types of outliers. The performance of the procedure is compared with that of Chen and Liu (1993) through Monte Carlo simulation. The proposed procedure is very robust in the sense that it can be applied to stationary time series models with any type of outlier.

Improving the Quality of Response Surface Analysis of an Experiment for Coffee-Supplemented Milk Beverage: I. Data Screening at the Center Point and Maximum Possible R-Square

  • Rheem, Sungsue;Oh, Sejong
    • Korean Journal for Food Science of Animal Resources
    • /
    • Vol. 39, No. 1
    • /
    • pp.114-120
    • /
    • 2019
  • Response surface methodology (RSM) is a useful set of statistical techniques for modeling and optimizing responses in food science research. As a design for a response surface experiment, a central composite design (CCD) with multiple runs at the center point is frequently used. However, some of the responses at the center point are sometimes outliers, and these outliers are overlooked. Since the responses from center runs come from the same experimental conditions, there should be no outliers at the center point. Outliers at the center point ruin statistical analysis. Thus, the responses at the center point need to be looked at, and if outliers are observed, they have to be examined. If the reasons for the outliers are not errors in measuring or typing, such outliers need to be deleted. If the outliers are due to such errors, they have to be corrected. Through a re-analysis of a dataset published in the Korean Journal for Food Science of Animal Resources, we have shown that outlier elimination increased the maximum possible R-square that modeling of the data can obtain, which enables us to improve the quality of response surface analysis.
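
The effect described in this abstract can be sketched numerically. The code below assumes the common lack-of-fit definition of the maximum possible R-square, 1 - SS_pure_error / SS_total, where the pure-error sum of squares comes only from the replicated center runs; the abstract itself does not give a formula, so this definition, the function name, and the data are assumptions for illustration only.

```python
def max_possible_r_square(design_responses, center_responses):
    """Upper bound on R-square for any model fitted to the data.

    Assumes replication only at the center point, so the pure-error sum of
    squares is the variation of the center runs around their own mean.
    This definition is an assumption of the sketch, not taken from the paper.
    """
    all_responses = list(design_responses) + list(center_responses)
    grand_mean = sum(all_responses) / len(all_responses)
    ss_total = sum((y - grand_mean) ** 2 for y in all_responses)

    center_mean = sum(center_responses) / len(center_responses)
    ss_pure_error = sum((y - center_mean) ** 2 for y in center_responses)
    return 1.0 - ss_pure_error / ss_total


# Made-up responses: four non-center runs plus center runs.
design = [4.0, 6.0, 8.0, 10.0]
center_with_outlier = [7.0, 7.0, 7.0, 12.0]   # 12.0 plays the outlier
center_screened = [7.0, 7.0, 7.0]             # outlier deleted

r2_with = max_possible_r_square(design, center_with_outlier)
r2_screened = max_possible_r_square(design, center_screened)
print(r2_with, r2_screened)
```

In this toy data set the outlying center run inflates the pure-error term, so no model could explain more than a little over half of the variation; once it is screened out, the bound rises.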

The Forward Sequential Procedure for the Identifying Multiple Outliers in Linear Regression

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 16, No. 4
    • /
    • pp.1053-1066
    • /
    • 2005
  • In this paper we consider the problem of identifying and testing outliers in linear regression. First we consider the use of the so-called scale ratio test for testing the null hypothesis of no outliers. This test is based on the ratio of two residual scale estimates. We show the asymptotic distribution of the test statistic and investigate its properties. Next we consider the problem of identifying the outliers. A forward sequential procedure using the suggested test is proposed. The new method is compared with the classical procedure on a real data example. Unlike other forward procedures, the present one is unaffected by masking and swamping effects because the test statistic is based on a robust scale estimate.

On Rice Estimator in Simple Regression Models with Outliers

  • 박천건
    • Korean Journal of Applied Statistics
    • /
    • Vol. 26, No. 3
    • /
    • pp.511-520
    • /
    • 2013
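  • In regression models with outliers, research on detecting outliers and on robust estimators is very important. Such research typically estimates the regression coefficients using leave-one-out and estimates the error variance from the residuals in order to detect outliers. This study introduces the application of the Rice estimator, which can estimate the error variance without estimating the regression coefficients. In particular, for the simple regression model, it compares the statistical properties of the Rice estimator with and without outliers and explores what advantages the estimator offers for outlier detection.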
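
For readers unfamiliar with the Rice estimator, the sketch below implements its usual first-order difference form: with the observations ordered by x, the error variance is estimated as the sum of squared successive differences of y divided by 2(n - 1), so no regression coefficients are fitted. The function name and the data are made up for illustration, and the abstract does not state which variant of the estimator the paper studies.

```python
def rice_variance(x, y):
    """Difference-based (Rice) estimate of the error variance.

    Observations are ordered by x, and the estimate is
    sum((y[i] - y[i-1])**2) / (2 * (n - 1)); no regression fit is needed.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    ordered_y = [yi for _, yi in sorted(zip(x, y))]
    n = len(ordered_y)
    if n < 2:
        raise ValueError("at least two observations are required")
    squared_diffs = sum(
        (ordered_y[i] - ordered_y[i - 1]) ** 2 for i in range(1, n)
    )
    return squared_diffs / (2 * (n - 1))


# Made-up data: a flat trend with small noise, then the same data
# with one response replaced by an outlier.
x = [1, 2, 3, 4, 5, 6]
y_clean = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0]
y_outlier = [2.1, 1.9, 2.0, 10.0, 1.8, 2.0]

clean_estimate = rice_variance(x, y_clean)
outlier_estimate = rice_variance(x, y_outlier)
print(clean_estimate, outlier_estimate)
```

A single outlier enters two successive differences, so the estimate jumps sharply in the contaminated series; that sensitivity is one reason the estimator's behavior with and without outliers is worth comparing.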

The effect of patchy outliers in time series forecasting

  • 이재준;편영숙
    • Korean Journal of Applied Statistics
    • /
    • Vol. 9, No. 1
    • /
    • pp.125-137
    • /
    • 1996
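  • Time series data often contain outliers caused by non-recurring, abnormal events. Because the observations in a time series have a dependence structure, the effect of outliers can be more serious than in other statistical analyses. This paper focuses on identifying the effect of patchy outliers on forecasting. In particular, we derive the increase in the mean squared error of the l-step-ahead forecast and use this increase to measure the effect of patchy outliers on forecasting. We show that, in general, the increase is not large unless the patchy outliers occur very close to the forecast origin, and we confirm this by analyzing real data.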
