• Title/Summary/Keyword: Outlier Data

Search Result 407, Processing Time 0.022 seconds

Multiple Imputation Reducing Outlier Effect using Weight Adjustment Methods (가중치 보정을 이용한 다중대체법)

  • Kim, Jin-Young;Shin, Key-Il
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.4
    • /
    • pp.635-647
    • /
    • 2013
  • Imputation is a commonly used method to handle missing survey data. The performance of the imputation method is influenced by various factors, especially an outlier. The removal of the outlier in a data set is a simple and effective approach to reduce the effect of an outlier. In this paper in order to improve the precision of multiple imputation, we study a imputation method which reduces the effect of outlier using various weight adjustment methods that include the removal of an outlier method. The regression method in PROC/MI in SAS is used for multiple imputation and the obtained final adjusted weight is used as a weight variable to obtain the imputed values. Simulation studies compared the performance of various weight adjustment methods and Monthly Labor Statistic data is used for real data analysis.

On the Efficiency of Outlier Cleaners in Spatial Data Analysis (공간통계분석에서 이상점 수정방법의 효율성비교)

  • 이진희;신기일
    • The Korean Journal of Applied Statistics
    • /
    • v.17 no.2
    • /
    • pp.327-336
    • /
    • 2004
  • Many researchers have used the robust variogram to reduce the effect of outliers in spatial data analysis. Recently it is known that estimating the variogram after replacing outliers is more efficient. In this paper, we suggest a new data cleaner for geostatistic data analysis and compare the efficiency of outlier cleaners.

A Multiple Imputation for Reducing Outlier Effect (이상점 영향력 축소를 통한 무응답 대체법)

  • Kim, Man-Gyeom;Shin, Key-Il
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.7
    • /
    • pp.1229-1241
    • /
    • 2014
  • Most of sampling surveys have outliers and non-response missing values simultaneously. In that case, due to the effect of outliers, the result of imputation is not good enough to meet a given precision. To overcome this situation, outlier treatment should be conducted before imputation. In this paper in order for reducing the effect of outlier, we study outlier imputation methods and outlier weight adjustment methods. For the outlier detection, the method suggested by She and Owen (2011) is used. A small simulation study is conducted and for real data analysis, Monthly Labor Statistic and Briquette Consumption Survey Data are used.

Asymptotic Properties of Outlier Tests in Nonlinear Regression

  • Kahng, Myung-Wook
    • Journal of the Korean Data and Information Science Society
    • /
    • v.17 no.1
    • /
    • pp.205-211
    • /
    • 2006
  • For a linear regression model, the necessary and sufficient condition for the asymptotic consistency of the outlier test statistic is known. An analogous condition for the nonlinear regression model is considered in this paper.

  • PDF

Estimations for a Uniform Scale Parameter in the Presence of a Half-Triangle Outlier

  • Lee, Chang-Soo;Kim, Kee-Hwan;Park, Yang-Woo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.19 no.3
    • /
    • pp.959-965
    • /
    • 2008
  • We shall propose several estimators for the scale parameter in a uniform distribution with the presence of a half-triangle outlier, and obtain mean squared errors(MSE's) for their proposed estimators. And we shall compare numerically efficiencies for proposed several estimators of the scale parameter in a uniform distribution with the presence of a half-triangle outlier in the small sample sizes.

  • PDF

Outlier Detection in Random Effects Model Using Fractional Bayes Factor

  • Chung, Younshik
    • Communications for Statistical Applications and Methods
    • /
    • v.7 no.1
    • /
    • pp.141-150
    • /
    • 2000
  • In this paper we propose a method of computing Bayes factor to detect an outlier in a random effects model. When no information is available and hence improper noninformative priors should be used Bayes factor includes the unspecified constants and has complicated computational burden. To solve this problem we use the fractional Bayes factor (FBF) of O-Hagan(1995) and the generalized Savage0-Dickey density ratio of Verdinelli and Wasserman (1995) The proposed method is applied to outlier deterction problem We perform a simulation of the proposed approach with a simulated data set including an outlier and also analyze a real data set.

  • PDF

Dam Sensor Outlier Detection using Mixed Prediction Model and Supervised Learning

  • Park, Chang-Mok
    • International journal of advanced smart convergence
    • /
    • v.7 no.1
    • /
    • pp.24-32
    • /
    • 2018
  • An outlier detection method using mixed prediction model has been described in this paper. The mixed prediction model consists of time-series model and regression model. The parameter estimation of the prediction model was performed using supervised learning and a genetic algorithm is adopted for a learning method. The experiments were performed in artificial and real data set. The prediction performance is compared with the existing prediction methods using artificial data. Outlier detection is conducted using the real sensor measurements in a dam. The validity of the proposed method was shown in the experiments.

Density-based Outlier Detection in Multi-dimensional Datasets

  • Wang, Xite;Cao, Zhixin;Zhan, Rongjuan;Bai, Mei;Ma, Qian;Li, Guanyu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.12
    • /
    • pp.3815-3835
    • /
    • 2022
  • Density-based outlier detection is one of the hot issues in data mining. A point is determined as outlier on basis of the density of points near them. The existing density-based detection algorithms have high time complexity, in order to reduce the time complexity, a new outlier detection algorithm DODMD (Density-based Outlier Detection in Multidimensional Datasets) is proposed. Firstly, on the basis of ZH-tree, the concept of micro-cluster is introduced. Each leaf node is regarded as a micro-cluster, and the micro-cluster is calculated to achieve the purpose of batch filtering. In order to obtain n sets of approximate outliers quickly, a greedy method is used to calculate the boundary of LOF and mark the minimum value as LOFmin. Secondly, the outliers can filtered out by LOFmin, the real outliers are calculated, and then the result set is updated to make the boundary closer. Finally, the accuracy and efficiency of DODMD algorithm are verified on real dataset and synthetic dataset respectively.

Outlier prediction in sensor network data using periodic pattern (주기 패턴을 이용한 센서 네트워크 데이터의 이상치 예측)

  • Kim, Hyung-Il
    • Journal of Sensor Science and Technology
    • /
    • v.15 no.6
    • /
    • pp.433-441
    • /
    • 2006
  • Because of the low power and low rate of a sensor network, outlier is frequently occurred in the time series data of sensor network. In this paper, we suggest periodic pattern analysis that is applied to the time series data of sensor network and predict outlier that exist in the time series data of sensor network. A periodic pattern is minimum period of time in which trend of values in data is appeared continuous and repeated. In this paper, a quantization and smoothing is applied to the time series data in order to analyze the periodic pattern and the fluctuation of each adjacent value in the smoothed data is measured to be modified to a simple data. Then, the periodic pattern is abstracted from the modified simple data, and the time series data is restructured according to the periods to produce periodic pattern data. In the experiment, the machine learning is applied to the periodic pattern data to predict outlier to see the results. The characteristics of analysis of the periodic pattern in this paper is not analyzing the periods according to the size of value of data but to analyze time periods according to the fluctuation of the value of data. Therefore analysis of periodic pattern is robust to outlier. Also it is possible to express values of time attribute as values in time period by restructuring the time series data into periodic pattern. Thus, it is possible to use time attribute even in the general machine learning algorithm in which the time series data is not possible to be learned.