• Title/Summary/Keyword: Outliers

Robust Response Transformation Using Outlier Detection in Regression Model (회귀모형에서 이상치 검색을 이용한 로버스트 변수변환방법)

  • Seo, Han-Son;Lee, Ga-Yoen;Yoon, Min
    • The Korean Journal of Applied Statistics / v.25 no.1 / pp.205-213 / 2012
  • Transforming the response variable is a general tool for adapting data to a linear regression model. However, it is well known that response transformations in linear regression are very sensitive to one or a few outliers. Many methods have been suggested for developing transformations that are not influenced by potential outliers. Recently, Cheng (2005) suggested using a trimmed likelihood estimator based on the idea of the least trimmed squares (LTS) estimator. However, that method requires the number of outliers to be set in advance and is computationally expensive. A new method is proposed that addresses these problems and improves the robustness of the estimates. The method uses a stepwise procedure, suggested by Hadi and Simonoff (1993), to detect the outliers that determine response transformations.

Robust multiple imputation method for missings with boundary and outliers (한계와 이상치가 있는 결측치의 로버스트 다중대체 방법)

  • Park, Yousung;Oh, Do Young;Kwon, Tae Yeon
    • The Korean Journal of Applied Statistics / v.32 no.6 / pp.889-898 / 2019
  • Imputing missing values for survey variables with missing items becomes complicated if outliers and logical boundary conditions between survey items cannot be ignored. If a variable with missing values contains outliers and boundaries, values imputed by existing regression-based imputation methods are likely to be biased and to violate the boundary conditions. In this paper, we approach these difficulties by combining various robust regression models with multiple imputation methods. Through a simulation study covering various scenarios of outliers and boundaries, we identify and discuss the optimal combination of robust regression and multiple imputation methods.

Outlier-Object Detection Using an Image Pair Based on Regression Analysis: Noise Variance Estimation and Performance Analysis (영상 쌍에서 회귀분석에 기초한 이상 물체 검출: 잡음분산의 추정과 성능 분석)

  • Kim, Dong-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.5 / pp.25-34 / 2008
  • By comparing two images of the same scene captured at different times, we can detect a set of outliers, such as occluding objects due to moving vehicles. To reduce the influence of the images' different intensity properties, an intensity compensation scheme based on a polynomial regression model is employed. To detect outliers accurately while limiting the influence of the outliers themselves on the fit, a simple technique that reruns the regression is employed. In this paper, an algorithm that iteratively reruns the regression is analyzed theoretically by observing the convergence property of the noise variance estimates. A correction constant for the noise variance estimate is proposed. The correction makes the detection algorithm robust to the choice of thresholds for selecting outliers. Numerical analyses using both synthetic and real images are also presented to show the robust performance of the detection algorithm.
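The rerun-the-regression idea described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: it uses a degree-1 polynomial (a straight line) for the intensity compensation, the names `fit_line`, `truncation_correction`, and `detect_outliers` are hypothetical, and the correction constant is the standard variance factor for a Gaussian truncated at plus or minus k sigma, which may differ from the constant the paper derives.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def truncation_correction(k):
    """Variance inflation factor for Gaussian residuals truncated at +/- k sigma.

    Assumption: this truncated-normal factor stands in for the paper's
    correction constant, which the abstract does not specify.
    """
    pdf = math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))
    return 1.0 / (1.0 - 2.0 * k * pdf / (2.0 * cdf - 1.0))


def detect_outliers(xs, ys, k=2.5, max_iter=20):
    """Refit on the current inliers until the inlier set stops changing."""
    inliers = list(range(len(xs)))
    truncated = False  # becomes True once residuals have been thresholded
    for _ in range(max_iter):
        if len(inliers) < 3:
            break  # too few points left to estimate a line and a variance
        a, b = fit_line([xs[i] for i in inliers], [ys[i] for i in inliers])
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        variance = sum(residuals[i] ** 2 for i in inliers) / (len(inliers) - 2)
        if truncated:
            # Inlier residuals were cut at k sigma, so the raw estimate is
            # biased low; rescale it before thresholding again.
            variance *= truncation_correction(k)
        threshold = k * math.sqrt(variance)
        kept = [i for i, r in enumerate(residuals) if abs(r) <= threshold]
        if kept == inliers:
            break
        inliers = kept
        truncated = True
    kept_set = set(inliers)
    return [i for i in range(len(xs)) if i not in kept_set]
```

Here `xs` and `ys` would hold the intensities of corresponding pixels in the two images, and the returned indices are the pixels flagged as outliers. Without the correction, each rerun tends to shrink the variance estimate, so the flagged set depends more strongly on the threshold `k`.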

Robust estimation of sparse vector autoregressive models (희박 벡터 자기 회귀 모형의 로버스트 추정)

  • Kim, Dongyeong;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.631-644 / 2022
  • This paper considers robust estimation of the sparse vector autoregressive model (sVAR), which is useful in high-dimensional time series analysis. First, we generalize the result of Xu et al. (2008) by showing that the adaptive lasso is robust in sVAR as well. However, the adaptive lasso in sVAR performs poorly as the number and size of outliers increase. Therefore, we propose new robust estimation methods for sVAR based on least absolute deviation (LAD) and Huber estimation. Our simulation results show that the proposed methods provide more accurate estimation, which in turn yields better forecasting performance when outliers exist. In addition, we apply the proposed methods to power usage data and confirm that the data contain non-negligible outliers and that robust estimation accounting for them improves forecasting.
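To make the contrast between least squares and Huber estimation concrete, here is a minimal sketch for the simplest possible case: a univariate AR(1) model with no intercept and no sparsity penalty, fitted by iteratively reweighted least squares. It is not the sVAR estimator from the paper. The function names, the tuning constant 1.345, and the MAD-based residual scale are assumptions chosen for illustration.

```python
import statistics


def fit_ar1_ols(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + e[t]."""
    x, y = series[:-1], series[1:]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def huber_weight(r, delta=1.345):
    """IRLS weight implied by the Huber loss for a standardized residual r."""
    return 1.0 if abs(r) <= delta else delta / abs(r)


def fit_ar1_huber(series, delta=1.345, max_iter=50, tol=1e-10):
    """Huber estimate of phi via iteratively reweighted least squares."""
    x, y = series[:-1], series[1:]
    phi = fit_ar1_ols(series)  # start from the non-robust fit
    for _ in range(max_iter):
        residuals = [b - phi * a for a, b in zip(x, y)]
        # Robust residual scale: median absolute residual, rescaled so it
        # matches the standard deviation under Gaussian errors.
        scale = statistics.median([abs(r) for r in residuals]) / 0.6745
        if scale == 0.0:
            break  # at least half of the points are fitted exactly
        weights = [huber_weight(r / scale, delta) for r in residuals]
        new_phi = (sum(w * a * b for w, a, b in zip(weights, x, y))
                   / sum(w * a * a for w, a in zip(weights, x)))
        converged = abs(new_phi - phi) < tol
        phi = new_phi
        if converged:
            break
    return phi
```

On a series that follows `phi = 0.5` exactly apart from one aberrant jump, `fit_ar1_ols` is pulled away from 0.5 while `fit_ar1_huber` downweights the offending residual and stays close to it. Replacing the Huber weight with `1 / max(abs(r), eps)` would give an IRLS approximation to LAD instead.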

Performance Evaluation of Battery Remaining Time Estimation Methods According to Outlier Data Processing Policies in Mobile Devices (모바일 기기에서 이상치 데이터 처리 정책에 따른 배터리 잔여 시간 예측 기법의 평가)

  • Tak, Sungwoo
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.7 / pp.1078-1090 / 2022
  • The distribution of battery usage time per battery level can affect how well the remaining battery time is estimated in mobile devices. Outliers may particularly affect the estimation performance of statistical regression methods. In this paper, we propose a software framework that detects and processes outliers to improve the estimation performance of statistical regression methods. The proposed framework first detects outliers that degrade the estimation performance. It then replaces the outliers with smoothed data. The difference between an outlier and its replacement is distributed across the individual data points. Finally, the individual data are reinforced to improve the estimation performance. Numerical results from experiments with the proposed framework confirm that it estimates the remaining battery time well.
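The detect, replace, and redistribute steps in this abstract could look roughly like the sketch below. The abstract does not say how outliers are detected, which smoother is used, or how the difference is distributed, so every choice here is an assumption: detection by the median absolute deviation (MAD), replacement by the median, and an equal share of the removed amount added to each non-outlier value. The function name `smooth_outliers` is hypothetical.

```python
import statistics


def smooth_outliers(durations, k=3.0):
    """Replace MAD-flagged outliers and spread their excess over the rest.

    Assumptions (not taken from the paper): outliers are values more than
    k robust standard deviations from the median, the smoothed replacement
    is the median itself, and the removed excess is shared equally among
    the remaining values so that the total usage time is preserved.
    """
    median = statistics.median(durations)
    mad = statistics.median([abs(d - median) for d in durations])
    if mad == 0:
        return list(durations)  # no spread to judge outliers against
    limit = k * 1.4826 * mad    # 1.4826 makes the MAD comparable to a std dev
    flagged = {i for i, d in enumerate(durations) if abs(d - median) > limit}
    normal_count = len(durations) - len(flagged)
    if not flagged or normal_count == 0:
        return list(durations)
    excess = sum(durations[i] - median for i in flagged)
    share = excess / normal_count
    return [median if i in flagged else d + share
            for i, d in enumerate(durations)]
```

For example, with per-level usage times `[10, 11, 9, 10, 50, 10, 11, 9]` the value 50 is replaced by the median 10, and the removed 40 units are spread over the other seven entries, leaving the total at 120.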

Study on Lifelog Anomaly Detection using VAE-based Machine Learning Model (VAE(Variational AutoEncoder) 기반 머신러닝 모델을 활용한 체중 라이프로그 이상탐지에 관한 연구)

  • Kim, Jiyong;Park, Minseo
    • The Journal of the Convergence on Culture Technology / v.8 no.4 / pp.91-98 / 2022
  • Lifelog data collected continuously through a wearable device may contain many outliers, so outliers must be found and removed to improve data quality. Because outliers are generally far fewer than normal data points, a class imbalance problem occurs. To solve this imbalance problem, we propose a method that applies a Variational AutoEncoder (VAE) to the outliers. After the outlier data are preprocessed with the proposed method, the result is verified with several machine learning classification models. Verification using body weight data confirmed that performance improved in all classification models. Based on the experimental results, for analyzing lifelog body weight data we propose applying the LightGBM model, which performed best, after preprocessing the data with the outlier processing method proposed in this study.

Outlier Detection of Autoregressive Models Using Robust Regression Estimators (로버스트 추정법을 이용한 자기상관회귀모형에서의 특이치 검출)

  • Lee Dong-Hee;Park You-Sung;Kim Kee-Whan
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.305-317 / 2006
  • Outliers adversely affect model identification, parameter estimation, and forecasting in time series data. In particular, when outliers occur as a patch of additive outliers, current outlier detection procedures suffer from masking and swamping effects, which make them inefficient. In this paper, we propose a new outlier detection procedure based on high-breakdown estimators, called dual robust filtering. Empirical and simulation studies with autoregressive models of order p show that the proposed procedure is effective.

A Graphical Method for Evaluating the Effect of Outliers in One- and Two-Variate Data (일변량 및 이변량 자료에 대하여 특이값의 영향을 평가하기 위한 그래픽 방법)

  • Jang, Dae-Heung
    • The Korean Journal of Applied Statistics / v.20 no.2 / pp.395-407 / 2007
  • Outliers distort many measures used in data analysis. We propose the dandelion seed plot as a graphical tool for evaluating the effect of outliers in one- and two-variate data. Mean-variance dandelion seed plots are drawn using linked curves obtained by changing the weight of each datum from 1 to 0. Covariance-correlation coefficient dandelion seed plots can be drawn in the same way. This graphical method can be a useful tool for elementary statistics education in college.
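The curves behind a mean-variance dandelion seed plot can be computed as in the sketch below, which traces the weighted mean and variance as the weight of a single datum moves from 1 to 0 while all other weights stay at 1. The weighted-variance formula (divisor: sum of weights minus 1) is an assumption, chosen so that the endpoints equal the ordinary sample variance with and without the datum; the paper may use a different definition. The function name `dandelion_curve` is hypothetical, and plotting is left out.

```python
def dandelion_curve(data, index, steps=10):
    """Trace (weight, mean, variance) as data[index] is weighted from 1 to 0.

    Assumes len(data) >= 3 so the variance divisor stays positive at weight 0.
    """
    curve = []
    for step in range(steps + 1):
        w = 1.0 - step / steps
        weights = [1.0] * len(data)
        weights[index] = w
        total = sum(weights)
        mean = sum(wi * x for wi, x in zip(weights, data)) / total
        # Divisor (total - 1) reduces to the usual n - 1 when all weights are 1.
        variance = sum(wi * (x - mean) ** 2
                       for wi, x in zip(weights, data)) / (total - 1.0)
        curve.append((w, mean, variance))
    return curve
```

Calling `dandelion_curve(data, i)` for every index `i` and drawing each result in the mean-variance plane gives one linked curve per datum, all starting from the full-sample point. A long curve marks a datum whose removal moves the mean and variance a lot, which is how an outlier shows up.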

Firework plot for evaluating the impact of influential observations in multi-response surface methodology (다반응 반응표면분석에서 특이값의 영향을 평가하기 위한 불꽃그림)

  • Kim, Sang Ik;Jang, Dae-Heung
    • The Korean Journal of Applied Statistics / v.31 no.1 / pp.97-108 / 2018
  • It has been routine practice in regression analysis to check the validity of the assumed model using regression diagnostic tools. Outliers and influential observations often distort the regression output in undesirable ways. Jang and Anderson-Cook (Quality and Reliability Engineering International, 30, 1409-1425, 2014) proposed a graphical method, called a firework plot, for exploratory visualization of the trace of the impact of possible outliers and influential observations on individual regression coefficients and on the overall residual sum of squares. This paper extends that graphical approach to a multi-response surface methodology problem.

Comparative Analysis on the Outlier Data of Each Parameter in Automatic Water Quality Monitoring Networks (수질자동측정망 자료의 항목별 이상치 비교 분석)

  • Lim, Byungjin;Hong, Eunyoung;Yeon, Insung
    • Journal of Korean Society on Water Environment / v.26 no.4 / pp.700-706 / 2010
  • Along the four major rivers in Korea, automatic water quality monitoring (AWQM) stations are in place to respond immediately to any pollution incident. Real-time data (temperature, DO, pH, EC, and TOC) collected at each station were statistically screened with Dixon's test and the discordance test to exclude outliers and keep valid data. The two methods were compared in terms of the number of outliers they identified, and no significant difference was found between them. However, more outliers were identified in the EC and TOC data than in the other water quality parameters. At station H, the EC data showed no variation for long stretches of time. If a measured signal stays within $\pm 0.001$ mS/cm of the sectional mean, it should be treated as normal data. Therefore, another routine was added to the data screening system, and some data that had been removed as outliers were restored.
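The two screening ideas in this abstract can be sketched as follows: Dixon's Q statistic for the most extreme value in a small sample, and the added rule that treats a flat section as normal data. This is an illustration under assumptions, not the monitoring network's screening system. The function names are hypothetical, the simplest (gap over range) form of Dixon's ratio is used, and the critical value `q_crit` must be supplied by the caller from a published table for the sample size and significance level in use.

```python
def dixon_q(values):
    """Return Dixon's Q (gap / range) for the lowest and highest value."""
    ordered = sorted(values)
    value_range = ordered[-1] - ordered[0]
    if value_range == 0:
        return 0.0, 0.0  # all values identical: nothing stands out
    q_low = (ordered[1] - ordered[0]) / value_range
    q_high = (ordered[-1] - ordered[-2]) / value_range
    return q_low, q_high


def is_flat_section(values, tol=0.001):
    """True if every value lies within +/- tol of the section mean.

    tol = 0.001 mirrors the +/- 0.001 mS/cm band quoted for EC data.
    """
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tol for v in values)


def has_outlier(values, q_crit, tol=0.001):
    """Flag a section only if it is not flat and Dixon's Q exceeds q_crit."""
    if is_flat_section(values, tol):
        return False  # flat signal is kept as normal data
    q_low, q_high = dixon_q(values)
    return max(q_low, q_high) > q_crit
```

Checking `is_flat_section` before the Q test reflects the restore step in the abstract: a signal that barely moves has a tiny range, so gap-over-range ratios can look extreme even though the readings are valid.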