• Title/Summary/Keyword: regression outlier

Accuracy of Multiple Outlier Tests in Nonlinear Regression

  • Kahng, Myung-Wook
    • Communications for Statistical Applications and Methods / v.18 no.1 / pp.131-136 / 2011
  • The original Bates-Watts framework applies only to the complete parameter vector. Thus, guidelines developed in that framework can be misleading when the adequacy of the linear approximation is very different for different subsets. The subset curvature measures appear to be reliable indicators of the adequacy of linear approximation for an arbitrary subset of parameters in nonlinear models. Given the specific mean shift outlier model, the standard approaches to obtaining test statistics for outliers are discussed. The accuracy of outlier tests is investigated using subset curvatures.
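
For orientation, the mean shift outlier model mentioned in the abstract can be written out for a single suspect case $i$. This is a standard textbook formulation rather than the paper's own notation: the symbols $f$, $\theta$, $\delta$, $V$, and $h_{ii}$ are assumptions introduced here, and the paper itself treats multiple outliers and subset curvatures, which this sketch does not cover.

```latex
% Mean shift outlier model for case i (n observations, p parameters)
y_j = f(x_j, \theta) + \varepsilon_j \quad (j \neq i), \qquad
y_i = f(x_i, \theta) + \delta + \varepsilon_i, \qquad
H_0 : \delta = 0 .

% Test statistic under the linear approximation:
% the externally studentized residual of case i
t_i = \frac{e_i}{s_{(i)} \sqrt{1 - h_{ii}}}, \qquad
h_{ii} = \bigl[ V (V^{\top} V)^{-1} V^{\top} \bigr]_{ii}, \qquad
V = \left. \frac{\partial f}{\partial \theta^{\top}} \right|_{\theta = \hat{\theta}} .
```

Here $e_i$ is the residual of case $i$ and $s_{(i)}$ is the residual standard deviation estimated without case $i$. Comparing $t_i$ with a $t_{n-p-1}$ reference distribution is exact only for linear models; in the nonlinear case its accuracy depends on how well the linear approximation holds, which is the question the abstract raises.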

Dam Sensor Outlier Detection using Mixed Prediction Model and Supervised Learning

  • Park, Chang-Mok
    • International journal of advanced smart convergence / v.7 no.1 / pp.24-32 / 2018
  • This paper describes an outlier detection method that uses a mixed prediction model. The mixed prediction model consists of a time-series model and a regression model. The parameters of the prediction model are estimated by supervised learning, with a genetic algorithm adopted as the learning method. Experiments were performed on artificial and real data sets. Prediction performance is compared with that of existing prediction methods using artificial data. Outlier detection is conducted on real sensor measurements from a dam. The experiments demonstrate the validity of the proposed method.
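
The idea of combining a time-series predictor with a regression predictor can be sketched as follows. This is a minimal illustration, not the paper's model: the weighted-average form, the parameter names (`phi`, `b0`, `b1`, `w`), the fixed residual threshold, and the data are all assumptions, and the parameters are taken as given instead of being learned with a genetic algorithm.

```python
def mixed_predict(prev_y, x, phi, b0, b1, w):
    """Blend a time-series prediction with a regression prediction."""
    ts_pred = phi * prev_y        # time-series part: AR(1)-style persistence
    reg_pred = b0 + b1 * x        # regression part: linear in the covariate
    return w * ts_pred + (1 - w) * reg_pred


def detect_outliers(y, x, params, threshold):
    """Return indices whose absolute prediction residual exceeds threshold."""
    flagged = []
    prev = y[0]                   # the first reading is assumed to be clean
    for t in range(1, len(y)):
        pred = mixed_predict(prev, x[t], **params)
        if abs(y[t] - pred) > threshold:
            flagged.append(t)
            prev = pred           # do not let a flagged value feed the next step
        else:
            prev = y[t]
    return flagged


# Hypothetical data: reservoir level (covariate) and a sensor reading.
level = [10, 11, 12, 13, 14, 15, 16, 17]
reading = [7.0, 7.5, 8.0, 8.5, 14.0, 9.5, 10.0, 10.5]   # index 4 is a spike

# Hypothetical, pre-estimated parameters.
params = {"phi": 1.0, "b0": 2.0, "b1": 0.5, "w": 0.5}
flagged = detect_outliers(reading, level, params, threshold=2.0)
print(flagged)
```

Replacing a flagged reading with its prediction before moving on keeps one spike from contaminating the time-series part of the next prediction.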

Bayesian Outlier Detection in Regression Model

  • Younshik Chung;Kim, Hyungsoon
    • Journal of the Korean Statistical Society / v.28 no.3 / pp.311-324 / 1999
  • The problem of 'outliers', observations which look suspicious in some way, has long been one of the main concerns of experimenters and data analysts. We propose a model for the outlier problem and analyze it in the linear regression model using a Bayesian approach. We use the mean-shift model together with the SSVS idea of George and McCulloch (1993), which is based on the data augmentation method. The advantage of the proposed method is that it finds the subset of data that is most suspicious under the given model, as measured by posterior probability. An MCMC method (the Gibbs sampler) is used to overcome the complicated Bayesian computation. Finally, the proposed method is applied to simulated data and to real data.

Outlier Detection and Treatment for the Conversion of Chemical Oxygen Demand to Total Organic Carbon (화학적산소요구량의 총유기탄소 변환을 위한 이상자료의 탐지와 처리)

  • Cho, Beom Jun;Cho, Hong Yeon;Kim, Sung
    • Journal of Korean Society of Coastal and Ocean Engineers / v.26 no.4 / pp.207-216 / 2014
  • Total organic carbon (TOC) is an important indicator used as a direct biological index in marine carbon cycle research. Because available TOC data are scarcer than chemical oxygen demand (COD) data, sufficient TOC estimates can be produced from the COD data. Outlier detection and treatment (removal) should be carried out reasonably and objectively, because the COD-TOC conversion equation directly affects the TOC estimates. This study aims to suggest the optimal regression model using the available salinity, COD, and TOC data observed in the Korean coastal zone. The optimal regression model is selected by comparing, across the various methods for detecting outliers and influential observations, the change in the number of data points before and after removal, the coefficients of variation, and the root mean square (RMS) error. The results show that a diagnostic combining the SIQR (semi-interquartile range) boxplot with Cook's distance is the most suitable for outlier detection. The optimal regression function is estimated as $\mathrm{TOC}\,(\mathrm{mg/L}) = 0.44 \cdot \mathrm{COD}\,(\mathrm{mg/L}) + 1.53$, with a coefficient of determination of 0.47 and an RMS error of 0.85 mg/L. The RMS error and the coefficient of variation of the leverage values are reduced to 31% and 80%, respectively, of their values before outlier removal. The suggested method can provide a more appropriate regression curve because it removes the excessive influence of the outliers that are frequently present in COD and TOC monitoring data.
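
The Cook's distance half of that diagnostic can be sketched for a one-predictor regression as below. This is an illustration under stated assumptions, not the paper's procedure: the data are invented, the `4/n` cutoff is a common rule of thumb that the abstract does not mention, and the SIQR boxplot step is not reproduced.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, ybar - a * xbar


def cooks_distance(x, y):
    """Cook's distance of every observation in a simple linear regression."""
    n, p = len(x), 2                      # p = slope + intercept
    a, b = fit_line(x, y)
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (a * xi + b) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)          # residual variance
    dists = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx            # leverage of this point
        dists.append(e * e / (p * s2) * h / (1 - h) ** 2)
    return dists


def drop_influential(x, y, cutoff=None):
    """Drop points whose Cook's distance exceeds cutoff (default 4/n)."""
    if cutoff is None:
        cutoff = 4 / len(x)               # assumed rule-of-thumb cutoff
    d = cooks_distance(x, y)
    keep = [i for i, di in enumerate(d) if di <= cutoff]
    return [x[i] for i in keep], [y[i] for i in keep]


# Hypothetical COD/TOC pairs (mg/L); the last pair is a deliberate outlier.
cod = [1, 2, 3, 4, 5, 6, 7, 8]
toc = [2.02, 2.36, 2.90, 3.24, 3.78, 4.12, 4.66, 8.05]

distances = cooks_distance(cod, toc)
cod_kept, toc_kept = drop_influential(cod, toc)
slope, intercept = fit_line(cod_kept, toc_kept)
print(distances)
print(slope, intercept)
```

A single pass is shown; in practice the distances are often recomputed after each removal, since dropping one influential point changes the fit for the rest.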

A Multiple Imputation for Reducing Outlier Effect (이상점 영향력 축소를 통한 무응답 대체법)

  • Kim, Man-Gyeom;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.27 no.7 / pp.1229-1241 / 2014
  • Most sample surveys contain both outliers and missing values due to non-response. In that case, because of the effect of the outliers, the imputation results are not good enough to meet a given precision. To overcome this, outlier treatment should be carried out before imputation. In this paper, to reduce the effect of outliers, we study outlier imputation methods and outlier weight adjustment methods. For outlier detection, the method suggested by She and Owen (2011) is used. A small simulation study is conducted, and Monthly Labor Statistic and Briquette Consumption Survey Data are used for real data analysis.

An Outlier Detection Method in Penalized Spline Regression Models (벌점 스플라인 회귀모형에서의 이상치 탐지방법)

  • Seo, Han Son;Song, Ji Eun;Yoon, Min
    • The Korean Journal of Applied Statistics / v.26 no.4 / pp.687-696 / 2013
  • The detection and examination of outliers are important parts of data analysis because some outliers in the data may have a detrimental effect on statistical analysis. Outlier detection methods have been discussed by many authors. In this article, we propose applying Hadi and Simonoff's (1993) method to a penalized spline regression model to detect multiple outliers. Simulated and real data sets are used to illustrate the proposed procedure and to compare it with penalized spline regression and robust penalized spline regression.

Compound Outlier Assessment and Verification for Multiple Field Monitoring Data (다수 계측 데이터에 대한 복합 이상치 평가 및 검증)

  • Jeon, Jesung
    • Journal of the Korean GEO-environmental Society / v.19 no.1 / pp.5-14 / 2018
  • All kinds of monitoring data from construction sites can contain outliers arising from diverse causes. In this study, generation of a synthesis value, regression on that value, and final outlier detection and assessment are carried out to identify outliers in extensive time-series datasets. The synthesis value, weighted by the correlation among multiple datasets of monitoring data, enables outlier detection by increasing that correlation. A standard artificial dataset with intentionally inserted outliers was used to assess the synthesis value technique. The results showed improved detection accuracy for both the outliers and the general trend, consistently across different time-series models. Outlier detection accuracy increased when more datasets were used and when their time-series patterns were similar.

Procedures for Detecting Multiple Outliers in Linear Regression Using R

  • Kwon, Soon-Sun;Lee, Gwi-Hyun;Park, Sung-Hyun
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.13-17 / 2005
  • In recent years, many people have come to use R as a statistics system, and R is frequently updated by many R project teams. We are interested in multiple outlier detection and note that R does not supply methods for it. In this talk, we review procedures for detecting multiple outliers and provide more efficient procedures in R that combine direct and indirect methods.

Modeling of Strength of High Performance Concrete with Artificial Neural Network and Mahalanobis Distance Outlier Detection Method (신경망 이론과 Mahalanobis Distance 이상치 탐색방법을 이용한 고강도 콘크리트 강도 예측 모델 개발에 관한 연구)

  • Hong, Jung-Eui
    • Journal of Korean Society of Industrial and Systems Engineering / v.33 no.4 / pp.122-129 / 2010
  • High-performance concrete (HPC) is a new term used in the concrete construction industry. Several studies have shown that concrete strength development is determined not only by the water-to-cement ratio but also by the content of the other concrete ingredients. HPC is a highly complex material, which makes modeling its behavior a very difficult task. This paper aims to demonstrate the possibility of adapting an artificial neural network (ANN) to predict the compressive strength of HPC. The Mahalanobis distance (MD) outlier detection method is used to increase the prediction ability of the ANN. The detailed procedure for calculating the MD is described. The effects of outliers are compared before and after artificial neural network training. The MD outlier detection method successfully removed the outliers and improved the neural network's training and prediction performance.
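
The Mahalanobis distance calculation mentioned above can be sketched for the two-variable case using only the standard library. This is not the paper's procedure: the data, the restriction to two features, and the chi-square cutoff at the 95% level are assumptions chosen for illustration.

```python
from math import log, sqrt
from statistics import mean


def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample covariance matrix entries (divisor n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form inverse of the 2x2 covariance matrix.
    det = sxx * syy - sxy ** 2
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dists = []
    for px, py in points:
        dx, dy = px - mx, py - my
        d2 = dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy
        dists.append(sqrt(d2))
    return dists


def flag_outliers(points, alpha=0.05):
    """Indices whose squared distance exceeds the chi-square(2) quantile."""
    # With 2 degrees of freedom the upper-alpha quantile is -2*ln(alpha).
    cutoff_sq = -2 * log(alpha)
    return [i for i, d in enumerate(mahalanobis_2d(points))
            if d * d > cutoff_sq]


# Hypothetical (water-to-cement ratio, strength in MPa) pairs;
# the last pair lies far below the trend of the others.
samples = [(0.30, 80), (0.35, 74), (0.40, 69), (0.45, 63), (0.50, 58),
           (0.55, 52), (0.60, 47), (0.65, 41), (0.45, 30)]

md = mahalanobis_2d(samples)
print(flag_outliers(samples))
```

Because the mean and covariance are estimated from the same samples, a single extreme point inflates them and partly masks itself, so in small data sets the cutoff has to be chosen with some care.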

The Effect of Outliers in Regression Analysis (회귀 분석에서 이상치가 미치는 영향)

  • Kim, Kwang-Soo;Bae, Young-Ju;Lee, Jin-Gue
    • Journal of Korean Society for Quality Management / v.24 no.2 / pp.158-171 / 1996
  • An outlier is an observation that appears to deviate extremely from the other data collected. Treating outliers is therefore very important, because they distort the meaning of the data as a whole and reduce the accuracy and validity of fitted models. The aim of this paper is to present some ways of handling outliers in given data and to investigate how the analysis results differ before and after outlier rejection. Since a variety of methods have been proposed, we select linear regression analysis and two linear programming techniques and compare their results.
