• Title/Summary/Keyword: influential observation

Search Result 41

A Study on Detection of Influential Observations on A Subset of Regression Parameters in Multiple Regression

  • Park, Sung Hyun;Oh, Jin Ho
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.2
    • /
    • pp.521-531
    • /
    • 2002
  • Most diagnostic techniques for identifying influential observations are based on the deletion of a single observation. While such techniques satisfactorily identify influential observations in many cases, they are not always successful because of masking effects. It is therefore necessary to develop techniques that examine the potentially influential effects of a subset of observations. Partial regression plots can be used to examine influential observations for a single parameter in multiple linear regression. However, it is often desirable to detect influential observations for a subset of regression parameters when interest centers on a selected subset of independent variables. In this paper, we propose a diagnostic measure M that can be used effectively to detect influential observations on a subset of regression parameters in multiple linear regression. An illustrative example shows how the new measure M identifies influential observations on a subset of regression parameters.
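
The partial regression (added-variable) plot mentioned in this abstract can be sketched as follows. This is only an illustration of the single-parameter plot, not of the measure M, whose definition the abstract does not give. The function name `partial_regression_residuals` and the simulated data are assumptions made for the example. The sketch assumes `X` contains an intercept column and that `j` indexes a non-intercept column.

```python
import numpy as np

def partial_regression_residuals(X, y, j):
    """Return the two residual vectors plotted in a partial regression plot.

    The first is column j of X residualized on the remaining columns,
    the second is y residualized on the same remaining columns.
    """
    others = np.delete(X, j, axis=1)

    def residualize(v):
        # Residual of v after least squares regression on the other columns
        coef = np.linalg.lstsq(others, v, rcond=None)[0]
        return v - others @ coef

    return residualize(X[:, j]), residualize(y)

# Hypothetical data: intercept plus two predictors
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=30)

rx, ry = partial_regression_residuals(X, y, j=1)
# Slope of the line through the origin fitted to the plotted residuals
slope = (rx @ ry) / (rx @ rx)
# Coefficient of the same variable in the full multiple regression
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
```

By the Frisch-Waugh-Lovell result, `slope` equals `beta_full[1]`, which is why a point that stands apart in this plot can be influential for that one coefficient.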

A cautionary note on the use of Cook's distance

  • Kim, Myung Geun
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.3
    • /
    • pp.317-324
    • /
    • 2017
  • An influence measure known as Cook's distance has been used for judging the influence of each observation on the least squares estimate of the parameter vector. The distance does not reflect the distributional property of the change in the least squares estimator of the regression coefficients due to case deletions: the distribution has a covariance matrix of rank one, and thus its support is determined by a line in multidimensional Euclidean space. As a result, Cook's distance may fail to provide correct information about influential observations, and we study some reasons for this failure. Three illustrative examples are provided, in which Cook's distance either fails to give the right information about influential observations or gives the right information about the most influential observation. We seek reasons for both the wrong and the right results.
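
As a minimal sketch of the quantities this abstract refers to, the following computes the standard Cook's distance for ordinary least squares together with the change in the estimate when case i is deleted. It is not the author's analysis. The function name `cooks_distance` and the simulated data are assumptions for illustration. The change for case i is a scalar multiple, e_i / (1 - h_ii), of the fixed vector (X'X)^{-1} x_i, which is consistent with the rank-one covariance matrix described above.

```python
import numpy as np

def cooks_distance(X, y):
    """Return Cook's distances, leverages, and case-deletion changes for OLS."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverages h_ii = x_i' (X'X)^{-1} x_i
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # Residual variance estimate
    s2 = resid @ resid / (n - p)
    # Cook's distance D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2)
    d = (resid**2 / (p * s2)) * h / (1.0 - h) ** 2
    # Row i holds beta_hat - beta_hat_(i) = (X'X)^{-1} x_i e_i / (1 - h_ii)
    delta = (X @ XtX_inv) * (resid / (1.0 - h))[:, None]
    return d, h, delta

# Hypothetical data: intercept plus two predictors
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=20)
d, h, delta = cooks_distance(X, y)
```

Each row of `delta` can be checked against a brute-force refit with that case removed. Cook's distance reduces each such vector to a single number, which is where the distributional information discussed in the abstract is lost.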

Detecting Influential Observations on the Smoothing Parameter in Nonparametric Regression

  • Kim, Choong-Rak;Jeon, Jong-Woo
    • Journal of the Korean Statistical Society
    • /
    • v.24 no.2
    • /
    • pp.495-506
    • /
    • 1995
  • We present formulas for detecting influential observations on the smoothing parameter in smoothing splines. Further, we express them as functions of basic building blocks such as residuals and leverages, and compare them with the local influence approach of Thomas (1991). An example based on a real data set is given.

Influential Points in GLMs via Backwards Stepping

  • Jeong, Kwang-Mo;Oh, Hae-Young
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.1
    • /
    • pp.197-212
    • /
    • 2002
  • When assessing the goodness of fit of a model, a small subset of deviating observations can give rise to a significant lack of fit. It is therefore important to identify such observations and to assess their effects on various aspects of the analysis. Cook's distance is usually used to detect influential observations, but it is sometimes not fully effective in identifying a truly influential set of observations because of masking or swamping effects. In this paper we confine our attention to influential subsets in GLMs such as logistic regression models and loglinear models. We modify a backwards stepping algorithm, originally suggested for detecting outlying cells in contingency tables, to detect influential observations in GLMs. The algorithm consists of two steps: an identification step and a testing step. In the identification step, we identify influential observations based on influence measures such as Cook's distance. In the testing step, we test whether the subset of identified observations is significant. Finally, we illustrate the proposed method with two datasets, one for a logistic regression model and one for a loglinear model.

A Bayesian Diagnostic Measure and Stopping Rule for Detecting Influential Observations in Discriminant Analysis

  • Kim, Myung-Cheol;Kim, Hea-Jung
    • Journal of the Korean Statistical Society
    • /
    • v.29 no.3
    • /
    • pp.337-350
    • /
    • 2000
  • This paper suggests a new diagnostic measure and a stopping rule for detecting influential observations in multiple discriminant analysis (MDA). It is developed from a Bayesian point of view using a default Bayes factor obtained from the fractional Bayes factor methodology. The Bayes factor is taken as the discriminatory information in MDA. It is shown that the effect of an observation on the discriminatory information is fully explained by the diagnostic measure. Based on the measure, we suggest a stopping rule for detecting influential observations in a given training sample. A graphical method is used as a tool for interpreting the measure. Performance of the method is examined through two illustrative examples.

On an Information Theoretic Diagnostic Measure for Detecting Influential Observations in LDA

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society
    • /
    • v.25 no.2
    • /
    • pp.289-301
    • /
    • 1996
  • This paper suggests a new diagnostic measure for detecting influential observations in two-group linear discriminant analysis (LDA). It is developed from an information theoretic point of view using the minimum discrimination information (MDI) methodology. The MDI estimator of the symmetric divergence of Kullback (1967) is taken as a measure of the power of discrimination in LDA. It is shown that the effect of an observation on the power of discrimination is fully explained by the diagnostic measure. The asymptotic distribution of the proposed measure is derived as a function of independent chi-squared and standard normal variables. Using this distribution, a couple of methods are suggested for detecting influential observations in LDA. Performance of the suggested methods is examined through a simulation study.

A Bayesian Diagnostic for Influential Observations in LDA

  • Lim, Jae-Hak;Lee, Chong-Hyung;Cho, Byung-Yup
    • Journal of Korean Society for Quality Management
    • /
    • v.28 no.1
    • /
    • pp.119-131
    • /
    • 2000
  • This paper suggests a new diagnostic measure for detecting influential observations in linear discriminant analysis (LDA). It is developed from a Bayesian point of view using a default Bayes factor obtained from the imaginary training sample methodology. The Bayes factor is taken as a criterion for testing homogeneity of covariance matrices in the LDA model. It is noted that the effect of an observation on the criterion is fully explained by the diagnostic measure. We suggest a graphical method that serves as a tool for interpreting the diagnostic measure and detecting influential observations. Performance of the measure is examined through an illustrative example.

Cook-Type Influence Measure in Constrained Regression Models

  • Kim, Myung-Geun
    • Communications for Statistical Applications and Methods
    • /
    • v.15 no.2
    • /
    • pp.229-234
    • /
    • 2008
  • A Cook-type distance is considered for investigating the influence of observations in constrained regression models. Its exact sampling distribution is derived, which is used for judging whether each observation is influential or not. A numerical example is provided for illustration.

Preliminary Analysis on the Effects of Tropospheric Delay Models on Geosynchronous and Inclined Geosynchronous Orbit Satellites

  • Lee, Jinah;Park, Chandeok;Joo, Jung-Min
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.10 no.4
    • /
    • pp.371-377
    • /
    • 2021
  • This research proposes the best combination of tropospheric delay models for the Korean Positioning System (KPS). The overall results are based on real observation data from the Japanese Quasi-Zenith Satellite System (QZSS), whose constellation is similar to the proposed constellation of KPS. The tropospheric delay models are constructed as combinations of three types of zenith path delay (ZPD) models and four types of mapping functions (MFs). Two sets of International GNSS Service (IGS) stations with the same receiver are considered. Comparison of observation residuals reveals that the ZPD models are more influential on the measurement model than the MFs, and that the best tropospheric delay model is the combination of GPT3 with a 5-degree grid and Vienna Mapping Function 1 (VMF1). The bias of the observation residuals depends on the receiver, and this dependence remains to be analyzed further.

Influence Analysis of the Common Mean Problem

  • Kim, Myung Geun
    • Communications for Statistical Applications and Methods
    • /
    • v.20 no.3
    • /
    • pp.217-223
    • /
    • 2013
  • Two influence diagnostic methods for the common mean model are proposed. First, the influence of observations under minor perturbations of the common mean model is investigated by adapting the local influence method, which is based on the likelihood displacement. It is well known that maximum likelihood estimates are in general sensitive to influential observations. Case deletion is a candidate for detecting influential observations. However, the maximum likelihood estimators are computed iteratively, and therefore case deletion involves an enormous amount of computation. An approximation by Newton's method to the maximum likelihood estimator obtained after a single observation is deleted can remove much of the computational burden, and it is treated in this work. A numerical example is given for illustration, and it shows that the proposed diagnostic methods can be useful tools.