Search | Korea Science

A Note on Cook's Distance in the Multivariate Linear Model

Bae, Whasoo;Hwang, Hyunmi;Kim, Choongrak
- Communications for Statistical Applications and Methods / v.20 no.1 / pp.23-28 / 2013
We propose a version of Cook's distance (called the local distance) for the multivariate linear model. The proposed version is a matrix, whereas the existing version of Cook's distance (called the global distance) is a scalar; the existing Cook's distance is the trace of the proposed one. In addition, we argue that the proposed Cook's distance is a more natural extension of Cook's distance in the univariate linear model than the existing version. An illustrative example based on a real data set is given.
https://doi.org/10.5351/CSAM.2013.20.1.023

Cutoff Values for Cook's Distance

Choongrak Kim
- Communications for Statistical Applications and Methods / v.3 no.2 / pp.13-19 / 1996
Cook's distance (Cook, 1977) is one of the most widely used influence measures for assessing the influence of single observations or sets of observations in the linear regression model. For judging the computed distance, Cook (1977) suggested guidelines based on a confidence ellipsoid for the regression parameter $\beta$. In this paper, we suggest cutoff values for Cook's distance via Monte Carlo simulation and compare them with Cook's guidelines. An example based on a real data set is given.
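For orientation, Cook's distance in ordinary least squares can be computed in closed form from the residuals and leverages, or equivalently by refitting with each case deleted. The following is a minimal sketch for simple linear regression (so p = 2), using only the Python standard library. The data are made up for illustration, the function names are hypothetical, and the sketch does not reproduce the Monte Carlo cutoff values proposed in the paper.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def cooks_distance(x, y):
    """Cook's distance for each case, via residuals and leverages (p = 2)."""
    n, p = len(x), 2
    a, b = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # leverages h_ii
    return [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, lev)]


def cooks_distance_by_deletion(x, y):
    """Same quantity, computed by refitting with each case left out."""
    n, p = len(x), 2
    a, b = fit_line(x, y)
    s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - p)
    out = []
    for i in range(n):
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Squared shift in all fitted values when case i is deleted.
        shift = sum(((a + b * xj) - (a_i + b_i * xj)) ** 2 for xj in x)
        out.append(shift / (p * s2))
    return out


# Made-up data: the last point is far from the others in x and off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
d = cooks_distance(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
```

The two functions agree up to rounding error, which is a convenient check on the closed form. How large a distance must be before a case is flagged is exactly the cutoff question the paper addresses.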

A cautionary note on the use of Cook's distance

Kim, Myung Geun
- Communications for Statistical Applications and Methods / v.24 no.3 / pp.317-324 / 2017
An influence measure known as Cook's distance has been used for judging the influence of each observation on the least squares estimate of the parameter vector. The distance does not reflect the distributional property of the change in the least squares estimator of the regression coefficients due to case deletion: the distribution has a covariance matrix of rank one, and thus its support is a line in multidimensional Euclidean space. As a result, Cook's distance may fail to provide correct information about influential observations, and we study some reasons for this failure. Three illustrative examples are provided: in some, Cook's distance fails to give the right information about influential observations, and in others it correctly identifies the most influential observation. We seek reasons for both outcomes.
https://doi.org/10.5351/CSAM.2017.24.3.317
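The rank-one property mentioned in this abstract can be seen from the standard case-deletion identity for least squares. As a sketch in generic notation (design matrix $X$ with rows $x_i^\top$, residual $e_i$, leverage $h_{ii}$, error variance $\sigma^2$; this notation is assumed here and is not taken from the paper):

$$
\hat\beta - \hat\beta_{(i)} = \frac{(X^\top X)^{-1} x_i \, e_i}{1 - h_{ii}},
\qquad
\operatorname{Cov}\bigl(\hat\beta - \hat\beta_{(i)}\bigr) = \frac{\sigma^2}{1 - h_{ii}} \, (X^\top X)^{-1} x_i x_i^\top (X^\top X)^{-1}.
$$

Because $e_i$ is the only random quantity, with variance $\sigma^2 (1 - h_{ii})$, the change always lies along the fixed direction $(X^\top X)^{-1} x_i$, so its covariance matrix has rank one.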

A Comparison of Influence Diagnostics in Linear Mixed Models

Lee, Jang-Taek
- Communications for Statistical Applications and Methods / v.10 no.1 / pp.125-134 / 2003
Standard estimation methods for linear mixed models are sensitive to influential observations. However, tools and concepts for linear mixed model diagnostics remain rudimentary, and further research is much needed. In this paper, we consider two diagnostics for evaluating the effect of individual observations on the estimation of fixed effects in linear mixed models: Cook's distance and COVRATIO. The results of our limited simulation study suggest that Cook's distance is not a good statistical quantity in linear mixed models. Also, the calibration point for COVRATIO seems to be quite conservative.
https://doi.org/10.5351/CKSS.2003.10.1.125

Influence Analysis of Constrained Regression Models

Kim, Myung-Geun
- Communications for Statistical Applications and Methods / v.14 no.2 / pp.281-286 / 2007
Cook's distance is generalized to the multiple linear regression with linear constraints on regression coefficients. It is used for identifying influential observations in constrained regression models. A numerical example is provided for illustration.
https://doi.org/10.5351/CKSS.2007.14.2.281

The local influence of LIU type estimator in linear mixed model

Zhang, Lili;Baek, Jangsun
- Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.465-474 / 2015
In this paper, we study local influence analysis of the LIU type estimator in linear mixed models. Using the method proposed by Shi (1997), the local influence of the LIU type estimator is investigated under each of three disturbance models. Furthermore, we give a generalized Cook's distance to assess the influence, and illustrate the efficiency of the proposed method with an example.
https://doi.org/10.7465/jkdi.2015.26.2.465

A Study on Sensitivity Analysis in Ridge Regression (능형 회귀에서의 민감도 분석에 관한 연구)

Kim, Soon-Kwi
- Journal of Korean Society for Quality Management / v.19 no.1 / pp.1-15 / 1991
In this paper, we discuss and review various measures that have been proposed for studying outliers, high-leverage points, and influential observations when ridge regression estimation is adopted. We derive the influence function for $\hat{\underline{\beta}}_R$, the ridge regression estimator, and discuss its various finite-sample approximations when ridge regression is postulated. We also study several diagnostic measures such as Welsch-Kuh's distance and Cook's distance.

Influential Points in GLMs via Backwards Stepping

Jeong, Kwang-Mo;Oh, Hae-Young
- Communications for Statistical Applications and Methods / v.9 no.1 / pp.197-212 / 2002
When assessing the goodness of fit of a model, a small subset of deviating observations can give rise to a significant lack of fit. It is therefore important to identify such observations and to assess their effects on various aspects of the analysis. Cook's distance is usually used to detect influential observations, but it is sometimes not fully effective in identifying a truly influential set of observations because of masking or swamping effects. In this paper, we confine our attention to influential subsets in GLMs such as logistic regression models and loglinear models. We modify a backwards stepping algorithm, originally suggested for detecting outlying cells in contingency tables, to detect influential observations in GLMs. The algorithm consists of two steps: an identification step and a testing step. In the identification step, we identify influential observations based on influence measures such as Cook's distance. In the testing step, we test whether the subset of identified observations is significant. Finally, we illustrate the proposed method with two data sets, one for a logistic regression model and one for a loglinear model.
https://doi.org/10.5351/CKSS.2002.9.1.197

Outlier Detection and Treatment for the Conversion of Chemical Oxygen Demand to Total Organic Carbon (화학적산소요구량의 총유기탄소 변환을 위한 이상자료의 탐지와 처리)

Cho, Beom Jun;Cho, Hong Yeon;Kim, Sung
- Journal of Korean Society of Coastal and Ocean Engineers / v.26 no.4 / pp.207-216 / 2014
Total organic carbon (TOC) is an important indicator used as a direct biological index in marine carbon cycle research. Because available TOC data are scarce relative to chemical oxygen demand (COD) data, sufficient TOC estimates can be produced from COD data. Outlier detection and treatment (removal) should be carried out reasonably and objectively because the COD-TOC conversion equation directly affects the TOC estimates. This study aims to suggest an optimal regression model using the available salinity, COD, and TOC data observed in the Korean coastal zone. The optimal regression model is selected by comparing, across diverse methods for detecting outliers and influential observations, the number of data points before and after removal, the variation coefficients, and the root mean square (RMS) error. The results show that a diagnostic combining the SIQR (semi-interquartile range) boxplot and Cook's distance is the most suitable for outlier detection. The optimal regression function is estimated as $\mathrm{TOC\,(mg/L)} = 0.44 \cdot \mathrm{COD\,(mg/L)} + 1.53$, with a coefficient of determination of 0.47 and an RMS error of 0.85 mg/L. The RMS error and the variation coefficients of the leverage values are greatly reduced, to 31% and 80%, respectively, of their values before outlier removal. The suggested method can provide a more appropriate regression curve because it removes the excessive impact of the outliers frequently included in COD and TOC monitoring data.
https://doi.org/10.9765/KSCOE.2014.26.4.207
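The conversion equation reported in this abstract is simple enough to apply directly. A minimal sketch follows; the function name `cod_to_toc` is hypothetical, and the coefficients are the ones quoted in the abstract. Given the reported coefficient of determination of 0.47 and RMS error of 0.85 mg/L, the output should be read as a rough estimate only.

```python
def cod_to_toc(cod_mg_per_l):
    """Estimate TOC (mg/L) from COD (mg/L) with the regression quoted in the abstract."""
    return 0.44 * cod_mg_per_l + 1.53


# Illustrative input value, not taken from the paper's data.
example_toc = cod_to_toc(2.0)  # 0.44 * 2.0 + 1.53
```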

Local Influence of the Quasi-likelihood Estimators in Generalized Linear Models

Jung, Kang-Mo
- Communications for Statistical Applications and Methods / v.14 no.1 / pp.229-239 / 2007
We present a diagnostic method for quasi-likelihood estimators in generalized linear models. Since these estimators are usually obtained by iteratively reweighted least squares, which is well known to be very sensitive to unusual data, a diagnostic step is indispensable to the analysis of data. We extend the local influence approach based on the likelihood function to one based on the quasi-likelihood function. Local influence diagnostics are derived under several perturbation schemes. An illustrative example is given, and we compare the results provided by local influence and deletion.
https://doi.org/10.5351/CKSS.2007.14.1.229
