• Title/Abstract/Keywords: Influential observations

Search results: 73 items (processing time: 0.032 s)

Effect of outliers on the variable selection by the regularized regression

  • Jeong, Junho;Kim, Choongrak
    • Communications for Statistical Applications and Methods / Vol. 25, No. 2 / pp.235-243 / 2018
  • Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high dimensional setup. In high dimensional data, the influence of observations is more serious because the sample size n is significantly smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 73, 273-282, 1996), and the influence of observations on the variables selected by the LASSO in the high dimensional setup. We also derive an analytic expression for the influence of the k-th observation on LASSO estimates in simple linear regression. Numerical studies based on artificial data and real data are presented for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high dimensional setup than in the usual "large n, small p" setup.
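
As a rough illustration of case-deletion influence on a LASSO fit, the sketch below treats the simplest case: a single predictor with no intercept, where the LASSO estimate has a closed soft-thresholding form. It is not the analytic expression derived in the paper. The objective scaling, the function names (`soft_threshold`, `lasso_1d`, `deletion_influence`) and the toy data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_1d(x, y, lam):
    """LASSO slope for one predictor and no intercept.

    Minimizes 0.5 * sum((y - b * x)**2) + lam * |b| (assumed scaling).
    """
    return soft_threshold(float(x @ y), lam) / float(x @ x)

def deletion_influence(x, y, lam):
    """Return the full-data estimate and, for each k, the change in the
    estimate when observation k is deleted (full minus deleted)."""
    full = lasso_1d(x, y, lam)
    change = np.empty(len(x))
    for k in range(len(x)):
        keep = np.arange(len(x)) != k
        change[k] = full - lasso_1d(x[keep], y[keep], lam)
    return full, change

# Toy data (hypothetical): the last response is far off the line y = x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 3.0, 10.0])

full, infl = deletion_influence(x, y, lam=1.0)
most_influential = int(np.argmax(np.abs(infl)))  # index of the outlying case

# With a heavier penalty, deleting the outlying case also changes selection:
# the predictor is kept on the full data but dropped once the case is removed.
selected_full = lasso_1d(x, y, 20.0) != 0.0
selected_without_outlier = lasso_1d(x[:3], y[:3], 20.0) != 0.0
```

In this toy setting one deleted case both shifts the estimate and decides whether the variable is selected at all, which is the kind of effect the abstract describes.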

ESTIMATING NEAR REAL TIME PRECIPITABLE WATER FROM SHORT BASELINE GPS OBSERVATIONS

  • Yang, Den-Ring;Liou, Yuei-An;Tseng, Pei-Li
    • Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing, 2007, Proceedings of ISRS 2007 / pp.410-413 / 2007
  • Water vapor in the atmosphere is an influential factor in the hydrological cycle: it exchanges heat through phase change and is essential to precipitation. Because of its significance in altering weather, estimating the amount and distribution of water vapor is crucial to the precision of weather forecasting and to the understanding of regional and local climate. Precipitable water (PW) has been shown to be reliably measurable using long baseline (500-2000 km) GPS observations. However, deriving absolute PW from GPS observations in Taiwan is infeasible because of the geometric limitation of its relatively short-baseline network. In this study, a method of deriving near-real-time PW from short baseline GPS observations is proposed. The method uses a reference station to derive a regression model for wet delay and to interpolate the difference in wet delay among stations. The precipitable water is then obtained by using a conversion factor derived from radiosondes. The method has been tested using a reference station located on Mt. Ho-Hwan together with eleven stations around Taiwan. The result indicates that short baseline GPS observations can be used to precisely estimate precipitable water in near-real-time.

Graphical Methods for the Sensitivity Analysis in Discriminant Analysis

  • Jang, Dae-Heung;Anderson-Cook, Christine M.;Kim, Youngil
    • Communications for Statistical Applications and Methods / Vol. 22, No. 5 / pp.475-485 / 2015
  • As in regression, many measures have been developed to detect influential data points in discriminant analysis. Many of them apply, in the context of discriminant analysis, principles similar to those of the diagnostic measures used in linear regression. Here we focus on the impact on the predicted classification posterior probability when a data point is omitted. The new method is intuitive and easily interpretable compared to existing methods. We also propose a graphical display to show the individual movement of the posterior probability of other data points when a specific data point is omitted. This enables the summaries to capture the overall pattern of the change.

MULTIPLE DELETION MEASURES OF TEST STATISTICS IN MULTIVARIATE REGRESSION

  • Jung, Kang-Mo
    • Journal of applied mathematics & informatics / Vol. 26, No. 3-4 / pp.679-688 / 2008
  • In multivariate regression analysis there exist many influence measures for the regression estimates. However, there seem to be few influence diagnostics for test statistics in hypothesis testing. The case-deletion approach is fundamental for investigating the influence of observations on estimates or statistics. Tang and Fung (1997) derived single case-deletion measures for Wilks' ratio, the Lawley-Hotelling trace, and Pillai's trace for testing a general linear hypothesis about the regression coefficients in multivariate regression. In this paper we derive a more general form of those measures to deal with joint influence among observations. A numerical example is given to illustrate the effect of joint influence on the test statistics.

Diagnostics for the Cox model

  • Xue, Yishu;Schifano, Elizabeth D.
    • Communications for Statistical Applications and Methods / Vol. 24, No. 6 / pp.583-604 / 2017
  • The most popular regression model for the analysis of time-to-event data is the Cox proportional hazards model. While the model specifies a parametric relationship between the hazard function and the predictor variables, there is no specification regarding the form of the baseline hazard function. A critical assumption of the Cox model, however, is the proportional hazards assumption: when the predictor variables do not vary over time, the hazard ratio comparing any two observations is constant with respect to time. Therefore, to perform credible estimation and inference, one must first assess whether the proportional hazards assumption is reasonable. As with other regression techniques, it is also essential to examine whether appropriate functional forms of the predictor variables have been used, and whether there are any outlying or influential observations. This article reviews diagnostic methods for assessing goodness-of-fit for the Cox proportional hazards model. We illustrate these methods with a case-study using available R functions, and provide complete R code for a simulated example as a supplement.

CASE-DELETION DIAGNOSTICS FOR TESTING A LINEAR HYPOTHESIS ABOUT REGRESSION COEFFICIENTS

  • Kim, Myung-Geun
    • Journal of applied mathematics & informatics / Vol. 10, No. 1-2 / pp.111-118 / 2002
  • We study the influence of observations on testing a linear hypothesis using single and multiple case-deletions. The change in the F-test statistic due to case-deletions is shown to be completely determined by two externally Studentized residuals. These residuals are used for investigating outlyingness with or without linear constraints. An illustrative example shows the usefulness of case-deletions.

Influence Assessment in Robust Regression

  • Sohn, Bang-Yong;Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / Vol. 4, No. 1 / pp.21-32 / 1997
  • Robust regression based on the M-estimator reduces and/or bounds the influence of outliers in the y-direction only. Therefore, when several influential observations exist, diagnostics for the robust regression are required in order to detect them. In this paper, we propose influence diagnostics for robust regression based on the M-estimator and its one-step version. Noting that the M-estimator can be obtained through iteratively weighted least squares regression using internal weights, we apply weighted least squares (WLS) regression diagnostics to robust regression.
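
The link between M-estimation and weighted least squares can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes Huber weights with tuning constant 1.345 and a normalized MAD scale estimate. The helper names (`huber_weights`, `huber_irls`, `weighted_leverage`) and the toy data are hypothetical. The final weights are reused to compute a WLS-style leverage.

```python
import numpy as np

def huber_weights(r, c=1.345):
    """Huber weights for residuals r, scaled by the normalized MAD."""
    scale = np.median(np.abs(r - np.median(r))) / 0.6745
    u = np.abs(r) / max(scale, 1e-12)
    # Weight 1 inside the threshold, c / u outside it.
    return np.where(u <= c, 1.0, c / np.maximum(u, c))

def huber_irls(X, y, c=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        sw = np.sqrt(huber_weights(y - X @ beta, c))
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    # Return weights evaluated at the final estimate.
    return beta, huber_weights(y - X @ beta, c)

def weighted_leverage(X, w):
    """Diagonal of the hat matrix of the weighted regression."""
    Xw = X * np.sqrt(w)[:, None]
    H = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)
    return np.diag(H)

# Toy data (hypothetical): y = 2 + 3x plus small noise, one gross outlier.
x = np.arange(10.0)
noise = np.array([0.3, -0.4, 0.1, -0.2, 0.5, 0.0, -0.3, 0.2, -0.1, 0.4])
y = 2.0 + 3.0 * x + noise
y[5] += 30.0
X = np.column_stack([np.ones_like(x), x])

beta, w = huber_irls(X, y)
h = weighted_leverage(X, w)
```

The outlying case ends up with a small internal weight, and the weighted leverages `h` can then be inspected like ordinary WLS diagnostics.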

Multiple Deletions in Logistic Regression Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / Vol. 16, No. 2 / pp.309-315 / 2009
  • We extend the results of Roy and Guria (2008) to multiple deletions in logistic regression models. Because single deletions may fail to detect outliers or influential observations due to swamping and masking effects, multiple deletions are needed. We develop conditional deletion diagnostics designed to overcome the problems caused by masking effects. We derive closed forms for several statistics in logistic regression models, which give useful diagnostics.

A Study on Sensitivity Analysis in Ridge Regression

  • Kim, Soon-Kwi
    • Journal of the Korean Society for Quality Management / Vol. 19, No. 1 / pp.1-15 / 1991
  • In this paper, we discuss and review various measures that have been presented for studying outliers, high-leverage points, and influential observations when ridge regression estimation is adopted. We derive the influence function for $\hat{\underline{\beta}}_R$, the ridge regression estimator, and discuss its various finite sample approximations when ridge regression is postulated. We also study several diagnostic measures such as Welsch-Kuh's distance, Cook's distance, etc.
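
A case-deletion measure for ridge regression can be sketched by refitting without each observation. The Cook-type distance below is one possible definition, chosen here for illustration and not necessarily the one studied in the paper. It assumes an unstandardized design with a penalized intercept and a residual variance estimate with n - p degrees of freedom. The names `ridge`, `ridge_cook_distance` and the toy data are hypothetical.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator (X'X + k I)^{-1} X'y; all columns are penalized."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_cook_distance(X, y, k):
    """Cook-type distance of each case, by explicit refitting.

    D_i = (b - b_(i))' X'X (b - b_(i)) / (p * s2), with s2 = RSS / (n - p).
    """
    n, p = X.shape
    b = ridge(X, y, k)
    s2 = np.sum((y - X @ b) ** 2) / (n - p)
    XtX = X.T @ X
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        diff = b - ridge(X[keep], y[keep], k)  # change when case i is deleted
        d[i] = diff @ XtX @ diff / (p * s2)
    return d

# Toy data (hypothetical): the last point has high leverage and lies off the trend.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([0.1, 1.2, 1.9, 3.2, 3.9, 4.0])
X = np.column_stack([np.ones_like(x), x])

d = ridge_cook_distance(X, y, k=1.0)
most_influential = int(np.argmax(d))
```

With `k = 0` this definition reduces to the usual Cook's distance for ordinary least squares, which gives a quick sanity check on the implementation.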

Test for an Outlier in Multivariate Regression with Linear Constraints

  • Kim, Myung-Geun
    • Communications for Statistical Applications and Methods / Vol. 9, No. 2 / pp.473-478 / 2002
  • A test for a single outlier in multivariate regression with linear constraints on the regression coefficients is derived using a mean shift model. It is shown that influential observations based on case-deletions in testing linear hypotheses are determined by two types of outliers, namely mean shift outliers with and without linear constraints. An illustrative example is given.