http://dx.doi.org/10.29220/CSAM.2018.25.2.235

Effect of outliers on the variable selection by the regularized regression  

Jeong, Junho (Department of Statistics, Pusan National University)
Kim, Choongrak (Department of Statistics, Pusan National University)
Publication Information
Communications for Statistical Applications and Methods / v.25, no.2, 2018, pp. 235-243
Abstract
Many studies have examined the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 58, 267-288, 1996), and the influence of observations on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the kth observation on LASSO estimates in simple linear regression. Numerical studies based on artificial and real data are presented for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
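To make the idea of case influence in simple linear regression concrete, the following is a minimal sketch and not the analytic expression derived in the paper. It assumes a no-intercept model with objective 0.5 * sum((y_i - beta * x_i)^2) + lam * |beta|, for which the LASSO slope has the well-known soft-thresholding closed form, and it measures the influence of the kth observation by brute-force deletion. The function names, the toy data, and the value of lam are all hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """LASSO slope for the assumed no-intercept model y = beta * x + error.

    Minimizes 0.5 * sum((y_i - beta * x_i)**2) + lam * |beta|, whose
    closed-form solution is S(sum(x_i * y_i), lam) / sum(x_i**2).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam) / sxx


def deletion_influence(x, y, lam):
    """Return the full-data slope and, for each k, slope minus slope without k."""
    full = lasso_slope(x, y, lam)
    diffs = []
    for k in range(len(x)):
        x_k = x[:k] + x[k + 1:]  # data with observation k removed
        y_k = y[:k] + y[k + 1:]
        diffs.append(full - lasso_slope(x_k, y_k, lam))
    return full, diffs


# Hypothetical toy data: the last point is an outlier in y.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-0.2, -0.1, 0.0, 0.1, 5.0]
lam = 1.0  # assumed penalty level, chosen only for illustration

full, diffs = deletion_influence(x, y, lam)
without_outlier = lasso_slope(x[:-1], y[:-1], lam)

print("slope with all data:     ", full)
print("slope without last point:", without_outlier)
print("deletion differences:    ", diffs)
```

In this toy data the slope is nonzero when the outlier is included and exactly zero once it is deleted, so a single observation decides whether the variable is selected at all. This is the kind of effect on variable selection that the abstract describes, shown here in the simplest possible setting.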
Keywords
high-dimension; influential observation; LASSO; outlier; regularization
Citations & Related Records
Times Cited By KSCI: 3
1 Zhao J, Leng C, Li L, and Wang H (2013). High-dimensional influence measure, The Annals of Statistics, 41, 2639-2667.
2 Bae W, Noh S, and Kim C (2017). Case influence diagnostics for the significance of the linear regression model, Communications for Statistical Applications and Methods, 24, 155-162.
3 Box GEP and Cox DR (1964). An analysis of transformations, Journal of the Royal Statistical Society, Series B, 26, 211-252.
4 Cook RD (1977). Detection of influential observation in linear regression, Technometrics, 19, 15-18.
5 Hoerl AE and Kennard RW (1970). Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55-67.
6 Jang DH and Anderson-Cook CM (2017). Influence plots for LASSO, Quality and Reliability Engineering International, 33, 1317-1326.
7 Kim C, Lee J, Yang H, and Bae W (2015). Case influence diagnostics in the lasso regression, Journal of the Korean Statistical Society, 44, 271-279.
8 Kim J and Lee S (2017). A convenient approach for penalty parameter selection in robust lasso regression, Communications for Statistical Applications and Methods, 24, 651-662.
9 Lu T, Pan Y, Kao SY, Kohane I, and Chan J (2004). Gene regulation and DNA damage in the ageing human brain, Nature, 429, 883-891.
10 Tibshirani R (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B, 58, 267-288.