http://dx.doi.org/10.5351/KJAS.2018.31.4.463

Fast robust variable selection using VIF regression in large datasets  

Seo, Han Son (Department of Applied Statistics, Konkuk University)
Publication Information
The Korean Journal of Applied Statistics, v.31, no.4, 2018, pp. 463-473
Abstract
This article considers variable selection algorithms for linear regression models fitted to large datasets. Many algorithms have been proposed with a focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by least squares. Two remedies have been proposed to make the selection robust: a robust criterion based on a weighted estimator, and a robust VIF regression. This article suggests a fast and robust variable selection method that applies VIF regression after detecting and removing potential outliers. A simulation study and an analysis of a dataset compare the suggested method with other methods.
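The idea of selecting variables in a single pass and then repeating the selection after dropping suspect observations can be illustrated with a small sketch. This is not the article's algorithm: it omits the VIF correction and the α-investing rule of VIF regression, uses a fixed t-statistic threshold instead, and stands in a simple median/MAD residual rule for the outlier detection step. The function names, the threshold of 4, the cutoff of 3, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def streamwise_select(X, y, t_threshold=4.0):
    """One pass over the columns of X: keep column j if the t-statistic of
    the current residuals regressed on x_j exceeds a fixed threshold.
    (Simplified: no VIF correction, no alpha-investing.)"""
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))              # start from an intercept-only model
    resid = y - y.mean()
    for j in range(p):
        xj = X[:, j] - X[:, j].mean()
        ss = xj @ xj
        if ss == 0.0:                     # skip constant columns
            continue
        slope = (xj @ resid) / ss         # slope of residuals on x_j
        dof = n - design.shape[1] - 1
        sigma2 = np.sum((resid - slope * xj) ** 2) / dof
        t_stat = slope / np.sqrt(sigma2 / ss)
        if abs(t_stat) > t_threshold:
            selected.append(j)
            design = np.column_stack([design, X[:, j]])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ beta     # refit by least squares
    return selected

def flag_outliers(X, y, selected, cutoff=3.0):
    """Return a boolean mask of rows to keep, based on least-squares
    residuals scaled by the MAD (a hypothetical stand-in rule)."""
    cols = np.asarray(selected, dtype=int)
    design = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ beta
    centered = r - np.median(r)
    scale = 1.4826 * np.median(np.abs(centered))
    return np.abs(centered) <= cutoff * scale

def select_after_trimming(X, y):
    """Select, drop rows flagged as outliers, then select again."""
    first_pass = streamwise_select(X, y)
    keep = flag_outliers(X, y, first_pass)
    return streamwise_select(X[keep], y[keep]), keep

# Simulated example: two true predictors, 5% of responses shifted upward.
rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.normal(size=n)
y[:25] += 30.0                            # contaminated rows
selected, keep = select_after_trimming(X, y)
print("selected columns:", selected)
print("rows dropped:", int((~keep).sum()))
```

Because the first pass and the outlier flags both rely on least squares, this sketch can miss outliers that mask one another; the robust procedures cited in the reference list address that case.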
Keywords
large dataset; linear regression; stagewise regression; variable selection
References
1. Dupuis, D. J. and Victoria-Feser, M. P. (2013). Robust VIF regression with application to variable selection in large data sets, Annals of Applied Statistics, 7, 319-341.
2. Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space, Journal of the Royal Statistical Society. Series B, 70, 849-911.
3. Foster, D. P. and Stine, R. A. (2008). α-investing: a procedure for sequential control of expected false discoveries, Journal of the Royal Statistical Society. Series B, 70, 429-444.
4. Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the identification of multiple outliers in linear models, Journal of the American Statistical Association, 88, 1264-1272.
5. Harrison, D. and Rubinfeld, D. L. (1978). Hedonic prices and the demand for clean air, Journal of Environmental Economics and Management, 5, 81-102.
6. Lin, D., Foster, D. P., and Ungar, L. H. (2011). VIF regression: a fast regression algorithm for large data, Journal of the American Statistical Association, 106, 232-247.
7. Stock, J. H. and Watson, M. W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
8. Zhou, J., Foster, D. P., and Ungar, L. H. (2006). Streamwise feature selection, Journal of Machine Learning Research, 7, 1861-1885.
9. Dupuis, D. J. and Victoria-Feser, M. P. (2011). Fast robust model selection in large datasets, Journal of the American Statistical Association, 106, 203-212.