http://dx.doi.org/10.7465/jkdi.2017.28.6.1291

Graphical method for evaluating the impact of influential observations in high-dimensional data  

Ahn, Sojin (Department of Statistics, Pukyong National University)
Lee, Jae Eun (Department of Statistics, Pukyong National University)
Jang, Dae-Heung (Department of Statistics, Pukyong National University)
Publication Information
Journal of the Korean Data and Information Science Society / v.28, no.6, 2017, pp. 1291-1300
Abstract
In high-dimensional data, the number of variables is much larger than the number of observations. In this case, the impact of influential observations on regression coefficient estimates can be very large. Jang and Anderson-Cook (2017) suggested the LASSO influence plot. In this paper, we propose the LASSO influence plot, the LASSO variable selection ranking plot, and the three-dimensional LASSO influence plot as graphical methods for evaluating the impact of influential observations in high-dimensional data. Using two real high-dimensional data examples, we apply these graphical methods as regression diagnostic tools for finding influential observations. We find that influential observations can be identified with these graphical methods.
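The claim that a single observation can strongly affect LASSO coefficient estimates when variables outnumber observations can be explored with a small leave-one-out experiment. The sketch below is an illustration only and is not the plot construction of Jang and Anderson-Cook (2017) or of this paper: it assumes a fixed penalty `lam`, no intercept, synthetic Gaussian data with one deliberately contaminated response, and a plain coordinate-descent solver. The names `soft_threshold`, `lasso_cd`, and `loo_influence` are hypothetical helpers introduced here.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent. X is a list of n rows with p values each;
    no intercept is fitted (an assumption of this sketch)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Scaled inner product of column j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def loo_influence(X, y, lam):
    """For each observation, refit without it and return the sum of
    absolute coefficient changes relative to the full-data fit."""
    full = lasso_cd(X, y, lam)
    scores = []
    for i in range(len(X)):
        X_i = X[:i] + X[i + 1:]
        y_i = y[:i] + y[i + 1:]
        beta_i = lasso_cd(X_i, y_i, lam)
        scores.append(sum(abs(a - b) for a, b in zip(full, beta_i)))
    return scores


# Synthetic example (assumed data, not from the paper): p > n
random.seed(0)
n, p = 12, 30
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0.0, 0.1) for row in X]
y[3] += 10.0  # deliberately contaminate one response

lam = 0.3  # fixed penalty, chosen arbitrarily for illustration
scores = loo_influence(X, y, lam)
for i, s in enumerate(scores):
    print(i, round(s, 3))
```

Plotting these per-observation scores against the observation index gives a rough diagnostic display in the same spirit. A fuller treatment would presumably vary the penalty as well, since the set of selected variables changes with it.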
Keywords
Influential observations; LASSO influence plot; LASSO variable selection ranking plot; three-dimensional LASSO influence plot
Citations & Related Records
Times Cited By KSCI: 5
1 Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D. and Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissue probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA, 96, 6745-6750.
2 Fan, J., Feng, Y., Saldana, D. F., Samworth, R. and Wu, Y. (2017). Package 'SIS'. http://www.stat.columbia.edu/-yangfeng/pubs/jss1375.pdf
3 Fan, J. and Lv, J. (2008). Sure independence screening for ultra-high dimensional feature space. Journal of the Royal Statistical Society Series B, 70, 849-911.
4 Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
5 Hwang, E. J. and Na, J. H. (2015). Influenza prediction models by using meteorological and social media informations. Journal of the Korean Data & Information Science Society, 26, 1087-1095.
6 Jang, D. H. and Anderson-Cook, C. M. (2017). Influence plots for LASSO. Quality and Reliability Engineering International, 33, 1317-1326.
7 Shin, J. E., Oh, Y. S. and Lim, D. H. (2016). RHadoop platform for K-Means clustering of big data. Journal of the Korean Data & Information Science Society, 27, 609-619.
8 Jung, B. H. and Lim, D. H. (2016). Learning algorithms for big data logistic regression on RHIPE platform. Journal of the Korean Data & Information Science Society, 27, 911-923.
9 Lee, S., Cho, J., Kang, C. and Choi, S. (2015). Study on prediction for a film success using text mining. Journal of the Korean Data & Information Science Society, 26, 1259-1269.
10 Lee, W. and Chun, H. (2016). A deep learning analysis of the Chinese Yuan's volatility in the onshore and offshore markets. Journal of the Korean Data & Information Science Society, 27, 327-335.
11 Zeng, V. and Breheny, P. (2017). Package 'biglasso'. https://arxiv.org/abs/1701.05936