http://dx.doi.org/10.7465/jkdi.2017.28.4.721

A study on principal component analysis using penalty method  

Park, Cheolyong (Major in Statistics, Keimyung University)
Publication Information
Journal of the Korean Data and Information Science Society, v.28, no.4, 2017, pp. 721-731
Abstract
In this study, principal component analysis methods using the Lasso penalty are introduced. Two popular methods apply the Lasso penalty to principal component analysis. The first finds an optimal linear-combination vector as the coefficient vector obtained by regressing each principal component on the original data matrix with a Lasso penalty (an elastic net penalty in general). The second finds an optimal linear-combination vector by minimizing, with a Lasso penalty, the residual matrix obtained from approximating the original data matrix by its singular value decomposition. We review both methods in detail and show that they are advantageous especially for data sets with more variables than cases. The two methods are then compared in an application, implemented in R, to a real data set: the crime data of Ahamad (1967), which has more variables than cases.
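The second approach described above can be sketched briefly. The following is a minimal illustration, not the paper's own code: a rank-1 sparse principal component in the spirit of Shen and Huang (2008), obtained by alternating a soft-thresholding update on the loading vector with a least-squares update on the score vector. The function names, the toy data, and the penalty value are all assumptions for illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    # Soft-thresholding operator induced by the Lasso penalty
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_pc_rank1(X, lam=1.0, n_iter=100):
    """Sketch of rank-1 sparse PCA via regularized low-rank approximation:
    minimize ||X - u v^T||_F^2 + penalty on v by alternating updates."""
    X = X - X.mean(axis=0)                  # centre columns, as in ordinary PCA
    u = np.linalg.svd(X, full_matrices=False)[0][:, 0]  # initialise from leading SVD pair
    for _ in range(n_iter):
        v = soft_threshold(X.T @ u, lam)    # sparse loading update
        u = X @ v                           # score update
        nu = np.linalg.norm(u)
        if nu > 0:
            u = u / nu                      # keep the score vector unit-norm
    nv = np.linalg.norm(v)
    return v / nv if nv > 0 else v          # normalised sparse loading vector

# Toy data with more variables than cases (p > n), the setting the abstract highlights
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 20))
loading = sparse_pc_rank1(X, lam=1.0)
print(loading.shape)  # (20,)
```

With `lam=0` this reduces to the ordinary first principal component; larger values drive more loadings exactly to zero, which is what makes the component interpretable when p exceeds n.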
Keywords
Elastic net; lasso; penalty; principal component analysis; regression model
References
1 Ahamad, B. (1967). An analysis of crimes by the method of principal components. Applied Statistics, 16, 17-35.
2 Eckart, C. and Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211-218.
3 Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
4 Johnson, R. A. and Wichern, D. W. (1992). Applied multivariate statistical analysis, 3rd Ed., Prentice Hall, New Jersey.
5 Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12, 531-547.
6 Kwon, S., Han, S. and Lee, S. (2013). A small review and further studies on the LASSO. Journal of the Korean Data & Information Science Society, 24, 1077-1088.
7 Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58, 267-288.
8 Park, C. (2013). Simple principal component analysis using Lasso. Journal of the Korean Data & Information Science Society, 24, 533-541.
9 Park, C. and Kye, M. J. (2013). Penalized logistic regression models for determining the discharge of dyspnea patients. Journal of the Korean Data & Information Science Society, 24, 125-133.
10 Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99, 1015-1034.
11 Witten, D. M. and Tibshirani, R. (2011). Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society B, 73, 753-772.
12 Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with application to sparse principal components and canonical correlation analysis. Biostatistics, 10, 515-534.
13 Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67, 301-320.
14 Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265-286.