A study on principal component analysis using penalty method

Park, Cheolyong

doi:10.7465/jkdi.2017.28.4.721

Journal of the Korean Data and Information Science Society

Volume 28, Issue 4, Pages 721-731, 2017, pISSN 1598-9402

The Korean Data and Information Science Society (한국데이터정보과학회)


Park, Cheolyong (Major in Statistics, Keimyung University)


Received : 2017.06.05
Accepted : 2017.07.11
Published : 2017.07.31

https://doi.org/10.7465/jkdi.2017.28.4.721


Abstract

This study reviews principal component analysis methods that use the Lasso penalty. Two approaches to applying the Lasso penalty to principal component analysis are in common use. The first takes each principal component as the response and the original data matrix as the predictors, and obtains the optimal vector of linear combination as the regression coefficient vector under a Lasso penalty (more generally, an elastic net penalty). The second approximates the original data matrix by its singular value decomposition and obtains the optimal vector of linear combination by minimizing the size of the resulting residual matrix under a Lasso penalty. We review both methods in detail and show that they have the advantage of remaining applicable to data sets that have more variables than cases. The two methods are also applied to a real data set using R and their results are compared. Specifically, they are applied to the crime data in Ahamad (1967), which has more variables than cases.

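The second approach described in the abstract can be illustrated with a small numerical sketch. The code below is not the R code used in the paper; it is a minimal NumPy illustration, assuming a rank-one approximation X ≈ u vᵀ with ‖u‖ = 1 and the penalized criterion ‖X − u vᵀ‖²_F + 2λ Σ|v_j|, in the spirit of Shen and Huang (2008). Under that assumption, the update of v for fixed u is a componentwise soft-thresholding of Xᵀu. The function names (`soft_threshold`, `sparse_rank_one`), the value of `lam`, and the synthetic data with more variables than cases are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Componentwise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_rank_one(X, lam, n_iter=200, tol=1e-8):
    """Alternating minimization of ||X - u v^T||_F^2 + 2*lam*sum|v_j|, ||u|| = 1.

    Returns (u, v); v is the unnormalized, possibly sparse, loading vector.
    """
    # Start from the ordinary rank-one SVD approximation.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u = U[:, 0]
    v = s[0] * Vt[0]
    for _ in range(n_iter):
        # For fixed u, the penalized minimizer over v is soft-thresholded X^T u.
        v_new = soft_threshold(X.T @ u, lam)
        if not np.any(v_new):
            # The penalty is large enough to shrink every loading to zero.
            return u, v_new
        # For fixed v, the minimizer over unit vectors u is X v rescaled.
        Xv = X @ v_new
        u = Xv / np.linalg.norm(Xv)
        converged = np.linalg.norm(v_new - v) <= tol * max(1.0, np.linalg.norm(v))
        v = v_new
        if converged:
            break
    return u, v


# Synthetic data (an assumption for illustration): 10 cases, 30 variables,
# with only the first 5 variables carrying a common signal.
rng = np.random.default_rng(0)
n, p = 10, 30
scores = 3.0 * rng.standard_normal(n)
pattern = np.zeros(p)
pattern[:5] = 1.0
X = np.outer(scores, pattern) + 0.1 * rng.standard_normal((n, p))
X = X - X.mean(axis=0)  # column-center, as in ordinary PCA

_, v_dense = sparse_rank_one(X, lam=0.0)   # no penalty: ordinary first PC direction
_, v_sparse = sparse_rank_one(X, lam=1.0)  # Lasso penalty: sparse loadings

print("nonzero loadings without penalty:", np.count_nonzero(v_dense))
print("nonzero loadings with penalty:   ", np.count_nonzero(v_sparse))
```

With `lam=0.0` the iteration reduces to the ordinary first principal component direction, so the penalty is the only source of sparsity. Note that nothing in the procedure requires inverting a covariance matrix, which is consistent with the abstract's point that such methods remain usable when there are more variables than cases.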


References

Ahamad, B. (1967). An analysis of crimes by the method of principal components. Applied Statistics, 16, 17-35. https://doi.org/10.2307/2985232
Eckart, C. and Young, G. (1936). The approximation of one matrix by another of low rank. Psychometrika, 1, 211-218. https://doi.org/10.1007/BF02288367
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
Johnson, R. A. and Wichern, D. W. (1992). Applied multivariate statistical analysis, 3rd Ed., Prentice Hall, New Jersey.
Jolliffe, I., Trendafilov, N. and Uddin, M. (2003). A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12, 531-547. https://doi.org/10.1198/1061860032148
Kwon, S., Han, S. and Lee, S. (2013). A small review and further studies on the LASSO. Journal of the Korean Data & Information Science Society, 24, 1077-1088. https://doi.org/10.7465/jkdi.2013.24.5.1077
Park, C. (2013). Simple principal component analysis using Lasso. Journal of the Korean Data & Information Science Society, 24, 533-541. https://doi.org/10.7465/jkdi.2013.24.3.533
Park, C. and Kye, M. J. (2013). Penalized logistic regression models for determining the discharge of dyspnea patients. Journal of the Korean Data & Information Science Society, 24, 125-133. https://doi.org/10.7465/jkdi.2013.24.1.125
Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99, 1015-1034. https://doi.org/10.1016/j.jmva.2007.06.007
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58, 267-288.
Witten, D. A. and Tibshirani, R. (2011). Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society B, 73, 753-772. https://doi.org/10.1111/j.1467-9868.2011.00783.x
Witten, D. A., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with application to sparse principal components and canonical correlation analysis. Biostatistics, 10, 515-534. https://doi.org/10.1093/biostatistics/kxp008
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67, 301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265-286. https://doi.org/10.1198/106186006X113430
