A study on principal component analysis using penalty method

  • Received : 2017.06.05
  • Reviewed : 2017.07.11
  • Published : 2017.07.31

Abstract

This study reviews principal component analysis (PCA) methods that use the Lasso penalty. Two approaches are commonly used to apply the Lasso penalty to PCA. The first treats each principal component as the response and the original data matrix as the predictors, and obtains the optimal linear combination vector from the regression coefficients estimated under a Lasso penalty (more generally, an elastic net penalty). The second approximates the original data matrix by a low-rank matrix, as in the singular value decomposition, and obtains the optimal linear combination vector by minimizing the residual of that approximation under a Lasso penalty. We review both methods in detail; an advantage of both is that they remain applicable when the number of variables exceeds the sample size. We also apply the two methods to a real data set using R and compare the results. Specifically, they are applied to the crime data of Ahamad (1967), which has more variables than cases.
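As an illustration of the second approach, the following is a minimal sketch, not the implementation used in the paper. It assumes the rank-one criterion min ||X - u v^T||_F^2 + 2λ||v||_1 subject to ||u|| = 1, in the spirit of Shen and Huang (2008), and alternates a normalized update of u with a soft-thresholding update of v. The function names, the all-ones starting value, and the toy matrix are assumptions made for this example only; the paper itself works in R.

```python
import math


def soft_threshold(a, lam):
    """Soft-thresholding: sign(a) * max(|a| - lam, 0)."""
    return math.copysign(max(abs(a) - lam, 0.0), a)


def sparse_loading(X, lam, n_iter=500, tol=1e-12):
    """Rank-one sparse loading vector for a column-centered matrix X.

    X is a list of n rows with p values each. Alternates
      u <- X v / ||X v||          (best unit-norm u for fixed v)
      v <- soft_threshold(X'u)    (Lasso-penalized update for fixed u)
    and returns v rescaled to unit length.
    """
    n, p = len(X), len(X[0])
    v = [1.0] * p  # assumption: simple start instead of an SVD-based start
    for _ in range(n_iter):
        Xv = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
        norm_Xv = math.sqrt(sum(t * t for t in Xv))
        if norm_Xv == 0.0:  # v was shrunk to zero (or X v vanished); stop
            break
        u = [t / norm_Xv for t in Xv]
        a = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]
        v_new = [soft_threshold(a_j, lam) for a_j in a]
        change = max(abs(s - t) for s, t in zip(v_new, v))
        v = v_new
        if change < tol:
            break
    norm_v = math.sqrt(sum(t * t for t in v))
    return [t / norm_v for t in v] if norm_v > 0 else v


# Toy data: two strongly related variables and one near-noise variable.
X = [[2.0, 2.0, 0.1], [-2.0, -2.0, -0.1], [1.0, 1.0, -0.1], [-1.0, -1.0, 0.1]]
print(sparse_loading(X, lam=0.0))  # ordinary leading loading (no penalty)
print(sparse_loading(X, lam=0.5))  # penalized loading
```

With λ = 0 the iteration reduces to the power method for the leading singular vectors, so every variable receives a non-zero loading. With a positive λ, small loadings are set exactly to zero; this is the sense in which the penalty yields a simpler, sparser component. Because each update only needs products with X and its transpose, nothing in the iteration requires the number of variables to be smaller than the sample size.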

References

  1. Ahamad, B. (1967). An analysis of crimes by the method of principal components. Applied Statistics, 16, 17-35. https://doi.org/10.2307/2985232
  2. Eckart, C. and Young, G. (1936). The approximation of one matrix by another of low rank. Psychometrika, 1, 211-218. https://doi.org/10.1007/BF02288367
  3. Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
  4. Johnson, R. A. and Wichern, D. W. (1992). Applied multivariate statistical analysis, 3rd Ed., Prentice Hall, New Jersey.
  5. Jolliffe, I., Trendafilov, N. and Uddin, M. (2003). A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12, 531-547. https://doi.org/10.1198/1061860032148
  6. Kwon, S., Han, S. and Lee, S. (2013). A small review and further studies on the LASSO. Journal of the Korean Data & Information Science Society, 24, 1077-1088. https://doi.org/10.7465/jkdi.2013.24.5.1077
  7. Park, C. (2013). Simple principal component analysis using Lasso. Journal of the Korean Data & Information Science Society, 24, 533-541. https://doi.org/10.7465/jkdi.2013.24.3.533
  8. Park, C. and Kye, M. J. (2013). Penalized logistic regression models for determining the discharge of dyspnea patients. Journal of the Korean Data & Information Science Society, 24, 125-133. https://doi.org/10.7465/jkdi.2013.24.1.125
  9. Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99, 1015-1034. https://doi.org/10.1016/j.jmva.2007.06.007
  10. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58, 267-288.
  11. Witten, D. A. and Tibshirani, R. (2011). Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society B, 73, 753-772. https://doi.org/10.1111/j.1467-9868.2011.00783.x
  12. Witten, D. A., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with application to sparse principal components and canonical correlation analysis. Biostatistics, 10, 515-534. https://doi.org/10.1093/biostatistics/kxp008
  13. Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67, 301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
  14. Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265-286. https://doi.org/10.1198/106186006X113430