http://dx.doi.org/10.5351/CSAM.2017.24.2.143

Probabilistic penalized principal component analysis  

Park, Chongsun (Department of Statistics, Sungkyunkwan University)
Wang, Morgan C. (Department of Statistics, University of Central Florida)
Mo, Eun Bi (Department of Statistics, Sungkyunkwan University)
Publication Information
Communications for Statistical Applications and Methods / v.24, no.2, 2017, pp. 143-154
Abstract
A variable selection method based on probabilistic principal component analysis (PCA) and the penalized likelihood method is proposed. The proposed method is a two-step variable reduction method: the first step uses the probabilistic principal component model to identify the principal components, and the second uses a penalty function to identify the important variables in each component. Because the method achieves dimension reduction by identifying important observed variables, the model is built on the original data space rather than on the data space rotated through latent variables (principal components), which makes the method of more practical use. With a proper choice of regularization parameters, the proposed estimators perform as well as the oracle procedure and are root-n consistent. The method can be successfully applied to high-dimensional PCA problems in which a relatively large portion of the variables in the data set are irrelevant. The likelihood formulation also extends straightforwardly to problems with missing observations through the EM algorithm, so the method can be applied effectively when some data vectors have one or more values missing at random.
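The two-step idea in the abstract can be illustrated with a minimal sketch, under stated assumptions. Step one fits the probabilistic PCA model by its closed-form maximum-likelihood solution (Tipping and Bishop, 1999). Step two stands in for the penalized-likelihood maximization by applying the SCAD thresholding rule of Fan and Li (2001) to each loading, then keeps every variable that retains a nonzero loading in some component. The abstract does not say which penalty the authors use or how they maximize the penalized likelihood; the SCAD choice, the one-step thresholding shortcut, the value `a = 3.7`, and the function names `ppca_ml`, `scad_threshold`, and `select_variables` are all assumptions for illustration, not the authors' estimator.

```python
import numpy as np


def ppca_ml(X, q):
    """Closed-form ML fit of probabilistic PCA: x = W z + mu + eps,
    z ~ N(0, I_q), eps ~ N(0, sigma2 I_p). Returns W (p x q) and sigma2."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                         # ML sample covariance
    evals, evecs = np.linalg.eigh(S)          # eigh returns ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]
    sigma2 = evals[q:].mean()                 # average of the discarded eigenvalues
    # W = U_q (Lambda_q - sigma2 I)^{1/2}, taking the arbitrary rotation R = I
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2


def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule (Fan and Li, 2001), applied elementwise."""
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    soft = np.sign(z) * np.maximum(absz - lam, 0.0)           # |z| <= 2 lam
    middle = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)   # 2 lam < |z| <= a lam
    return np.where(absz <= 2 * lam, soft,
                    np.where(absz <= a * lam, middle, z))      # |z| > a lam: unshrunk


def select_variables(X, q, lam):
    """Hypothetical two-step selection: PPCA loadings, then SCAD thresholding."""
    W, _ = ppca_ml(X, q)
    W_pen = scad_threshold(W, lam)
    # a variable is kept if it has a nonzero loading in at least one component
    selected = np.flatnonzero(np.any(W_pen != 0, axis=1))
    return W_pen, selected


# Simulated example: only variables 0-3 load on the single latent factor.
rng = np.random.default_rng(0)
n, p = 500, 10
z = rng.normal(size=(n, 1))
true_loadings = np.array([[2.0, 2.0, -2.0, 2.0] + [0.0] * (p - 4)])
X = z @ true_loadings + 0.5 * rng.normal(size=(n, p))
W_pen, selected = select_variables(X, q=1, lam=0.5)
print(selected)
```

On this simulated data the sketch should keep only the first four variables, since their estimated loadings are near 2 while the noise variables' loadings are close to zero. The tuning value `lam = 0.5` is hand-picked here; the abstract ties its oracle and root-n results to a proper choice of regularization parameters, which this sketch does not attempt.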
Keywords
probability model; variable selection; penalized likelihood; EM algorithm; non-convex penalty; oracle estimators
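The abstract notes that the likelihood formulation extends to missing observations through the EM algorithm. As background for that remark, the following is a minimal sketch of the standard complete-data EM iteration for probabilistic PCA (Tipping and Bishop, 1999). It contains no penalty term, and the missing-data version, in which each E-step would condition only on a row's observed entries, is not shown. The function names `ppca_em` and `ppca_loglik`, the random starting values, and the stopping rule are assumptions for illustration.

```python
import numpy as np


def ppca_loglik(Xc, W, sigma2):
    """Gaussian log-likelihood of centred data under C = W W^T + sigma2 I."""
    n, p = Xc.shape
    C = W @ W.T + sigma2 * np.eye(p)
    _, logdet = np.linalg.slogdet(C)
    S = Xc.T @ Xc / n
    return -0.5 * n * (p * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(C, S)))


def ppca_em(X, q, n_iter=500, tol=1e-10, seed=0):
    """EM for probabilistic PCA on complete data.

    Returns the mean mu, loadings W (p x q) and noise variance sigma2.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu = X.mean(axis=0)           # ML estimate of mu; not changed by EM
    Xc = X - mu
    W = rng.normal(size=(p, q))   # random start
    sigma2 = Xc.var()             # crude starting value
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior moments of z_n given x_n under the current W, sigma2
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ Minv                         # row n is E[z_n]^T
        sum_Ezz = n * sigma2 * Minv + Ez.T @ Ez    # sum over n of E[z_n z_n^T]
        # M-step: update W first, then sigma2 using the new W
        W = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(sum_Ezz @ W.T @ W)) / (n * p)
        ll = ppca_loglik(Xc, W, sigma2)
        if ll - prev < tol * abs(ll):              # EM never decreases ll
            break
        prev = ll
    return mu, W, sigma2
```

At convergence the EM estimates should agree, up to a rotation of `W`, with the closed-form maximum-likelihood solution: `sigma2` approaches the mean of the discarded sample-covariance eigenvalues. The EM route is what makes missing data tractable, because each E-step can be rewritten to use only the observed part of each data vector.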