http://dx.doi.org/10.5351/CSAM.2017.24.2.143

Probabilistic penalized principal component analysis

Park, Chongsun (Department of Statistics, Sungkyunkwan University)
Wang, Morgan C. (Department of Statistics, University of Central Florida)
Mo, Eun Bi (Department of Statistics, Sungkyunkwan University)

Publication Information

Communications for Statistical Applications and Methods / v.24, no.2, 2017, pp. 143-154

Abstract

A variable selection method for principal component analysis (PCA) is proposed, based on probabilistic PCA and the penalized likelihood method. The proposed method reduces variables in two steps. The first step uses the probabilistic principal component model to identify principal components. In the second step, a penalty function identifies the important variables in each component. Because the method achieves dimension reduction by identifying important observed variables, we then build a model on the original data space rather than on the rotated data space of latent variables (principal components), which makes the method more useful in practice. With a proper choice of regularization parameters, the proposed estimators perform as well as the oracle procedure and are root-n consistent. The method can be applied successfully to high-dimensional PCA problems in which the data set contains a relatively large portion of irrelevant variables. Our likelihood method also extends straightforwardly to problems with missing observations through the EM algorithm. In particular, it can be applied effectively when some data vectors have one or more values missing at random.
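The abstract does not name the penalty it uses. The keyword "non-convex penalty" and the oracle claim point toward a penalty in the spirit of SCAD (Fan and Li, 2001), but that choice is an assumption here. The sketch below shows in isolation how such a penalty separates important from irrelevant variables within one component. It zeroes small loadings and leaves large ones unshrunk. It applies the SCAD thresholding rule directly to a single loading vector, which is a simplification and not the authors' penalized-likelihood estimator. The helper `select_variables` and the loading values are hypothetical and chosen for illustration only.

```python
import math


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lam(|theta|); a = 3.7 is the default suggested by Fan and Li (2001)."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # L1-like near zero, so small values can be set exactly to 0
    if t <= a * lam:
        # quadratic transition joining the linear and constant pieces continuously
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2  # constant: large values receive no extra shrinkage


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """SCAD thresholding rule for a single coefficient z (orthonormal-design case)."""
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    t = abs(z)
    sign = math.copysign(1.0, z)
    if t <= 2 * lam:
        return sign * max(t - lam, 0.0)  # soft thresholding: small loadings become 0
    if t <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)  # partial shrinkage
    return z  # large loadings are left unbiased


def select_variables(loadings, lam, a=3.7):
    """Hypothetical helper: indices of observed variables whose thresholded loading is nonzero."""
    return [j for j, w in enumerate(loadings) if scad_threshold(w, lam, a) != 0.0]


if __name__ == "__main__":
    loadings = [0.82, 0.04, -0.55, 0.08, 0.02, 0.5]  # made-up loadings for one component
    print([scad_threshold(w, 0.1) for w in loadings])  # [0.82, 0.0, -0.55, 0.0, 0.0, 0.5]
    print(select_variables(loadings, 0.1))  # [0, 2, 5]
```

Unlike the lasso, which shrinks every surviving loading by `lam`, the SCAD rule leaves loadings larger than `a * lam` unchanged. That unbiasedness for large coefficients is the property behind oracle-type results for non-convex penalties.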

Keywords

probability model; variable selection; penalized likelihood; EM algorithm; non-convex penalty; oracle estimators
