http://dx.doi.org/10.4134/BKMS.b170770

A STOCHASTIC VARIANCE REDUCTION METHOD FOR PCA BY AN EXACT PENALTY APPROACH  

Jung, Yoon Mo (Department of Mathematics, Sungkyunkwan University)
Lee, Jae Hwa (Applied Algebra and Optimization Research Center, Sungkyunkwan University)
Yun, Sangwoon (Department of Mathematics Education, Sungkyunkwan University)
Publication Information
Bulletin of the Korean Mathematical Society, v.55, no.4, 2018, pp. 1303-1315
Abstract
For principal component analysis (PCA) to efficiently analyze large scale matrices, it is crucial to find a few singular vectors at low computational cost and with modest memory requirements. To compute them in a fast and robust way, we propose a new stochastic method. In particular, we adopt the stochastic variance reduced gradient (SVRG) method [11] to avoid the asymptotically slow convergence of stochastic gradient descent methods. For that purpose, we reformulate the PCA problem as an unconstrained optimization problem using a quadratic penalty. In general, the penalty parameter must be driven to infinity for the two problems to be equivalent; in this case, however, exact penalization is guaranteed by applying the analysis in [24]. We establish the convergence rate of the proposed method to a stationary point, and numerical experiments illustrate the validity and efficiency of the proposed method.
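
The abstract describes the method only at a high level; as a rough illustration, the sketch below applies a plain SVRG loop to an assumed quadratic-penalty PCA objective of the form f(X) = -(1/n) sum_i ||X^T a_i||^2 + (mu/2) ||X^T X - I||_F^2, in the spirit of trace-penalty minimization. The function name svrg_penalty_pca, the step size eta, the penalty parameter mu, and the epoch length are hypothetical choices, not the paper's algorithm.

import numpy as np

def svrg_penalty_pca(A, k, mu=10.0, eta=1e-3, epochs=20, inner=None, seed=0):
    """A: (n, d) data matrix with rows a_i; returns a (d, k) iterate X."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = inner if inner is not None else n              # inner-loop length per epoch
    X = np.linalg.qr(rng.standard_normal((d, k)))[0]   # random orthonormal start

    def grad_i(X, a):
        # gradient of f_i(X) = -||X^T a||^2 + (mu/2) ||X^T X - I||_F^2
        return -2.0 * np.outer(a, a @ X) + 2.0 * mu * X @ (X.T @ X - np.eye(k))

    def full_grad(X):
        # exact gradient of f(X), averaged over all n samples
        return -2.0 * (A.T @ (A @ X)) / n + 2.0 * mu * X @ (X.T @ X - np.eye(k))

    for _ in range(epochs):
        X_snap = X.copy()
        g_snap = full_grad(X_snap)                     # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced stochastic gradient (SVRG estimator)
            v = grad_i(X, A[i]) - grad_i(X_snap, A[i]) + g_snap
            X = X - eta * v
    return X

Since the quadratic penalty enforces orthonormality only approximately, an orthonormal basis of the computed subspace would be recovered from the returned X by a final QR or SVD step, and the step size and penalty parameter would need tuning in practice.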
Keywords
principal component analysis; stochastic variance reduction; exact penalty;
References
1 Z. Allen-Zhu and E. Hazan, Variance Reduction for Faster Non-Convex Optimization, Preprint arXiv:1603.05643, 2016.
2 Z. Allen-Zhu and Y. Li, LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain, Preprint arXiv:1607.03463v2, 2017.
3 Z. Allen-Zhu and Y. Yuan, Improved SVRG for Non-Strongly-Convex or Sum-of-NonConvex Objectives, Preprint arXiv:1506.01972v3, 2016.
4 J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal. 8 (1988), no. 1, 141-148.
5 L. Bottou, F. E. Curtis, and J. Nocedal, Optimization Methods for Large-Scale Machine Learning, Preprint arXiv:1606.04838v1, 2016.
6 J. P. Cunningham and Z. Ghahramani, Linear dimensionality reduction: survey, insights, and generalizations, J. Mach. Learn. Res. 16 (2015), 2859-2900.
7 D. Garber and E. Hazan, Fast and Simple PCA via Convex Optimization, Preprint arXiv:1509.05647v4, 2015.
8 G. H. Golub and C. F. Van Loan, Matrix Computations, fourth edition, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, 2013.
9 R. A. Horn and C. R. Johnson, Matrix Analysis, second edition, Cambridge University Press, Cambridge, 2013.
10 B. Jiang, C. Cui, and Y.-H. Dai, Unconstrained optimization models for computing several extreme eigenpairs of real symmetric matrices, Pac. J. Optim. 10 (2014), no. 1, 53-71.
11 S. J. Reddi, A. Hefny, S. Sra, and B. Poczos, Stochastic Variance Reduction for Nonconvex Optimization, Preprint arXiv:1603.06160v2, 2016.
12 R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (2013), 315-323.
13 I. T. Jolliffe, Principal Component Analysis, second edition, Springer Series in Statistics, Springer-Verlag, New York, 2002.
14 H. Kasai, H. Sato, and B. Mishra, Riemannian stochastic variance reduced gradient on Grassmann manifold, Preprint arXiv:1605.07367v3, 2017.
15 Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998), no. 11, 2278-2324.
16 X. Liu, Z. Wen, and Y. Zhang, Limited memory block Krylov subspace optimization for computing dominant singular value decompositions, SIAM J. Sci. Comput. 35 (2013), no. 3, A1641-A1668.
17 Y. Saad, Numerical methods for large eigenvalue problems, revised edition of the 1992 original, Classics in Applied Mathematics, 66, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2011.
18 M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Math. Program. 162 (2017), no. 1-2, Ser. A, 83-112.
19 S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, J. Mach. Learn. Res. 14 (2013), 567-599.
20 O. Shamir, A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate, In The 32nd International Conference on Machine Learning (ICML 2015), 2015.
21 O. Shamir, Fast stochastic algorithms for SVD and PCA: convergence properties and convexity, Preprint arXiv:1507.08788v1, 2015.
22 J. H. Wilkinson, The Algebraic Eigenvalue Problem, Monographs on Numerical Analysis, The Clarendon Press, Oxford University Press, New York, 1988.
23 C. Tan, S. Ma, Y.-H. Dai, and Y. Qian, Barzilai-Borwein step size for stochastic gradient descent, Preprint arXiv:1605.04131v2, 2016.
24 D. S. Watkins, The Matrix Eigenvalue Problem, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2007.
25 Z. Wen, C. Yang, X. Liu, and Y. Zhang, Trace-penalty minimization for large-scale eigenspace computation, J. Sci. Comput. 66 (2016), no. 3, 1175-1203.
26 Z. Xu and Y. Ke, Stochastic variance reduced Riemannian eigensolver, Preprint arXiv:1605.08233v2, 2016.
27 H. Zhang, S. J. Reddi, and S. Sra, Fast stochastic optimization on Riemannian manifolds, Preprint arXiv:1605.07147v2, 2017.