http://dx.doi.org/10.4134/BKMS.2016.53.2.589

A MEMORY EFFICIENT INCREMENTAL GRADIENT METHOD FOR REGULARIZED MINIMIZATION  

Yun, Sangwoon (Department of Mathematics Education, Sungkyunkwan University)
Publication Information
Bulletin of the Korean Mathematical Society / v.53, no.2, 2016, pp. 589-600
Abstract
In this paper, we propose a new incremental gradient method for solving a regularized minimization problem whose objective is the sum of m smooth functions and a (possibly nonsmooth) convex function. The method uses an adaptive stepsize. Recently proposed incremental gradient methods for regularized minimization require O(mn) storage, where n is the number of variables, which is their main drawback. In contrast, the proposed incremental gradient method requires only O(n) storage.
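To make the storage comparison concrete, the following Python sketch shows one way an incremental proximal gradient loop with a running-average gradient and O(n) extra memory might look for the l1-regularized case. The function names, the diminishing stepsize rule (a stand-in for the paper's adaptive stepsize), the running-average update, and the toy least-squares example are illustrative assumptions, not the algorithm analyzed in the paper.

import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def incremental_prox_grad(component_grads, x0, lam, alpha0=0.1, epochs=30):
    # Sketch of an incremental proximal-gradient loop for
    #     min_x  sum_{i=1}^m f_i(x) + lam * ||x||_1,
    # where component_grads[i](x) returns the gradient of f_i at x.
    # Only one length-n running-average gradient vector is kept,
    # so the extra memory is O(n) rather than O(mn).
    x = np.asarray(x0, dtype=float).copy()
    m = len(component_grads)
    g_avg = np.zeros_like(x)   # running average of component gradients, O(n)
    k = 0                      # total number of component updates so far
    for _ in range(epochs):
        for i in range(m):
            k += 1
            alpha = alpha0 / np.sqrt(k)  # placeholder diminishing stepsize
            # fold the i-th component gradient into the running average
            g_avg += (component_grads[i](x) - g_avg) / min(k, m)
            # proximal step on the averaged search direction
            x = soft_threshold(x - alpha * g_avg, alpha * lam)
    return x

# Toy usage: l1-regularized least squares, f_i(x) = 0.5 * (a_i @ x - b_i)**2
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = rng.standard_normal(100)
grads = [lambda x, a=A[i], bi=b[i]: a * (a @ x - bi) for i in range(100)]
x_hat = incremental_prox_grad(grads, np.zeros(20), lam=0.1)

Whatever specific averaging and stepsize rules are used, the key design point illustrated here is that each iteration touches only one component gradient and a single length-n aggregate vector, so memory does not grow with the number of components m.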
Keywords
incremental gradient method; nonsmooth; regularization; running average