1 |
D. P. Bertsekas, A new class of incremental gradient methods for least squares problems, SIAM J. Optim. 7 (1997), no. 4, 913-926.
|
2 |
D. P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Belmont, MA, 1999.
|
3 |
D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Englewood Cliffs, 1989.
|
4 |
D. Blatt, A. O. Hero, and H. Gauchman, A convergent incremental gradient method with a constant step size, SIAM J. Optim. 18 (2007), no. 1, 29-51.
|
5 |
P. S. Bradley, U. M. Fayyad, and O. L. Mangasarian, Mathematical programming for data mining: formulations and challenges, INFORMS J. Comput. 11 (1999), no. 3, 217-238.
|
6 |
S. Chen, D. Donoho, and M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998), no. 1, 33-61.
|
7 |
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, 2000.
|
8 |
I. Daubechies, M. Defrise, and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. 57 (2004), no. 11, 1413-1457.
|
9 |
J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, Report, Department of Statistics, Stanford University, Stanford, May 2009.
|
10 |
A. A. Gaivoronski, Convergence properties of back-propagation for neural nets via theory of stochastic gradient methods. Part I, Optim. Methods Softw. 4 (1994), 117-134.
|
11 |
L. Grippo, A class of unconstrained minimization methods for neural network training, Optim. Methods Softw. 4 (1994), 135-150.
|
12 |
C.-H. Ho and C.-J. Lin, Large-scale linear support vector regression, J. Mach. Learn. Res. 13 (2012), 3323-3348.
|
13 |
A. Juditsky, G. Lan, A. Nemirovski, and A. Shapiro, Stochastic approximation approach to stochastic programming, SIAM J. Optim. 19 (2009), 1574-1609.
|
14 |
O. L. Mangasarian and D. R. Musicant, Large scale kernel regression via linear programming, Mach. Learn. 46 (2002), 255-269.
|
15 |
K. Koh, S.-J. Kim, and S. Boyd, An interior-point method for large-scale ℓ1-regularized logistic regression, J. Mach. Learn. Res. 8 (2007), 1519-1555.
|
16 |
S. Lee, H. Lee, P. Abbeel, and A. Ng, Efficient ℓ1-regularized logistic regression, in Proceedings of the 21st National Conference on Artificial Intelligence, 2006.
|
17 |
Z.-Q. Luo and P. Tseng, Analysis of an approximate gradient projection method with applications to the backpropagation algorithm, Optim. Methods Softw. 4 (1994), 85-101.
|
18 |
O. L. Mangasarian and M. V. Solodov, Serial and parallel backpropagation convergence via nonmonotone perturbed minimization, Optim. Methods Softw. 4 (1994), 103-116.
|
19 |
Y. Nesterov, Primal-dual subgradient methods for convex problems, Math. Program. 120 (2009), no. 1, 221-259.
|
20 |
R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.
|
21 |
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning internal representations by error propagation, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, edited by Rumelhart and McClelland, 318-362, MIT Press, Cambridge, 1986.
|
22 |
S. Sardy and P. Tseng, AMlet, RAMlet, and GAMlet: automatic nonlinear fitting of additive models, robust and generalized, with wavelets, J. Comput. Graph. Statist. 13 (2004), no. 2, 283-309.
|
23 |
R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B 58 (1996), no. 1, 267-288.
|
24 |
V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 2000.
|
25 |
P. Tseng, On the rate of convergence of a partially asynchronous gradient projection algorithm, SIAM J. Optim. 1 (1991), no. 4, 603-619.
|
26 |
P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. 117 (2009), no. 1-2, 387-423.
|
27 |
P. Tseng and S. Yun, Incrementally updated gradient methods for constrained and regularized optimization, J. Optim. Theory Appl. 160 (2014), no. 3, 832-853.
|
28 |
L. Wang, Efficient regularized solution path algorithms with applications in machine learning and data mining, Ph.D. thesis, University of Michigan, 2008.
|
29 |
H. White, Learning in artificial neural networks: a statistical perspective, Neural Comput. 1 (1989), 425-464.
|
30 |
H. White, Some asymptotic results for learning in single hidden-layer feedforward network models, J. Amer. Statist. Assoc. 84 (1989), no. 408, 1003-1013.
|
31 |
L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res. 11 (2010), 2543-2596.
|