1 |
Duchi, J., Hazan, E. and Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121-2159.
|
2 |
Boyd, S. and Lieven, V. (2004). Convex optimization. Cambridge university press, 466-468.
|
3 |
Bottou, L. (2010). Large-Scale Machine Learning with Stochastic Gradient Descent. Proceedings of COMPSTAT' 2010, 177-186.
|
4 |
Dekel, O., Gilad-Bachrach, R., Shamir, O. and Xiao, L. (2012). Optimal distributed online prediction using mini-batches. Journal of Machine Learning Research, 13, 165-202.
|
5 |
Hwang, C. and Shim, J. (2016). Deep LS-SVM for regression. Journal of the Korean Data & Information Science Society, 27, 827-833.
DOI
|
6 |
Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
|
7 |
Konecny, J., Liu, J., Richtarik, P. and Takac, M. (2016). Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE Journal of Selected Topics in Signal Processing, 10, 242-255.
DOI
|
8 |
LeCun, Y., Bengio, Y. and Hinton, G. (2015). Deep learning. Nature, 521, 436-444.
DOI
|
9 |
Lee, W. and Chun, H. (2016). A deep learning analysis of the Chinese Yuans volatility in the onshore and offshore markets. Journal of the Korean Data & Information Science Society, 27, 327-335.
DOI
|
10 |
Li, M., Zhang, T., Chen, Y. and Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining.
|
11 |
Zeiler, M. D. (2012). ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701
|
12 |
Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-538.
DOI
|
13 |
Shapiro, A. and Wardi, Y. (1996). Convergence analysis of gradient descent stochastic algorithms. Journal of optimization theory and applications, 91, 439-454.
DOI
|
14 |
Tieleman, T. and Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4.2, 26-31
|
15 |
Yamanishi, K., Takeuchi, J. I., Williams, G. and Milne, P. (2004). On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. Data Mining and Knowledge Discovery, 8, 275-300.
DOI
|