[1] N. Qian, "On the momentum term in gradient descent learning algorithms," Neural Networks, vol. 12, no. 1, pp. 145-151, 1999. DOI: 10.1016/S0893-6080(98)00116-6
[2] Y. E. Nesterov, "A method for unconstrained convex minimization problem with the rate of convergence O(1/k²)," Dokl. AN SSSR, vol. 269, no. 3, pp. 543-547, 1983.
[3] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol. 12, pp. 2121-2159, 2011. DOI: 10.5555/1953048.2021068
[4] S. Ruder, "An overview of gradient descent optimization algorithms," arXiv preprint arXiv:1609.04747, 2016.
[5] M. D. Zeiler, "ADADELTA: An adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.
[6] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[7] T. Dozat, "Incorporating Nesterov momentum into Adam," in the 4th International Conference on Learning Representations (ICLR 2016), Workshop Track, 2016.
[8] D. Choi et al., "On empirical comparisons of optimizers for deep learning," arXiv preprint arXiv:1910.05446, 2019.
[9] M. Mahsa and T. Lee, "Comparison of optimization algorithms in deep learning-based neural networks for hydrological forecasting: Case study of Nam River daily runoff," J. Korean Soc. Hazard Mitig., vol. 18, no. 6, pp. 377-384, 2018. DOI: 10.9798/KOSHAM.2018.18.6.377
[10] W. Jung, B.-S. Lee, and J. Seo, "Performance comparison of the optimizers in a Faster R-CNN model for object detection of metaphase chromosomes," J. Korea Inst. Inf. Commun. Eng., vol. 23, no. 11, pp. 1357-1363, 2019.
[11] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in the 3rd International Conference on Learning Representations (ICLR 2015), 2015.
[12] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 770-778, 2016.