http://dx.doi.org/10.7471/ikeee.2020.24.3.766

Performance Evaluation of Machine Learning Optimizers  

Joo, Gihun (Dept. of Medical Bigdata Convergence, Kangwon National University)
Park, Chihyun (Dept. of Medical Bigdata Convergence, Kangwon National University)
Im, Hyeonseung (Dept. of Medical Bigdata Convergence, Kangwon National University)
Publication Information
Journal of IKEEE / v.24, no.3, 2020, pp. 766-776
Abstract
Recently, as interest in machine learning (ML) has increased and research using ML has become more active, finding an optimal hyperparameter combination for various ML models has become increasingly important. In this paper, among the various hyperparameters, we focused on ML optimizers and measured and compared the performance of major optimizers on several datasets. In particular, we compared nine optimizers, from the most basic SGD to Momentum, NAG, AdaGrad, RMSProp, AdaDelta, Adam, AdaMax, and Nadam, using the MNIST, CIFAR-10, IRIS, TITANIC, and Boston Housing Price datasets. Experimental results showed that with Adam or Nadam, the loss of the various ML models decreased most rapidly and their F1 scores also improved. In contrast, AdaMax showed considerable instability during training, and AdaDelta converged more slowly and performed worse than the other optimizers.
Keywords
Machine Learning; Deep Learning; Optimizer; Performance Evaluation; Adam; Nadam;
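For reference, the following minimal sketch (not the authors' actual code) shows how such an optimizer comparison could be set up with TensorFlow/Keras on MNIST. The nine optimizer names follow the abstract; the simple MLP architecture, default learning rates, batch size, and epoch count are illustrative assumptions rather than the experimental settings reported in the paper.

```python
# Minimal sketch of an optimizer comparison on MNIST (assumes TensorFlow 2.x).
# Architecture and training settings are illustrative, not the paper's setup.
import tensorflow as tf


def build_model():
    # Small fully connected classifier for 28x28 grayscale digit images.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])


(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

# The nine optimizers compared in the paper, here with Keras default settings.
optimizers = {
    "SGD": tf.keras.optimizers.SGD(),
    "Momentum": tf.keras.optimizers.SGD(momentum=0.9),
    "NAG": tf.keras.optimizers.SGD(momentum=0.9, nesterov=True),
    "AdaGrad": tf.keras.optimizers.Adagrad(),
    "RMSProp": tf.keras.optimizers.RMSprop(),
    "AdaDelta": tf.keras.optimizers.Adadelta(),
    "Adam": tf.keras.optimizers.Adam(),
    "AdaMax": tf.keras.optimizers.Adamax(),
    "Nadam": tf.keras.optimizers.Nadam(),
}

for name, opt in optimizers.items():
    # Re-initialize the model for each optimizer so comparisons start
    # from fresh weights.
    model = build_model()
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=5, batch_size=128,
                        validation_data=(x_test, y_test), verbose=0)
    print(f"{name:10s} val_loss={history.history['val_loss'][-1]:.4f} "
          f"val_acc={history.history['val_accuracy'][-1]:.4f}")
```

Plotting `history.history['loss']` for each optimizer gives the kind of convergence-speed comparison described in the abstract; the other datasets (CIFAR-10, IRIS, TITANIC, Boston Housing Price) would follow the same pattern with different models and loss functions.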