Pragmatic Assessment of Optimizers in Deep Learning

  • Ajeet K. Jain (Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation);
  • PVRD Prasad Rao (CSE, KLEF);
  • K. Venkatesh Sharma (CSE, CVR College of Engineering)
  • Received : 2023.10.05
  • Published : 2023.10.30

Abstract

Deep learning incorporates a wide range of optimization techniques, motivated by advances in practical optimization algorithms, and their use plays a central role in machine learning. In the recent past, new variants of established optimizers have been put into practice, and their suitability and applicability have been reported across various domains. This line of development spans Stochastic Gradient Descent through convex, non-convex and derivative-free approaches. Given this landscape of optimizers, choosing a best-fit or otherwise appropriate optimizer is an important consideration in deep learning, as these workhorse engines largely determine the final performance of the model. Moreover, an increasing number of layers brings higher complexity in hyper-parameter tuning, and consequently the need to search for a well-suited optimizer. We empirically examine the most popular and widely used optimizers on various data sets and networks, such as MNIST classifiers and GANs, among others. The pragmatic comparison focuses on their similarities, differences and suitability for a given application. Additionally, recent optimizer variants are highlighted along with their subtleties. The article emphasizes their critical role and pinpoints supporting considerations for choosing among them.
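
To illustrate the kind of empirical comparison described above, the sketch below trains one small MNIST classifier under several widely used optimizers with PyTorch. It is a minimal example written for this page, not the article's experimental setup: the network architecture, learning rates and the particular optimizer list are assumptions chosen only to show how such a comparison is wired together.

```python
# Minimal sketch: comparing optimizers on MNIST with PyTorch.
# The model, hyper-parameters and optimizer list are illustrative assumptions,
# not the configuration reported in the article.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_model():
    # A small fully connected classifier for 28x28 MNIST digits.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    )

def train_one_epoch(model, optimizer, loader, loss_fn):
    # One pass over the training set; returns the mean mini-batch loss.
    model.train()
    total = 0.0
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

if __name__ == "__main__":
    data = datasets.MNIST("data", train=True, download=True,
                          transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=64, shuffle=True)
    loss_fn = nn.CrossEntropyLoss()

    # Candidate optimizers, all driving the same model and training loop.
    candidates = {
        "SGD": lambda p: torch.optim.SGD(p, lr=0.1),
        "SGD + momentum": lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9),
        "RMSprop": lambda p: torch.optim.RMSprop(p, lr=1e-3),
        "Adam": lambda p: torch.optim.Adam(p, lr=1e-3),
    }
    for name, make_optimizer in candidates.items():
        torch.manual_seed(0)  # identical initialization for every optimizer
        model = make_model()
        optimizer = make_optimizer(model.parameters())
        mean_loss = train_one_epoch(model, optimizer, loader, loss_fn)
        print(f"{name}: mean training loss after one epoch = {mean_loss:.4f}")
```

In practice such a comparison would run for several epochs, hold out a validation set and tune the learning rate per optimizer; the sketch only shows that the candidates are interchangeable behind a common training loop, which is what makes a side-by-side assessment of this kind straightforward to set up.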
