Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF-2017R1E1A1A03070311).
References
- I. Goodfellow, Y. Bengio, & A. Courville. (2016). Regularization for deep learning. In Deep Learning. Cambridge, MA: MIT Press.
- H. Zhao, Y. H. Tsai, R. Salakhutdinov, & G. J. Gordon. (2019). Learning Neural Networks with Adaptive Regularization. arXiv:1907.06288v2.
- Y. Zheng, R. Zhang, & Y. Mao. (2021). Regularizing Neural Networks via Adversarial Model Perturbation. arXiv:2010.04925v4.
- Y. Wang, Z. P. Bian, J. Hou, & L. P. Chau. (2021). Convolutional Neural Networks With Dynamic Regularization. IEEE Transactions on Neural Networks and Learning Systems, 32(5), 2299-2304. https://doi.org/10.1109/TNNLS.2020.2997044
- J. Duchi, E. Hazan, & Y. Singer. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121-2159.
- Y. LeCun, L. Bottou, Y. Bengio, & P. Haffner. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
- R. Pascanu, & Y. Bengio. (2013). Revisiting natural gradient for deep networks. arXiv:1301.3584.
- J. Sohl-Dickstein, B. Poole, & S. Ganguli. (2014). Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), Beijing, China, 21-26 June, 604-612.
- S. Chaudhury, & T. Yamasaki. (2021). Robustness of Adaptive Neural Network Optimization Under Training Noise. IEEE Access, 9, 37039-37053. https://doi.org/10.1109/ACCESS.2021.3062990
- R. He, L. Liu, H. Ye, Q. Tan, B. Ding, L. Cheng, J. W. Low, L. Bing, & L. Si. (2021). On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation. arXiv:2106.03164v1.
- C. T. Kelley. (1995). Iterative Methods for Linear and Nonlinear Equations. Frontiers in Applied Mathematics, Vol. 16. Philadelphia, PA: SIAM.
- H. Zulkifli. (2018). Understanding Learning Rates and How It Improves Performance in Deep Learning. Towards Data Science. https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10
- S. Lau. (2017). Learning Rate Schedules and Adaptive Learning Rate Methods for Deep Learning. Towards Data Science. https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1
- A. Géron. (2017). Gradient Descent. In Hands-On Machine Learning with Scikit-Learn and TensorFlow (pp. 113-124). Sebastopol, CA: O'Reilly Media. ISBN 978-1-4919-6229-9.
- I. Sutskever, J. Martens, G. Dahl, & G. E. Hinton. (2013). On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), Atlanta, GA, USA, 16-21 June, 1139-1147.
- T. Tieleman, & G. E. Hinton. (2012). Lecture 6.5-RMSProp, COURSERA: Neural Networks for Machine Learning. Technical Report, University of Toronto, Toronto, ON, Canada.
- M. D. Zeiler. (2012). Adadelta: An adaptive learning rate method. arXiv:1212.5701.
- D. P. Kingma, & J. L. Ba. (2015). Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7-9 May.
- M. J. Kochenderfer, & T. A. Wheeler. (2019). Algorithms for Optimization. Cambridge, MA: MIT Press.
- K. He, X. Zhang, S. Ren, & J. Sun. (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27-30 June, 770-778.