Beta and Alpha Regularizers of Mish Activation Functions for Machine Learning Applications in Deep Neural Networks

  • Mathayo, Peter Beatus (Department of Computer Engineering, Graduate School, Dongseo University) ;
  • Kang, Dae-Ki (Department of Computer Engineering, Dongseo University)
  • Received : 2021.12.11
  • Accepted : 2021.12.20
  • Published : 2022.02.28

Abstract

Very complex tasks in deep learning, such as image classification, are solved with the help of neural networks and activation functions. As the backpropagation algorithm advances backward from the output layer towards the input layer, the gradients often get smaller and smaller and approach zero, which eventually leaves the weights of the initial or lower layers nearly unchanged; as a result, gradient descent never converges to the optimum. We propose a two-factor non-saturating activation function, called Bea-Mish, for machine learning applications in deep neural networks. Our method uses two factors, beta (𝛽) and alpha (𝛼), to normalize the area below the boundary in the Mish activation function; we refer to these elements as Bea. A clear understanding of the behaviors and conditions governing this regularization term can lead to a more principled approach for constructing better-performing activation functions. We evaluate Bea-Mish against the Mish and Swish activation functions on various models and data sets. Empirical results show that our approach (Bea-Mish) outperforms native Mish by 2.51% in average precision (AP50val) on CIFAR-10 with a SqueezeNet backbone, and by 1.20% in top-1 accuracy with ResNet-50 on ImageNet-1k.
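The abstract does not give the closed-form definition of Bea-Mish, so the sketch below is only illustrative: it implements standard Mish, f(x) = x·tanh(softplus(x)), alongside a hypothetical Bea-Mish variant in which the beta (𝛽) and alpha (𝛼) factors are assumed to scale the softplus term. The class names and the exact placement of the two parameters are assumptions, not the authors' published formulation.

```python
# Minimal sketch of Mish and a hypothetical Bea-Mish variant in PyTorch.
# The exact Bea-Mish formula is not stated in the abstract; beta and alpha
# are assumed here to scale the softplus term, purely for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Mish(nn.Module):
    """Mish: f(x) = x * tanh(softplus(x))  (Misra, BMVC 2020)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(F.softplus(x))


class BeaMish(nn.Module):
    """Hypothetical Bea-Mish: f(x) = x * tanh(beta * softplus(alpha * x)).

    beta and alpha stand in for the two regularizing factors described in
    the abstract; their placement inside the formula is an assumption.
    """
    def __init__(self, beta: float = 1.0, alpha: float = 1.0):
        super().__init__()
        self.beta = beta
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(self.beta * F.softplus(self.alpha * x))


if __name__ == "__main__":
    # Compare the two activations on a small range of inputs.
    x = torch.linspace(-5.0, 5.0, steps=11)
    print(Mish()(x))
    print(BeaMish(beta=1.5, alpha=0.5)(x))
```

Either module can be dropped into a network wherever ReLU or Mish would normally appear, e.g. as the nonlinearity after a convolutional layer in a SqueezeNet- or ResNet-style block.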

Acknowledgement

This work was supported by Dongseo University, "Dongseo Cluster Project" Research Fund of 2021 (DSU-20210001).

References

  1. J. Kilian and H. Siegelmann, "On the Power of Sigmoid Neural Networks," in Proc. 6th Annual Conference on Computational Learning Theory, pp. 137-143, August 1993. DOI: https://doi.org/10.1145/168304.168321
  2. V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," in Proc. 27th International Conference on Machine Learning, pp. 807-814, June 2010.
  3. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Communications of the ACM, Vol. 60, No. 6, pp. 84-90, June 2017. DOI: https://doi.org/10.1145/3065386
  4. A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier Nonlinearities Improve Neural Network Acoustic Models," in Proc. 30th ICML Workshop on Deep Learning for Audio, Speech, and Language Processing (WDLASL 2013), June 16-21, 2013.
  5. K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," in Proc. International Conference on Computer Vision (ICCV 2015), pp. 1026-1034, Dec. 7-13, 2015. DOI: https://doi.org/10.1109/ICCV.2015.123
  6. D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)," in Proc. 4th International Conference on Learning Representations (Poster), May 2016.
  7. G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-Normalizing Neural Networks," in Proc. Advances in Neural Information Processing Systems 30 (NIPS 2017), Dec 4-9, 2017.
  8. D. Hendrycks and K. Gimpel, "Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units," CoRR, abs/1606.08415, 2016.
  9. P. Ramachandran, B. Zoph, and Q. V. Le, "Searching for Activation Functions," CoRR, abs/1710.05941, 2017.
  10. B. Zoph and Q. V. Le, "Neural Architecture Search with Reinforcement Learning," in Proc. 5th International Conference on Learning Representations, April 24-26, 2017.
  11. A. Howard et al., "Searching for MobileNetV3," in Proc. the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1314-1324, Oct. 27-Nov. 2, 2019. DOI: https://doi.org/10.1109/ICCV.2019.00140
  12. Y. Ying et al., "Rectified Exponential Units for Convolutional Neural Networks," IEEE Access, Vol. 7, pp. 101633-101640, 2019. DOI: https://doi.org/10.1109/ACCESS.2019.2928442
  13. D. Misra, "Mish: A Self Regularized Non-Monotonic Activation Function," in Proc. 31st British Machine Vision Virtual Conference (BMVC), Sep. 7-10, 2020.
  14. X. Glorot and Y. Bengio, "Understanding the Difficulty of Training Deep Feedforward Neural Networks," in Proc. 13th International Conference on Artificial Intelligence and Statistics (AISTATS), May 13-15, 2010.
  15. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, June 27-30, 2016. DOI: https://doi.org/10.1109/CVPR.2016.90
  16. J. Hu, L. Shen, and G. Sun, "Squeeze-and-Excitation Networks," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132-7141, June 18-22, 2018. DOI: https://doi.org/10.1109/CVPR.2018.00745
  17. A. Krizhevsky, Learning Multiple Layers of Features from Tiny Images, Master Thesis. University of Toronto, Canada, 2009.
  18. J. Deng et al., "ImageNet: a Large-Scale Hierarchical Image Database," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248-255, June 20-25, 2009. DOI: https://doi.org/10.1109/CVPR.2009.5206848
  19. X. Du et al., "SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11589-11598, June 13-19, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.01161
  20. C.-Y. Wang et al., "CSPNet: A New Backbone that can Enhance Learning Capability of CNN," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1571-1580, June 14-19, 2020. DOI: https://doi.org/10.1109/CVPRW50498.2020.00203
  21. L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," in Proc. 19th International Conference on Computational Statistics (COMPSTAT 2010), pp. 177-186, Aug. 22-27, 2010. DOI: https://doi.org/10.1007/978-3-7908-2604-3_16
  22. S. E. Budiman and S. Lee, "Object Tracking with Histogram Weighted Centroid Augmented Siamese Region Proposal Network," International Journal of Internet, Broadcasting and Communication, Vol. 13, No. 2, pp. 156-165, May 2021. DOI: http://dx.doi.org/10.7236/IJIBC.2021.13.2.156