Performance Improvement Method of Convolutional Neural Network Using Combined Parametric Activation Functions

  • Received : 2021.12.16
  • Accepted : 2022.03.18
  • Published : 2022.09.30

Abstract

Convolutional neural networks are widely used to process data arranged in a grid, such as images. A typical convolutional neural network consists of convolutional layers and fully connected layers, and each layer contains a nonlinear activation function. This paper proposes a combined parametric activation function to improve the performance of convolutional neural networks. The combined parametric activation function is constructed by summing parametric activation functions, each of which applies trainable parameters that change the scale and location of a base activation function. The multiple scale and location parameters produce a variety of nonlinear intervals, and the parameters are learned in the direction that minimizes the loss function computed from the given input data. Experiments on the MNIST, Fashion MNIST, CIFAR10, and CIFAR100 classification problems confirm that convolutional neural networks using the combined parametric activation function outperform those using other activation functions.
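
To illustrate the construction described in the abstract, the sketch below shows one way such a combined parametric activation function could be used as a drop-in module in a convolutional network. This is a minimal sketch in PyTorch, not the authors' implementation: it assumes the combined function takes the form g(x) = Σ_k a_k · f(b_k · x + c_k), where f is a fixed base activation (tanh here) and a_k, b_k, c_k are trainable scale and location parameters updated by ordinary loss minimization. The module name, parameter layout, and choice of base function are illustrative assumptions.

    import torch
    import torch.nn as nn

    class CombinedParametricActivation(nn.Module):
        """Sum of K scaled and shifted copies of a base activation.

        Sketch of g(x) = sum_k a_k * f(b_k * x + c_k), with trainable
        scalars a_k, b_k, c_k; the paper's exact parameterization may differ.
        """
        def __init__(self, num_terms: int = 2, base=torch.tanh):
            super().__init__()
            self.base = base
            # Trainable scale/location parameters, one triplet per term.
            self.a = nn.Parameter(torch.ones(num_terms))    # output scale
            self.b = nn.Parameter(torch.ones(num_terms))    # input scale
            self.c = nn.Parameter(torch.zeros(num_terms))   # input shift

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Apply each scaled/shifted base activation and sum the terms.
            out = torch.zeros_like(x)
            for a_k, b_k, c_k in zip(self.a, self.b, self.c):
                out = out + a_k * self.base(b_k * x + c_k)
            return out

    # Usage: replace a fixed activation inside a CNN with the module.
    if __name__ == "__main__":
        layer = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            CombinedParametricActivation(num_terms=3),
        )
        y = layer(torch.randn(8, 3, 32, 32))
        print(y.shape)  # torch.Size([8, 16, 32, 32])

Because the scale and location parameters are ordinary module parameters, they are trained jointly with the convolutional weights by backpropagation, which is how the nonlinear intervals described above adapt to the data.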
