
The Effect of regularization and identity mapping on the performance of activation functions


  • Received : 2017.07.27
  • Accepted : 2017.10.13
  • Published : 2017.10.31

Abstract

In this paper, we describe the effect of regularization methods and networks with identity mapping on the performance of activation functions in deep convolutional neural networks. Activation functions serve as nonlinear transformations. Early convolutional neural networks used the sigmoid function; to overcome its problems, such as vanishing gradients, various activation functions were developed, including ReLU, Leaky ReLU, parametric ReLU, and ELU. To address overfitting, regularization methods such as dropout and batch normalization were developed independently of the work on activation functions, and data augmentation is also commonly applied in deep learning to avoid overfitting. The activation functions mentioned above have different characteristics, but new regularization methods and networks with identity mapping have been validated only with ReLU. Therefore, we experimentally show the effect of regularization methods and networks with identity mapping on the performance of these activation functions. Through this analysis, we present the tendency of activation function performance according to regularization and identity mapping. These results can reduce the number of training trials required to find the best activation function.
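As a concrete reference for the activation functions compared in the abstract, the following is a minimal NumPy sketch of their standard definitions. The parameter names (alpha, a) and default values are illustrative conventions, not values taken from the paper; in PReLU, the slope a is a learned parameter rather than a fixed constant.

```python
import numpy as np

def relu(x):
    # ReLU: pass positive inputs, zero out negative inputs
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: small fixed slope alpha for negative inputs
    return np.where(x > 0, x, alpha * x)

def prelu(x, a):
    # PReLU: same form as Leaky ReLU, but 'a' is learned during training
    return np.where(x > 0, x, a * x)

def elu(x, alpha=1.0):
    # ELU: smooth exponential saturation toward -alpha for negative inputs
    return np.where(x > 0, x, alpha * np.expm1(x))
```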

This paper describes the effect of regularization and identity mapping, both widely used in deep learning, on the performance of activation functions. In deep learning, activation functions are used for nonlinear transformation. The sigmoid function was used in early networks, and to overcome the problems of existing activation functions, such as vanishing gradients, ReLU (Rectified Linear Unit), LReLU (Leaky ReLU), PReLU (Parametric ReLU), and ELU (Exponential Linear Unit) were developed. Independently of the research on activation functions, regularization methods such as dropout and batch normalization were developed to address overfitting. In addition, data augmentation, a technique commonly used in machine learning, is applied to avoid overfitting. On the architectural side, the residual network was developed, which adds identity mappings to the conventional structure of simply stacked convolution layers and thereby improves forward and backward signal flow. The activation functions mentioned above each have different characteristics, but studies on new regularization methods and deep learning architectures have validated them only with ReLU, the most widely used activation function. Therefore, in this paper we experimentally analyze the performance of activation functions according to regularization and identity mapping. Through this analysis, we present the tendency of activation function performance with and without regularization and identity mapping, which can reduce the number of cross-validation runs needed to select an activation function.
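To illustrate how identity mapping and batch normalization can be combined with a swappable activation, here is a minimal PyTorch sketch of a residual block. The channel count, kernel size, and block layout are assumptions for illustration only, not the specific architecture or hyperparameters used in the paper; the point is that the activation module is passed in so ReLU, LeakyReLU, PReLU, or ELU can be compared under the same identity-mapping structure.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block with a configurable activation (illustrative sketch)."""

    def __init__(self, channels, activation=nn.ReLU):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = activation()

    def forward(self, x):
        identity = x                                  # identity mapping (shortcut path)
        out = self.act(self.bn1(self.conv1(x)))       # conv -> batch norm -> activation
        out = self.bn2(self.conv2(out))
        out = out + identity                          # add the shortcut before the final activation
        return self.act(out)

# Usage: compare activations under the same structure, e.g.
# block = ResidualBlock(64, activation=nn.ELU)
```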


References

  1. Srivastava, Nitish, et al., "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
  2. Wan, Li, et al., "Regularization of neural networks using DropConnect," Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013.
  3. Ioffe, Sergey, and Christian Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," International Conference on Machine Learning, 2015.
  4. He, Kaiming, et al., "Deep residual learning for image recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. DOI: https://doi.org/10.1109/CVPR.2016.90
  5. Nair, Vinod, and Geoffrey E. Hinton, "Rectified linear units improve restricted Boltzmann machines," Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
  6. Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," Proc. ICML, vol. 30, no. 1, 2013.
  7. He, Kaiming, et al., "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," Proceedings of the IEEE International Conference on Computer Vision, 2015. DOI: https://doi.org/10.1109/ICCV.2015.123
  8. Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)," arXiv preprint arXiv:1511.07289, 2015.
  9. Simonyan, Karen, and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.