References
- I. Goodfellow, Y. Bengio, and A. Courville, "Deep Learning," MIT Press, 2016.
- H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein, "Visualizing the Loss Landscape of Neural Nets," arXiv:1712.09913, 2018.
- S. Hochreiter, "Untersuchungen zu dynamischen neuronalen netzen," Diploma Thesis, Institut fur Informatik, Lehrstuhl Prof. Brauer, Technische Universit atMunchen, 1991.
- S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: The difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001.
- X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Proceedings of the International Conference on Artificial Intelligence and Statistics, Vol.9, pp.249-256, 2010.
- V. Nair and G. Hinton, "Rectified linear units improve restricted Boltzmann machines," International Conference on Machine Learning, pp.807-814, 2010.
- N. Y. Kong, Y. M. Ko, and S. W. Ko, "Performance Improvement Method of Convolutional Neural Network Using Agile Activation Function," KIPS Transactions on Software and Data Engineering, Vol.9, No.7, pp.213-220, 2020. https://doi.org/10.3745/KTSDE.2020.9.7.213
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, Vol.9, No.8, pp.1735-1780, 1997. https://doi.org/10.1162/neco.1997.9.8.1735
- F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, Vol.12, No.10, pp.2451-2471, 2000. https://doi.org/10.1162/089976600300015015
- J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv:1412.3555, 2014.
- K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," arXiv:1502.01852, 2015.
- G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, Vol.18, No.7, pp.1527-1554, 2006. https://doi.org/10.1162/neco.2006.18.7.1527
- J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," The Journal of Machine Learning Research, Vol.12, No.61, pp.2121-2159, 2011.
- M. D. Zeiler, "ADADELTA: An adaptive learning rate method," arXiv:1212.5701, 2012.
- D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980, 2014.
- S. Kong and M. Takatsuka, "Hexpo: A vanishing-proof activation function," International Joint Conference on Neural Networks, pp.2562-2567, 2017.
- Y. Qin, X. Wang, and J. Zou, "The optimized deep belief networks with improved logistic Sigmoid units and their application in fault diagnosis for planetary gearboxes of wind turbines," IEEE Transactions on Industrial Electronics, Vol.66, No.5, pp.3814-3824, 2019.
- X. Wang, Y. Qin, Y. Wang, S. Xiang, and H. Chen, "ReLTanh: An activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis," Neurocomputing, Vol.363, pp.88-98, 2019. https://doi.org/10.1016/j.neucom.2019.07.017
- R. Pascanu, T. Mikolov, and Y. Bengio, "Understanding the exploding gradient problem," arXiv:1211.5063, 2012.
- R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," arXiv:1211.5063, 2013.
- B. Xu, N. Wang, T. Chen, and M. Li, "Empirical Evaluation of Rectified Activations in Convolutional Network," arXiv:1505.00853, 2015.
- S. Basodi, C. Ji, H. Zhang, and Y. Pan, "Gradient amplification: An efficient way to train deep neural networks," Big Data Mining and Analytics, Vol.3, No.3, pp.196-207, 2020. https://doi.org/10.26599/BDMA.2020.9020004