http://dx.doi.org/10.5351/KJAS.2019.32.5.693

Initialization by using truncated distributions in artificial neural network  

Kim, MinJong (Department of Applied Statistics, Chung-Ang University)
Cho, Sungchul (Department of Applied Statistics, Chung-Ang University)
Jeong, Hyerin (Department of Applied Statistics, Chung-Ang University)
Lee, YungSeop (Department of Statistics, Dongguk University)
Lim, Changwon (Department of Applied Statistics, Chung-Ang University)
Publication Information
The Korean Journal of Applied Statistics / v.32, no.5, 2019, pp. 693-702
Abstract
Deep learning has gained popularity for classification and prediction tasks, and neural networks have grown deeper as more data has become available. Saturation is the phenomenon in which the gradient of an activation function approaches zero; it can occur when the weights become too large, and because it limits the ability of the weights to learn, the issue has received increasing attention. To address this problem, Glorot and Bengio (Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249-256, 2010) argued that a neural network can be trained efficiently when the signal flows well between layers, and proposed an initialization method that makes the variance of each layer's output equal to the variance of its input. In this paper, we propose a new initialization method based on the truncated normal and truncated Cauchy distributions. Following the initialization framework of Glorot and Bengio (2010), we choose the truncation points so that the variance of the truncated distribution equals the variance required to equate the input and output variances of each layer. Truncation keeps the initial weights from becoming too large while also avoiding values that are too close to zero. To compare the performance of the proposed method with existing methods, we conducted experiments on the MNIST and CIFAR-10 data sets using DNN and CNN models. The proposed method outperformed the existing methods in terms of accuracy.
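
As a rough illustration of the variance-matching idea described in the abstract (not the authors' code), the sketch below draws initial weights from a normal distribution truncated at plus or minus c standard deviations and rescales it so that its variance equals the Xavier target 2 / (fan_in + fan_out). The function name, the truncation point c, and the use of SciPy are assumptions made here for illustration only; the paper also considers a truncated Cauchy variant not shown here.

    # Minimal sketch: truncated-normal initialization rescaled to the Xavier variance.
    # Assumed names and truncation point; not the authors' implementation.
    import numpy as np
    from scipy.stats import truncnorm

    def truncated_normal_xavier_init(fan_in, fan_out, c=2.0, seed=None):
        """Sample a (fan_in, fan_out) weight matrix from a normal distribution
        truncated at +/- c standard deviations, rescaled so that its variance
        matches the Xavier target 2 / (fan_in + fan_out)."""
        target_var = 2.0 / (fan_in + fan_out)   # Glorot and Bengio (2010) target variance
        base_var = truncnorm.var(-c, c)         # variance of a standard normal truncated at +/- c
        scale = np.sqrt(target_var / base_var)  # rescaling so Var(W) hits the target
        return truncnorm.rvs(-c, c, loc=0.0, scale=scale,
                             size=(fan_in, fan_out), random_state=seed)

    # Example: initialize the weights of a 784 -> 256 fully connected layer.
    W = truncated_normal_xavier_init(784, 256, c=2.0, seed=0)
    print(W.shape, W.var())  # empirical variance close to 2 / (784 + 256)

Because the truncation bounds are fixed in units of the (rescaled) standard deviation, no sampled weight exceeds c times the scale, which is the sense in which truncation prevents overly large initial values while the variance matching keeps the weights from clustering too tightly around zero.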
Keywords
initialization; saturation; Xavier initialization; truncated distribution; deep learning
Citations & Related Records
  • Reference
1 He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1026-1034).
2 Humbird, K. D., Peterson, J. L., and McClarren, R. G. (2018). Deep neural network initialization with decision trees. IEEE Transactions on Neural Networks and Learning Systems, 30, 1286-1295.
3 Hayou, S., Doucet, A., and Rousseau, J. (2018). On the selection of initialization and activation function for deep neural networks. arXiv preprint arXiv:1805.08266.
4 Krahenbuhl, P., Doersch, C., Donahue, J., and Darrell, T. (2015). Data-dependent initializations of convolutional neural networks. arXiv preprint arXiv:1511.06856.
5 Krizhevsky, A. and Hinton, G. (2009). Learning Multiple Layers of Features from Tiny Images (Vol. 1, No. 4, p. 7) Technical report, University of Toronto.
6 LeCun, Y., Bottou, L., Orr, G., and Muller, K. (1998a). Efficient backprop. In Neural Networks: Tricks of the Trade (Orr, G. and Muller, K., eds.), Lecture Notes in Computer Science, 1524, Springer.
7 LeCun, Y., Cortes, C., and Burges, C. J. (1998b). The MNIST Database of Handwritten Digits.
8 Mishkin, D. and Matas, J. (2015). All you need is a good init. arXiv preprint arXiv:1511.06422.
9 Sutskever, I., Martens, J., Dahl, G., and Hinton, G. (2013). On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning (pp. 1139-1147).
10 Clevert, D. A., Unterthiner, T., and Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
11 Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 249-256).
12 Goodfellow, I. J., Vinyals, O., and Saxe, A. M. (2014). Qualitatively characterizing neural network optimization problems. arXiv preprint arXiv:1412.6544.
13 Hanin, B. and Rolnick, D. (2018). How to start training: The effect of initialization and architecture. In Advances in Neural Information Processing Systems (pp. 571-581).