References
- Ciresan D, Meier U, and Schmidhuber J (2012). Multi-column deep neural networks for image classification, arXiv:1202.2745.
- Chollet F (2017). Deep Learning with Python, Manning, New York.
- Falbel D, Allaire JJ, and Chollet F. R interface to 'Keras'. https://keras.rstudio.com/index.html
- Glorot X and Bengio Y (2010). Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 9, 249-256.
- Goodfellow I, Bengio Y, and Courville A (2015). Deep Learning, MIT Press, Cambridge.
- He K, Zhang X, Ren S, and Sun J (2015). Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, arXiv:1502.01852.
- Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, and Salakhutdinov RR (2012). Improving neural networks by preventing co-adaptation of feature detectors, arXiv:1207.0580.
- Hinton G, Srivastava N, and Swersky K (2014). RMSprop: Divide the gradient by a running average of its recent magnitude. Available from: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
- Huang Y, Cheng Y, Bapna A, et al. (2018). GPipe: efficient training of giant neural networks using pipeline parallelism, arXiv:1811.06965.
- Ioffe S and Szegedy C (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning, 37, 448-456, arXiv:1502.03167.
- Krizhevsky A, Sutskever I, and Hinton GE (2012). ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 25.
- Krizhevsky A, Nair V, and Hinton G. CIFAR10 and CIFAR100 datasets. Available from: https://www.cs.toronto.edu/~kriz/cifar.html
- LeCun Y, Bottou L, Bengio Y, and Haffner P (1998). Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86, 2278-2324. https://doi.org/10.1109/5.726791
- LeCun Y, Cortes C, and Burges CJC. MNIST handwritten digit database. Available from: http://yann.lecun.com/exdb/mnist/
- McCulloch WS and Pitts WH (1943). A logical calculus of the ideas immanent in nervous activity, The Bulletin of Mathematical Biophysics, 5, 115-133. https://doi.org/10.1007/BF02478259
- Rosenblatt F (1958). The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review, 65, 386-408. https://doi.org/10.1037/h0042519
- Ruder S (2017). An overview of gradient descent optimization algorithms, arXiv:1609.04747.
- Rumelhart D, Hinton G, and Williams RJ (1986). Learning representations by back-propagating errors, Nature, 323, 533-536. https://doi.org/10.1038/323533a0
- Simonyan K and Zisserman A (2014). Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556.
- Srivastava N, Hinton G, Krizhevsky A, Sutskever I, and Salakhutdinov R (2014). Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, 15, 1929-1958.
- Zhang Y and Wallace B (2015). A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification, arXiv:1510.03820.