References
- Rivera, J., "Gartner Reveals Top Predictions for IT Organizations and Users for 2014 and Beyond," Gartner, 2013. http://www.gartner.com/newsroom/id/2603215
- Woods, V., "Gartner Identifies the Top 10 Strategic Technology Trends for 2016," Gartner, 2015. http://www.gartner.com/newsroom/id/3143521
- Google Scholar. https://scholar.google.com/
- McCulloch, W. S. and Pitts, W., "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, 1943. https://doi.org/10.1007/BF02478259
- LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D., "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989. https://doi.org/10.1162/neco.1989.1.4.541
- Birdsall, J. W., "The Sun Hardware Reference," 1995. http://www.sunhelp.org/faq/sunref1.html
- "NVIDIA Tesla P100," NVIDIA Whitepaper, 2016. https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
- Min, S., Lee, B., and Yoon, S., "Deep Learning in Bioinformatics," arXiv preprint arXiv:1603.06430, 2016.
- Fehrer, R. and Feuerriegel, S., "Improving Decision Analytics with Deep Learning: The Case of Financial Disclosures," arXiv preprint arXiv:1508.01993, 2015.
- Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T., "Caffe: Convolutional Architecture for Fast Feature Embedding," Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675-678, 2014.
- Abadi, M. et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," arXiv preprint arXiv:1603.04467, 2016.
- Bergstra, J., Bastien, F., Breuleux, O., Lamblin, P., Pascanu, R., Delalleau, O., Desjardins, G., Warde-Farley, D., Goodfellow, I., Bergeron, A., and Bengio, Y., "Theano: Deep Learning on GPUs with Python," Journal of Machine Learning Research, vol. 1, pp. 1-48, 2011.
- Torch: A scientific computing framework for LuaJIT. http://torch.ch/
- ImageNet. http://image-net.org/
- Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., and Tran, J., "cuDNN: Efficient Primitives for Deep Learning," arXiv preprint arXiv:1410.0759, 2014.
- Mathieu, M., Henaff, M., and LeCun, Y., "Fast Training of Convolutional Networks through FFTs," arXiv preprint arXiv:1312.5851, 2013.
- Jouppi, N., "Google supercharges machine learning tasks with TPU custom chip," Google Cloud Platform Blog, 2016. https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
- Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., Sun, N., and Temam, O., "DaDianNao: A Machine-Learning Supercomputer," Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 609-622, 2014.
- Lacey, G., Taylor, G. W., and Areibi, S., "Deep Learning on FPGAs: Past, Present, and Future," arXiv preprint arXiv:1602.04283, 2016.
- Ovtcharov, K., Ruwase, O., Fowers, J., Strauss, K., and Chung, E., "Accelerating Deep Convolutional Neural Networks Using Specialized Hardware," Microsoft Research Whitepaper, 2015. https://www.microsoft.com/en-us/research/publication/accelerating-deep-convolutional-neural-networks-using-specalized-hardware/
- Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., Mao, M. Z., Ranzato, M., Senior, A., Tucker, P., Yang, K., and Ng, A. Y., "Large Scale Distributed Deep Networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1232-1240, 2012.
- Tallada, M. G., "Coarse Grain Parallelization of Deep Neural Networks," Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Article no. 1, 2016.
- Wu, R., Yan, S., Shan, Y., Dang, Q., and Sun, G., "Deep Image: Scaling up Image Recognition," arXiv preprint arXiv:1501.02876, 2015.
- Adhikari, R., "Google, Movidius to Bring Deep Learning to Mobile Devices," Tech News World, 2016. http://www.technewsworld.com/story/83052.html
- Qualcomm Zeroth Platform. https://www.qualcomm.com/invention/cognitive-technologies/zeroth
- "GPU-Based Deep Learning Inference: A Performance and Power Analysis," NVIDIA Whitepaper, 2015. https://www.nvidia.com/content/tegra/embedded-systems/pdf/jetson_tx1_whitepaper.pdf
- Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M. A., and Dally, W. J., "EIE: Efficient Inference Engine on Compressed Deep Neural Network," arXiv preprint arXiv:1602.01528, 2016.
- Reagen, B., Whatmough, P., Adolf, R., Rama, S., Lee, H., Lee, S. K., Hernandez-Lobato, J. M., Wei, G.-Y., and Brooks, D., "Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators," Proceedings of the 43rd International Symposium on Computer Architecture, 2016.
- LiKamWa, R., Hou, Y., Gao, J., Polansky, M., and Zhong, L., "RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision," Proceedings of the 43rd International Symposium on Computer Architecture, 2016.
- Caffe tutorial. http://caffe.berkeleyvision.org/tutorial/layers.html
- Krizhevsky, A., Sutskever, I., and Hinton, G. E., "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, 2012.
- Lavin, A. and Gray, S., "Fast Algorithms for Convolutional Neural Networks," arXiv preprint arXiv:1509.09308, 2015.
- Smith, C., Nguyen, C., and De, U., "Distributed TensorFlow: Scaling Google's Deep Learning Library on Spark," Arimo, 2016. https://arimo.com/machine-learning/deeplearning/2016/arimo-distributed-tensorflow-on-spark/
- Vishnu, A., Siegel, C., and Daily, J., "Distributed TensorFlow with MPI," arXiv preprint arXiv:1603.02339, 2016.
- Multi-node Caffe (GitHub pull request). https://github.com/BVLC/caffe/pull/3441
- Elephas: Distributed Deep Learning with Keras & Spark. https://github.com/maxpumperla/elephas/
- IPC. https://github.com/twitter/torch-ipc
- DistLearn. https://github.com/twitter/torch-distlearn
- Using the GPU - Theano 0.8.2 documentation. http://deeplearning.net/software/theano/tutorial/using_gpu.html
- cltorch. https://github.com/hughperkins/cltorch
- OpenCL Caffe. https://github.com/BVLC/caffe/tree/opencl
- tensorflow-opencl. https://github.com/benoitsteiner/tensorflow-opencl
- OpenCL. https://www.khronos.org/opencl/
- Song, F. and Dongarra, J., "A Scalable Framework for Heterogeneous GPU-Based Clusters," Proceedings of the 24th Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 91-100, 2012.
- Dean, J. and Ghemawat, S., "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008. https://doi.org/10.1145/1327452.1327492
- Petitet, A., Whaley, R. C., Dongarra, J., and Cleary, A., "HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed Memory Computers," 2016. http://www.netlib.org/benchmark/hpl
- Kim, J., Jo, G., Jung, J., Kim, J., and Lee, J., "A Distributed OpenCL Framework using Redundant Computation and Data Replication," Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 553-569, 2016.
- Kim, J., Seo, S., Lee, J., Nah, J., Jo, G., and Lee, J., "SnuCL: An OpenCL Framework for Heterogeneous CPU/GPU Clusters," Proceedings of the 26th ACM International Conference on Supercomputing, pp. 341-351, 2012.