HW Systems and SW Libraries for Deep Learning

  • Published: 2016.09.26

References

  1. Rivera, J., "Gartner Reveals Top Predictions for IT Organizations and Users for 2014 and Beyond," Gartner, 2013. http://www.gartner.com/newsroom/id/2603215
  2. Woods, V., "Gartner Identifies the Top 10 Strategic Technology Trends for 2016," Gartner, 2015. http://www.gartner.com/newsroom/id/3143521
  3. Google Scholar. https://scholar.google.com/
  4. McCulloch, W. S. and Pitts, W., "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, 1943. https://doi.org/10.1007/BF02478259
  5. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D., "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989. https://doi.org/10.1162/neco.1989.1.4.541
  6. Birdsall, J. W., "The Sun Hardware Reference," 1995. http://www.sunhelp.org/faq/sunref1.html
  7. "NVIDIA Tesla P100," NVIDIA Whitepaper, 2016. https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
  8. Min, S., Lee, B., and Yoon, S., "Deep Learning in Bioinformatics," arXiv preprint arXiv:1603.06430, 2016.
  9. Fehrer, R. and Feuerriegel, S., "Improving Decision Analytics with Deep Learning: The Case of Financial Disclosures," arXiv preprint arXiv:1508.01993, 2015.
  10. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T., "Caffe: Convolutional Architecture for Fast Feature Embedding," Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675-678, 2014.
  11. Abadi, M. et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," arXiv preprint arXiv:1603.04467, 2016.
  12. Bergstra, J., Bastien, F., Breuleux, O., Lamblin, P., Pascanu, R., Delalleau, O., Desjardins, G., Warde-Farley, D., Goodfellow, I., Bergeron, A., and Bengio, Y., "Theano: Deep Learning on GPUs with Python," Journal of Machine Learning Research, vol. 1, pp. 1-48, 2011.
  13. Torch: A scientific computing framework for LuaJIT. http://torch.ch/
  14. ImageNet. http://image-net.org/
  15. Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., and Tran, J., "cuDNN: Efficient Primitives for Deep Learning," arXiv preprint arXiv:1410.0759, 2014.
  16. Mathieu, M., Henaff, M., and LeCun, Y., "Fast Training of Convolutional Networks through FFTs," arXiv preprint arXiv:1312.5851, 2013.
  17. Jouppi, N., "Google supercharges machine learning tasks with TPU custom chip," Google Cloud Platform Blog, 2016. https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
  18. Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., Sun, N., and Temam, O., "DaDianNao: A Machine-Learning Supercomputer," Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 609-622, 2014.
  19. Lacey, G., Taylor, G. W., and Areibi, S., "Deep Learning on FPGAs: Past, Present, and Future," arXiv preprint arXiv:1602.04283, 2016.
  20. Ovtcharov, K., Ruwase, O., Fowers, J., Strauss, K., and Chung, E., "Accelerating Deep Convolutional Neural Networks Using Specialized Hardware," Microsoft Research Whitepaper, 2015. https://www.microsoft.com/en-us/research/publication/accelerating-deep-convolutional-neural-networks-using-specialized-hardware/
  21. Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., Mao, M. Z., Ranzato, M., Senior, A., Tucker, P., Yang, K., and Ng, A. Y., "Large Scale Distributed Deep Networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1232-1240, 2012.
  22. Tallada, M. G., "Coarse Grain Parallelization of Deep Neural Networks," Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Article no. 1, 2016.
  23. Wu, R., Yan, S., Shan, Y., Dang, Q., and Sun, G., "Deep Image: Scaling up Image Recognition," arXiv preprint arXiv:1501.02876, 2015.
  24. Adhikari, R., "Google, Movidius to Bring Deep Learning to Mobile Devices," Tech News World, 2016. http://www.technewsworld.com/story/83052.html
  25. Qualcomm Zeroth Platform. https://www.qualcomm.com/invention/cognitive-technologies/zeroth
  26. "GPU-Based Deep Learning Inference: A Performance and Power Analysis," NVIDIA Whitepaper, 2015. https://www.nvidia.com/content/tegra!embedded-systems/pdf/jetson_tx1_whitepaper.pdf
  27. Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M. A., and Dally, W. J., "EIE: Efficient Inference Engine on Compressed Deep Neural Network," arXiv preprint arXiv:1602.01528, 2016.
  28. Reagen, B., Whatmough, P., Adolf, R., Rama, S., Lee, H., Lee, S. K., Hernandez-Lobato, J. M., Wei, G.-Y., and Brooks, D., "Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators," Proceedings of the 43rd International Symposium on Computer Architecture, 2016.
  29. LiKamWa, R., Hou, Y., Gao, J., Polansky, M., and Zhong, L., "RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision," Proceedings of the 43rd International Symposium on Computer Architecture, 2016.
  30. Caffe tutorial. http://caffe.berkeleyvision.org/tutorial/layers.html
  31. Krizhevsky, A., Sutskever, I., and Hinton, G. E., "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, 2012.
  32. Lavin, A. and Gray, S., "Fast Algorithms for Convolutional Neural Networks," arXiv preprint arXiv:1509.09308, 2015.
  33. Smith, C., Nguyen, C., and De, U., "Distributed TensorFlow: Scaling Google's Deep Learning Library on Spark," Arimo, 2016. https://arimo.com/machine-learning/deeplearning/2016/arimo-distributed-tensorflow-on-spark/
  34. Vishnu, A., Siegel, C., and Daily, J., "Distributed TensorFlow with MPI," arXiv preprint arXiv:1603.02339, 2016.
  35. Multi-node Caffe. https://github.com/BVLC/caffe/pull/3441
  36. Elephas: Distributed Deep learning with Keras & Spark. https://github.com/maxpumperla/elephas/
  37. IPC. https://github.com/twitter/torch-ipc
  38. DistLearn. https://github.com/twitter/torch-distlearn
  39. Using the GPU - Theano 0.8.2 documentation. http://deeplearning.net/software/theano/tutorial/using_gpu.html
  40. cltorch. https://github.com/hughperkins/cltorch
  41. OpenCL Caffe. https://github.com/BVLC/caffe/tree/opencl
  42. tensorflow-opencl. https://github.com/benoitsteiner/tensorflow-opencl
  43. OpenCL. https://www.khronos.org/opencl/
  44. Song, F. and Dongarra, J., "A Scalable Framework for Heterogeneous GPU-Based Clusters," Proceedings of the 24th Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 91-100, 2012.
  45. Dean, J. and Ghemawat, S., "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008. https://doi.org/10.1145/1327452.1327492
  46. Petitet, A., Whaley, R. C., Dongarra, J., and Cleary, A., "HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed Memory Computers," 2016. http://www.netlib.org/benchmark/hpl
  47. Kim, J., Jo, G., Jung, J., Kim, J., and Lee, J., "A Distributed OpenCL Framework using Redundant Computation and Data Replication," Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 553-569, 2016.
  48. Kim, J., Seo, S., Lee, J., Nah, J., Jo, G., and Lee, J., "SnuCL: An OpenCL Framework for Heterogeneous CPU/GPU Clusters," Proceedings of the 26th ACM International Conference on Supercomputing, pp. 341-351, 2012.