HW Systems and SW Libraries for Deep Learning

Jeong, U-Geun; Kim, Jeong-Uk; Park, Jeong-Ho; Park, Ji-Yeong; Sin, Jae-Ho; Jeong, Jae-Hun; Jo, Gang-Won; Kim, Hui-Hun; Nam, Hyeong-Uk; Lee, Jae-Jin (Seoul National University)
References

[1] Rivera, J., "Gartner Reveals Top Predictions for IT Organizations and Users for 2014 and Beyond," Gartner, 2013. http://www.gartner.com/newsroom/id/2603215
[2] Woods, V., "Gartner Identifies the Top 10 Strategic Technology Trends for 2016," Gartner, 2015. http://www.gartner.com/newsroom/id/3143521
[3] Google Scholar. https://scholar.google.com/
[4] McCulloch, W. S. and Pitts, W., "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, 1943.
[5] LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D., "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989.
[6] Birdsall, J. W., "The Sun Hardware Reference," 1995. http://www.sunhelp.org/faq/sunrefl.html
[7] "NVIDIA Tesla P100," NVIDIA Whitepaper, 2016. https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
[8] Min, S., Lee, B., and Yoon, S., "Deep Learning in Bioinformatics," arXiv preprint arXiv:1603.06430, 2016.
[9] Fehrer, R. and Feuerriegel, S., "Improving Decision Analytics with Deep Learning: The Case of Financial Disclosures," arXiv preprint arXiv:1508.01993, 2015.
[10] Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T., "Caffe: Convolutional Architecture for Fast Feature Embedding," Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675-678, 2014.
[11] Abadi, M. et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," arXiv preprint arXiv:1603.04467, 2016.
[12] Bergstra, J., Bastien, F., Breuleux, O., Lamblin, P., Pascanu, R., Delalleau, O., Desjardins, G., Warde-Farley, D., Goodfellow, I., Bergeron, A., and Bengio, Y., "Theano: Deep Learning on GPUs with Python," Journal of Machine Learning Research, vol. 1, pp. 1-48, 2011.
[13] Torch: A scientific computing framework for LuaJIT. http://torch.ch/
[14] ImageNet. http://image-net.org/
[15] Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., and Tran, J., "cuDNN: Efficient Primitives for Deep Learning," arXiv preprint arXiv:1410.0759, 2014.
[16] Mathieu, M., Henaff, M., and LeCun, Y., "Fast Training of Convolutional Networks through FFTs," arXiv preprint arXiv:1312.5851, 2013.
[17] Jouppi, N., "Google supercharges machine learning tasks with TPU custom chip," Google Cloud Platform Blog, 2016. https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
[18] Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., Sun, N., and Temam, O., "DaDianNao: A Machine-Learning Supercomputer," Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 609-622, 2014.
[19] Lacey, G., Taylor, G. W., and Areibi, S., "Deep Learning on FPGAs: Past, Present, and Future," arXiv preprint arXiv:1602.04283, 2016.
[20] Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M. A., and Dally, W. J., "EIE: Efficient Inference Engine on Compressed Deep Neural Network," arXiv preprint arXiv:1602.01528, 2016.
[21] Reagen, B., Whatmough, P., Adolf, R., Rama, S., Lee, H., Lee, S. K., Hernandez-Lobato, J. M., Wei, G.-Y., and Brooks, D., "Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators," Proceedings of the 43rd International Symposium on Computer Architecture, 2016.
[22] Tallada, M. G., "Coarse Grain Parallelization of Deep Neural Networks," Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Article no. 1, 2016.
[23] Using the GPU - Theano 0.8.2 documentation. http://deeplearning.net/software/theano/tutorial/using_gpu.html
[24] cltorch. https://github.com/hughperkins/cltorch
[25] OpenCL Caffe. https://github.com/BVLC/caffe/tree/opencl
[26] Ovtcharov, K., Ruwase, O., Fowers, J., Strauss, K., and Chung, E., "Accelerating Deep Convolutional Neural Networks Using Specialized Hardware," Microsoft Research Whitepaper, 2015. https://www.microsoft.com/en-us/research/publication/accelerating-deep-convolutional-neural-networks-using-specalized-hardware/
[27] Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., Mao, M. Z., Ranzato, M., Senior, A., Tucker, P., Yang, K., and Ng, A. Y., "Large Scale Distributed Deep Networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1232-1240, 2012.
[28] Wu, R., Yan, S., Shan, Y., Dang, Q., and Sun, G., "Deep Image: Scaling up Image Recognition," arXiv preprint arXiv:1501.02876, 2015.
[29] Adhikari, R., "Google, Movidius to Bring Deep Learning to Mobile Devices," Tech News World, 2016. http://www.technewsworld.com/story/83052.html
[30] Qualcomm Zeroth Platform. https://www.qualcomm.com/invention/cognitive-technologies/zeroth
[31] LiKamWa, R., Hou, Y., Gao, J., Polansky, M., and Zhong, L., "RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision," Proceedings of the 43rd International Symposium on Computer Architecture, 2016.
[32] Caffe tutorial. http://caffe.berkeleyvision.org/tutorial/layers.html
[33] Krizhevsky, A., Sutskever, I., and Hinton, G. E., "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, 2012.
[34] Lavin, A. and Gray, S., "Fast Algorithms for Convolutional Neural Networks," arXiv preprint arXiv:1509.09308, 2015.
[35] Smith, C., Nguyen, C., and De, U., "Distributed TensorFlow: Scaling Google's Deep Learning Library on Spark," ARIMO, 2016. https://arimo.com/machine-learning/deeplearning/2016/arimo-distributed-tensorflow-on-spark/
[36] Vishnu, A., Siegel, C., and Daily, J., "Distributed TensorFlow with MPI," arXiv preprint arXiv:1603.02339, 2016.
[37] Multi-node Caffe. https://github.com/BVLC/caffe/pull/3441
[38] "GPU-Based Deep Learning Inference: A Performance and Power Analysis," NVIDIA Whitepaper, 2015. https://www.nvidia.com/content/tegra/embedded-systems/pdf/jetson_tx1_whitepaper.pdf
[39] Elephas: Distributed Deep Learning with Keras & Spark. https://github.com/maxpumperla/elephas/
[40] tensorflow-opencl. https://github.com/benoitsteiner/tensorflow-opencl
[41] OpenCL. https://www.khronos.org/opencl/
[42] Song, F. and Dongarra, J., "A Scalable Framework for Heterogeneous GPU-Based Clusters," Proceedings of the 24th Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 91-100, 2012.
[43] Dean, J. and Ghemawat, S., "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[44] Petitet, A., Whaley, R. C., Dongarra, J., and Cleary, A., "HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed Memory Computers," 2016. http://www.netlib.org/benchmark/hpl
[45] torch-ipc. https://github.com/twitter/torch-ipc
[46] DistLearn. https://github.com/twitter/torch-distlearn
[47] Kim, J., Seo, S., Lee, J., Nah, J., Jo, G., and Lee, J., "SnuCL: An OpenCL Framework for Heterogeneous CPU/GPU Clusters," Proceedings of the 26th ACM International Conference on Supercomputing, pp. 341-351, 2012.
[48] Kim, J., Jo, G., Jung, J., Kim, J., and Lee, J., "A Distributed OpenCL Framework using Redundant Computation and Data Replication," Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 553-569, 2016.