[1] P. Sun, Y. Wen, R. Han, W. Feng, and S. Yan, "GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training," IEEE Transactions on Big Data, 2019.
[2] Baidu Research, Ring all-reduce, [Internet] https://github.com/baidu-research/baidu-allreduce.
[3] R. Thakur, R. Rabenseifner, and W. Gropp, "Optimization of collective communication operations in MPICH," The International Journal of High Performance Computing Applications, Vol.19, No.1, pp.49-66, 2005.
[4] H. Mikami, H. Suganuma, Y. Tanaka, and Y. Kageyama, "ImageNet/ResNet-50 Training in 224 Seconds," arXiv preprint arXiv:1811.05233, 2018.
[5] A. Sergeev and M. Del Balso, "Horovod: Fast and easy distributed deep learning in TensorFlow," arXiv preprint arXiv:1802.05799, 2018.
[6] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, pp.770-778, 2016.
[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, pp.1-9, 2015.
[8] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, pp.4700-4708, 2017.
[9] S. Teerapittayanon, B. McDanel, and H. T. Kung, "Distributed deep neural networks over the cloud, the edge and end devices," in IEEE International Conference on Distributed Computing Systems, Atlanta, pp.328-339, 2017.
[10] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," in Advances in Neural Information Processing Systems, pp.5753-5763, 2019.
[11] X. W. Chen and X. Lin, "Big data deep learning: Challenges and perspectives," IEEE Access, Vol.2, pp.514-524, 2014.
[12] M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic, "Deep learning applications and challenges in big data analytics," Journal of Big Data, Vol.2, No.1, 2015.
[13] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, and M. Kudlur, "TensorFlow: A system for large-scale machine learning," in USENIX Symposium on Operating Systems Design and Implementation, Savannah, pp.265-283, 2016.
[14] R. Collobert, K. Kavukcuoglu, and C. Farabet, "Torch7: A Matlab-like environment for machine learning," in BigLearn, NIPS Workshop, 2011.
[15] F. Seide and A. Agarwal, "CNTK: Microsoft's open-source deep-learning toolkit," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.2135-2135, 2016.
[16] NVIDIA Developer, NCCL, [Internet] https://developer.nvidia.com/nccl.
[17] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, Vol.521, pp.436-444, 2015.
[18] X. Jia, S. Song, W. He, Y. Wang, H. Rong, F. Zhou, L. Xie, Z. Guo, Y. Yang, L. Yu, G. Hu, S. Shi, X. Chu, and T. Chen, "Highly scalable deep learning training system with mixed-precision: Training ImageNet in four minutes," arXiv preprint arXiv:1807.11205, 2018.
[19] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, pp.675-678, 2014.
[20] The Software in the Public Interest non-profit organization, Open MPI, [Internet] https://www.open-mpi.org/.
[21] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, "Deep gradient compression: Reducing the communication bandwidth for distributed training," arXiv preprint arXiv:1712.01887, 2017.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, Vol.60, No.6, pp.84-90, 2017.
[23] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.