1 |
R. M. Gray and D. L. Neuhoff, "Quantization", IEEE Transactions on Information Theory, Vol. 44, No. 6, pp. 2325-2383, October, 1998.
|
2 |
D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic, "QSGD: Communication-efficient SGD via gradient quantization and encoding", Advances in Neural Information Processing Systems, Vol. 30, pp. 1709-1720, 2017.
|
3 |
G. Tenenbaum, 'Introduction to Analytic and Probabilistic Number Theory', American Mathematical Society, 2014.
|
4 |
D. G. Luenberger, Y. Ye, 'Linear and Nonlinear Programming', Springer, 2015.
|
5 |
S. Sra, S. Nowozin, S. J. Wright, 'Optimization for Machine Learning', MIT Press, 2012.
|
6 |
J. Duchi, E. Hazan, and Y. Singer, "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization", Journal of Machine Learning Research, 2011.
|
7 |
M. D. Zeiler, "ADADELTA: An Adaptive Learning Rate Method", arXiv preprint arXiv:1212.5701, https://arxiv.org/abs/1212.5701, 2012.
|
8 |
D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", International Conference on Learning Representations, 2015.
|
9 |
S. M. Goldfeld, R. E. Quandt, and H. F. Trotter, "Maximization by Quadratic Hill-Climbing", Econometrica, pp. 541-551, July, 1966.
|
10 |
S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004
|
11 |
M.S. Bazaraa, H.D. Sherali, C.M. Shetty, Nonlinear Programming: Theory and Algorithms. Wiley-Interscience, New Jersey, 2006
|