
Nonlinear optimization algorithm using monotonically increasing quantization resolution

  • Jinwuk Seok (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Jeong-Si Kim (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Received : 2021.09.07
  • Accepted : 2022.03.29
  • Published : 2023.02.20

Abstract

We propose a quantized gradient search algorithm that can achieve global optimization by monotonically reducing the quantization step over time when a quantization composed of integer or fixed-point fractional values is applied to an optimization algorithm. According to the white noise hypothesis, when the quantization step is sufficiently small and the quantization is well defined, the round-off error caused by quantization can be regarded as an independent and identically distributed random variable. We therefore rewrite the gradient-descent search equation as a stochastic differential equation and, through stochastic analysis of the objective function, derive the monotonically decreasing rate of the quantization step that enables global optimization. Consequently, when the search equation is quantized with this monotonically decreasing step, which suitably reduces the round-off error, the resulting search algorithm evolves naturally from the underlying optimization algorithm. Numerical simulations indicate that, owing to this quantization-based global-optimization property, the proposed algorithm explores the search space more effectively at each iteration than the conventional algorithm, achieving a higher success rate with fewer iterations.
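To illustrate the idea (this is not the authors' implementation), the sketch below applies a plain gradient step and then rounds the iterate onto a grid whose step size decreases monotonically with the iteration count, so that the round-off error acts like an annealed random perturbation. The test function, the power-law schedule for the quantization step, and the learning rate are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch of gradient search with a monotonically decreasing
# quantization step (illustrative only; not the authors' code).
import numpy as np

def objective(x):
    # Assumed 1-D nonconvex test function with several local minima.
    return x**2 + 3.0 * np.sin(3.0 * x)

def gradient(x):
    # Analytic derivative of the test function above.
    return 2.0 * x + 9.0 * np.cos(3.0 * x)

def quantize(x, q):
    # Round x to the nearest multiple of the quantization step q.
    return q * np.round(x / q)

def quantized_gradient_search(x0, lr=0.05, q0=1.0, p=0.5, iters=500):
    # q_t = q0 / (1 + t)**p is an assumed monotonically decreasing schedule;
    # the paper derives the decrease rate that guarantees global optimization.
    x = x0
    for t in range(iters):
        q_t = q0 / (1.0 + t) ** p
        x = quantize(x - lr * gradient(x), q_t)
    return x

x_star = quantized_gradient_search(x0=4.0)
print("quantized search result:", x_star, "objective:", objective(x_star))
```

Early in the search, the coarse grid lets the iterate jump over shallow local minima; as the quantization step shrinks, the update behaves increasingly like ordinary gradient descent and settles near a minimizer.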

Keywords

Acknowledgement

This work was supported by Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (2017-0-00142, Development of Acceleration SW Platform Technology for On-device Intelligent Information Processing in Smart Devices, and 2021-0-00766, Development of Integrated Development Framework that supports Automatic Neural Network Generation and Deployment optimized for Runtime Environment).
