Trade-off between Model Complexity and Performance in Intra-frame Predictive Vector Quantization of Wideband Speech

Korean title (translated): Trade-off between Model Complexity and Performance in Intra-frame Residual Vector Quantization of Wideband Speech

  • Received : 2010.01.29
  • Accepted : 2010.02.01
  • Published : 2010.02.26

Abstract

This paper addresses the trade-off between model complexity and performance that arises when bandwidth extension (BWE) methods are applied to intra-frame predictive vector quantization of wideband speech. It discusses model-based linear and non-linear prediction methods and compares them in terms of prediction gain. The experiments show that performance generally saturates as model complexity increases. They also show no significant difference between HMM-based and GMM-based BWE functions.
