Perceptual and Adaptive Quantization of Line Spectral Frequency Parameters

Perceptually Adaptive Coding of Line Spectral Frequencies (translated from the Korean title)

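  • 한우진 (Department of Computer Science, Korea Advanced Institute of Science and Technology) ;
  • 김은경 (Department of Computer Science, Korea Advanced Institute of Science and Technology) ;
  • 오영환 (Department of Computer Science, Korea Advanced Institute of Science and Technology)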
  • Published : 2000.11.01

Abstract

Line spectral frequency (LSF) parameters are widely used in low bit-rate speech coding because they represent the short-time speech spectrum efficiently. Whereas most conventional quantization methods rely on a weighted Euclidean distance measure, this paper proposes a new distance measure for quantizing LSF parameters that is based on the masking properties of the human ear. The proposed method derives this perceptual distance measure from the definition of the noise-to-mask ratio (NMR), which corresponds closely to the distortion actually perceived by the human ear, and uses it to quantize LSF parameters. In addition, we propose an adaptive bit allocation scheme that reduces the average bit rate by allocating to the LSF parameters the minimum number of bits that maintains the perceptual transparency of a given speech frame. For performance evaluation, we report the ratio of perceptually transparent frames and the corresponding average bit rates for the conventional and proposed methods. By combining the proposed distance measure with the adaptive bit allocation scheme, the proposed system requires only 770 bps to obtain 95.5% perceptually transparent frames, whereas the conventional systems yield 89.9% even at 1800 bps.

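Whereas most methods for quantizing line spectral frequencies are based on the weighted Euclidean distance, this paper proposes a method that quantizes line spectral frequencies effectively using an error measure based on the auditory masking effect. The proposed method derives a new error measure by adapting the noise-to-mask ratio (NMR) to the quantization of line spectral frequencies, and uses it to quantize them. This paper also proposes an adaptive quantization algorithm that allocates bits dynamically according to the perceptual characteristics of the speech frame being quantized. For performance evaluation, 11948 frames of test data were quantized with both the conventional and the proposed methods, and the ratio of perceptually transparent frames and the corresponding average bit rate were compared. The conventional method obtained 89.9% perceptually transparent frames at a bit rate of 1800 bps, whereas the proposed method obtained 95.5% at an average bit rate of 770 bps, demonstrating its effectiveness.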
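
The two ideas in the abstract, an NMR-based distortion measure and a minimum-bit search that keeps a frame perceptually transparent, can be illustrated with a minimal sketch. This is not the authors' derivation: the band-averaged NMR form, the transparency test (noise at or below the masking threshold in every band), and the names noise_to_mask_ratio_db, is_transparent, and minimal_bits are all assumptions made for illustration. The per-band noise and masking values are taken as given inputs; computing them from a speech frame is outside the sketch.

```python
import math


def noise_to_mask_ratio_db(noise_power, mask_threshold):
    """Band-averaged noise-to-mask ratio in dB (a common form, assumed here).

    noise_power[b]    : quantization-noise energy in band b
    mask_threshold[b] : masking-threshold energy in band b (must be > 0)
    """
    ratios = [n / m for n, m in zip(noise_power, mask_threshold)]
    mean_ratio = sum(ratios) / len(ratios)
    if mean_ratio == 0.0:
        return float("-inf")  # no noise at all
    return 10.0 * math.log10(mean_ratio)


def is_transparent(noise_power, mask_threshold):
    """Assumed criterion: the noise is masked in every band."""
    return all(n <= m for n, m in zip(noise_power, mask_threshold))


def minimal_bits(noise_by_bits, mask_threshold):
    """Return the smallest bit count whose noise is fully masked.

    noise_by_bits maps a candidate bit count to the per-band noise energy
    that a (hypothetical) quantizer of that size produces for this frame.
    Falls back to the largest candidate if none is transparent.
    """
    for bits in sorted(noise_by_bits):
        if is_transparent(noise_by_bits[bits], mask_threshold):
            return bits
    return max(noise_by_bits)


if __name__ == "__main__":
    mask = [1.0, 2.0, 4.0]
    candidates = {
        8: [2.0, 1.0, 1.0],   # band 0 exceeds its mask
        12: [0.5, 1.0, 3.0],  # masked in every band
        16: [0.1, 0.2, 0.4],
    }
    chosen = minimal_bits(candidates, mask)
    print(chosen, noise_to_mask_ratio_db(candidates[chosen], mask))
```

Scanning the candidates from the fewest bits upward mirrors the stated goal of spending the minimum number of bits per frame, which is what lets the average bit rate fall below that of a fixed-rate quantizer.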
