Method of a Multi-mode Low Rate Speech Coder Using a Transient Coding at the Rate of 2.4 kbit/s

A 2.4 kbit/s Multi-mode Speech Coding Method Using Transient-Segment Coding

  • Ahn Yeong-uk (Department of Radiowave Engineering, Chungbuk National University);
  • Kim Jong-hak (Department of Radiowave Engineering, Chungbuk National University);
  • Lee Insung (Department of Radiowave Engineering, Chungbuk National University);
  • Kwon Oh-ju (Agency for Defense Development);
  • Bae Mun-Kwan (Agency for Defense Development)
  • Published: 2005.03.01

Abstract

Low-rate speech coders operating below 4 kbit/s are typically based on sinusoidal transform coding (STC) or multiband excitation (MBE). Because these harmonic coders cannot efficiently reconstruct the transient segments of speech, such as onsets, offsets, and non-periodic signals, they do not provide natural speech quality. This paper proposes an efficient transient model and a 2.4 kbit/s multi-mode low-rate coder that uses a harmonic model for voiced speech, a stochastic model for unvoiced speech, and, for transient segments, a model based on aperiodic pulse location tracking (APPT), which itself builds on the harmonic model. The proposed method selects the model according to the characteristics of the LPC residual signal. In addition, it can efficiently combine the excitation synthesized in the time domain by CELP coding with the excitation synthesized in the frequency domain by harmonic coding. The proposed coder provides better speech quality than the 2.4 kbit/s version of the mixed excitation linear prediction (MELP) coder, a U.S. Federal Standard speech coder.
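The abstract states that the coder switches among a harmonic, a stochastic, and an APPT-based transient model according to the characteristics of the LPC residual, but it does not say which features or thresholds drive that decision. The following is a minimal Python sketch under stated assumptions, not the authors' classifier. It computes LPC coefficients by the autocorrelation method with the Levinson-Durbin recursion, inverse-filters the frame to obtain the residual, flags the frame as transient when its energy jumps or drops sharply relative to the previous frame, and otherwise labels it voiced when the residual shows strong pitch periodicity. The function names, the 8 kHz sampling rate, the 240-sample frame, the LPC order of 10, the pitch-lag range, and both thresholds are hypothetical choices made only for illustration.

```python
import math
import random

def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """LPC coefficients [1, a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep the lower-order solution
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_residual(x, a):
    """Inverse-filter the frame with A(z); samples before the frame count as zero."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1)) for n in range(len(x))]

def periodicity(e, min_lag, max_lag):
    """Largest normalized autocorrelation of the residual over candidate pitch lags."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(e[i] * e[i - lag] for i in range(lag, len(e)))
        den = math.sqrt(sum(v * v for v in e[lag:]) * sum(v * v for v in e[:-lag]))
        if den > 0.0:
            best = max(best, num / den)
    return best

def classify_frame(x, prev_energy, order=10, min_lag=20, max_lag=120,
                   voiced_thr=0.6, onset_ratio=4.0):
    """Hypothetical mode decision returning ('voiced'|'unvoiced'|'transient', energy).

    All thresholds are illustrative assumptions, not values from the paper.
    """
    energy = sum(v * v for v in x) / len(x)
    # A sharp rise or fall in energy versus the previous frame suggests an onset/offset.
    ratio = (energy + 1e-9) / (prev_energy + 1e-9)
    if ratio > onset_ratio or ratio < 1.0 / onset_ratio:
        return "transient", energy
    a = levinson_durbin(autocorr(x, order), order)
    residual = lpc_residual(x, a)
    # Strong pitch periodicity in the LPC residual -> harmonic (voiced) mode.
    if periodicity(residual, min_lag, max_lag) > voiced_thr:
        return "voiced", energy
    return "unvoiced", energy

if __name__ == "__main__":
    N = 240  # 30 ms at an assumed 8 kHz sampling rate
    # Voiced-like frame: impulses every 50 samples through a one-pole resonance.
    voiced, y = [], 0.0
    for n in range(N):
        y = (1.0 if n % 50 == 0 else 0.0) + 0.9 * y
        voiced.append(y)
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
    e_v = sum(v * v for v in voiced) / N
    e_n = sum(v * v for v in noise) / N
    print(classify_frame(voiced, e_v)[0])  # steady periodic frame
    print(classify_frame(noise, e_n)[0])   # steady noise-like frame
    print(classify_frame(voiced, 0.0)[0])  # the same periodic frame right after silence
```

Because the energy ratio alone marks onsets here, the first frame after silence is always routed to the transient model; a practical classifier would likely combine several such cues.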

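Low-rate speech coding systems developed so far for 4 kbit/s and below are based on STC (Sinusoidal Transform Coding) or MBE (Multi-band Excitation coding). These low-rate coders cannot accurately represent transient signals, typically the mixed signals at the start and end of voiced speech (onset and offset signals) and non-periodic signals, and therefore do not produce natural speech quality. This paper proposes a method that models transient segments effectively, using a harmonic model for voiced speech, a stochastic model for unvoiced speech, and harmonic-based tracking of aperiodic pulse locations for transient segments, together with a 2.4 kbit/s multi-mode coding method built on it. The proposed method applies different models to the residual signal, obtained from the original signal by linear predictive coding, according to the character of the signal, using for each signal type the model that performs well on it. A further advantage of the proposed 2.4 kbit/s multi-mode coder is that, thanks to the efficient transient modeling, it can efficiently combine at a low rate the excitation synthesized in the time domain by CELP (Code-Excited Linear Prediction) coding with the excitation synthesized in the frequency domain by harmonic coding with linear phase. The proposed 2.4 kbit/s multi-mode coder performs better than the 2.4 kbit/s MELP (Mixed Excitation Linear Prediction) coder, the U.S. Federal Standard coder.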
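The abstract credits linear-phase harmonic synthesis with making it possible to join the time-domain (CELP-style) excitation and the frequency-domain harmonic excitation efficiently, without describing the procedure. One plausible reading, sketched below in Python as an assumption rather than the paper's algorithm, is that giving harmonic k the phase -k*w0*n0 makes every harmonic peak at sample n0. The harmonic excitation then has a pitch pulse at a chosen time index, which can be aligned with the pulses of the time-domain excitation before the two are cross-faded. The function names, the flat harmonic amplitudes, the idealized pulse train standing in for a CELP excitation, and the linear cross-fade are all hypothetical.

```python
import math

def harmonic_excitation(n_samples, pitch_lag, amplitudes, pulse_pos):
    """Sum of harmonics of w0 = 2*pi/pitch_lag with linear phase.

    Harmonic k gets phase -k*w0*pulse_pos, so all harmonics peak together at
    pulse_pos and again every pitch_lag samples: the frequency-domain
    excitation has a pitch pulse at a known time index.
    """
    w0 = 2.0 * math.pi / pitch_lag
    return [sum(a * math.cos(k * w0 * (n - pulse_pos))
                for k, a in enumerate(amplitudes, start=1))
            for n in range(n_samples)]

def crossfade(first, second):
    """Linear cross-fade: weight moves from `first` (start) to `second` (end)."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equal-length excitations of at least 2 samples")
    return [(1.0 - i / (n - 1)) * a + (i / (n - 1)) * b
            for i, (a, b) in enumerate(zip(first, second))]

if __name__ == "__main__":
    n, lag = 160, 40
    amps = [1.0] * (lag // 2 - 1)  # harmonics 1..19 stay below Nyquist for lag 40
    # Hypothetical stand-in for a time-domain excitation: one pulse per period at 10, 50, ...
    td = [19.0 if i % lag == 10 else 0.0 for i in range(n)]
    aligned = crossfade(td, harmonic_excitation(n, lag, amps, pulse_pos=10))
    shifted = crossfade(td, harmonic_excitation(n, lag, amps, pulse_pos=30))
    # Aligned phase keeps the pitch pulse at full height through the fade;
    # a misaligned harmonic pulse position weakens and smears it.
    print(round(aligned[90], 3), round(shifted[90], 3))
```

The linear cross-fade is chosen only for simplicity; the point of the sketch is that the pulse position, not the fade shape, decides whether the two excitations reinforce or partly cancel each other.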
