Efficient TTS Database Compression Based on AMR-WB Speech Coder

An Efficient Compression Technique for TTS Databases Using the AMR-WB Speech Coder

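  • 임종욱 (Department of Information and Communication Engineering, Sejong University) ;
  • 김기출 (Department of Information and Communication Engineering, Sejong University) ;
  • 김경선 (HCI Lab Co., Ltd.) ;
  • 이항섭 (Department of Information and Communication Engineering, Sejong University) ;
  • 박혜영 (HCI Lab Co., Ltd.) ;
  • 김무영 (Department of Information and Communication Engineering, Sejong University)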
  • Published: 2009.04.30

Abstract

This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for efficient compression of Text-To-Speech (TTS) databases. The proposed algorithm removes the unnecessary common bit-stream (CBS) and combines parameter delta coding with speaker-dependent Huffman coding to reduce the required bit rate without any quality degradation. We also propose lossy coding schemes that maximize the bit-rate reduction while keeping quality degradation negligible. Compared with the 12.65 kbit/s AMR-WB mode, the proposed lossless algorithm, including CBS removal, reduces the bit rate by up to 12.40% without quality degradation. The proposed lossy algorithm reduces the bit rate by 20.00% at the cost of a 0.12 PESQ degradation.

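In this paper, we propose an improved adaptive multi-rate wideband (AMR-WB) speech coding algorithm for compressing Text-To-Speech (TTS) databases efficiently. The proposed algorithm removes the unnecessary common bit-stream (CBS) and combines delta coding of the parameters with Huffman coding that depends on a specific speaker, with the aim of lowering the bit rate without degrading speech quality. We also propose a lossy compression scheme that achieves the largest bit-rate improvement with minimal loss of quality. Applying the lossless compression scheme, including CBS removal, to the existing 12.65 kbit/s AMR-WB codec improved the bit rate by up to 12.40% without quality degradation. With the lossy compression scheme, a 20.00% bit-rate improvement came with a quality degradation of about 0.12 in PESQ.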
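The lossless idea summarized in the abstract, delta coding followed by Huffman coding, can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not follow the AMR-WB bit-stream layout: the `params` track, its 6-bit fixed width, and all function names are hypothetical, and the code table is built from the same short sequence only for illustration. A speaker-dependent table would, presumably, be trained on statistics gathered from that speaker's recordings.

```python
import heapq
from collections import Counter
from itertools import count


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def build_huffman_code(symbols):
    """Return a {symbol: bit-string} prefix code built from symbol counts."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(symbols, code):
    return "".join(code[s] for s in symbols)


def huffman_decode(bits, code):
    inverse = {w: s for s, w in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[word])
            word = ""
    return out


# Hypothetical, slowly varying parameter track (e.g. quantizer indices).
params = [40, 41, 41, 42, 42, 42, 43, 42, 42, 41, 41, 41, 40, 40, 41, 41]
FIXED_BITS = 6  # assumed fixed-length width, for comparison only

deltas = delta_encode(params)
code = build_huffman_code(deltas)
bits = huffman_encode(deltas, code)
restored = delta_decode(huffman_decode(bits, code))

print("fixed-length bits:", FIXED_BITS * len(params))
print("delta + Huffman bits:", len(bits))
print("lossless round trip:", restored == params)
```

Because neighbouring values differ only slightly, the deltas cluster around zero and the Huffman code spends few bits on them. The round trip restores the original track exactly, which is the sense in which such a scheme costs no quality.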
