Matching Pursuit Estimation and Quantizer Design for Sinusoidal Model-based Coder

MP (Matching Pursuit) Algorithm and Parameter Quantizer for a Sinusoidal Model Coder

  • Published : 2005.10.01

Abstract

In this paper, we propose a coding method that applies a matching pursuit (MP) algorithm to strongly periodic highband signals. We also propose efficient quantizers for the estimated parameters: spectral magnitude and phase. Because the MP algorithm is based on the error concealment principle and the sinusoidal model, it requires high-precision pitch period estimation. To obtain a more accurate pitch period, we use the refined pitch obtained from the lowband speech, which increases the efficiency of bit allocation. The spectral magnitude parameters are quantized by a method that combines the MDCT (Modified Discrete Cosine Transform) with a multi-stage structure. The spectral phase quantizer exploits the $2{\pi}$ modular characteristic of phases and a weighting function based on the spectral magnitudes. To evaluate the efficiency of the proposed method, we applied it to an analysis-by-synthesis system. Furthermore, we suggest the possibility of scalable wideband speech codecs based on a band-split structure.
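As an illustration of matching-pursuit sinusoidal analysis driven by a known pitch, the Python sketch below greedily fits cosine/sine atom pairs at harmonics of 1/P, where P is the pitch period in samples (for example, one refined from the lowband). It assumes a rectangular window and a single stationary frame. It is a generic greedy matching pursuit, not the authors' exact analysis-by-synthesis procedure, and the function name `harmonic_matching_pursuit` is hypothetical.

```python
import math


def harmonic_matching_pursuit(x, pitch_period, num_iters):
    """Greedy matching pursuit over cosine/sine atom pairs at harmonics of 1/pitch_period.

    Assumptions (illustrative only): rectangular window, len(x) >= 2,
    pitch_period given in samples (may be non-integer).
    Returns ({k: (amplitude, phase)}, residual), where harmonic k models
    amplitude * cos(2*pi*k*n/pitch_period + phase).
    """
    n_len = len(x)
    residual = [float(v) for v in x]
    # Harmonics strictly between DC and the Nyquist frequency (k / P < 1/2).
    max_k = math.ceil(pitch_period / 2) - 1
    coeffs = {}  # k -> [a, b] for a*cos(w n) + b*sin(w n)

    for _ in range(num_iters):
        best = None
        for k in range(1, max_k + 1):
            w = 2.0 * math.pi * k / pitch_period
            c = [math.cos(w * n) for n in range(n_len)]
            s = [math.sin(w * n) for n in range(n_len)]
            cc = sum(v * v for v in c)
            ss = sum(v * v for v in s)
            cs = sum(u * v for u, v in zip(c, s))
            rc = sum(r * v for r, v in zip(residual, c))
            rs = sum(r * v for r, v in zip(residual, s))
            det = cc * ss - cs * cs
            # Least-squares fit of the residual onto span{cos, sin} at this harmonic.
            a = (ss * rc - cs * rs) / det
            b = (cc * rs - cs * rc) / det
            gain = a * rc + b * rs  # energy removed by this projection
            if best is None or gain > best[0]:
                best = (gain, k, a, b, c, s)
        if best is None:
            break  # pitch period too short: no harmonic below Nyquist
        _, k, a, b, c, s = best
        # Subtract the selected atom pair from the residual.
        residual = [r - a * u - b * v for r, u, v in zip(residual, c, s)]
        acc = coeffs.setdefault(k, [0.0, 0.0])
        acc[0] += a
        acc[1] += b

    # a*cos + b*sin == A*cos(w n + phi) with A = hypot(a, b), phi = atan2(-b, a).
    params = {k: (math.hypot(a, b), math.atan2(-b, a)) for k, (a, b) in coeffs.items()}
    return params, residual
```

When the frame spans an integer number of pitch periods, atoms at different harmonics are mutually orthogonal, so a few iterations recover a few-harmonic test signal essentially exactly. If P were mis-estimated, the true harmonics would fall between atom frequencies and energy would remain in the residual, which is consistent with the abstract's emphasis on accurate pitch estimation.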

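This paper proposes a coding method using the MP (Matching Pursuit) algorithm for signals with strongly periodic components in the highband (4-8 kHz). It also proposes efficient quantization methods for the analyzed spectral magnitude and phase parameters. Because the MP algorithm is based on the error cancellation principle and the sinusoidal model, it requires accurate pitch period estimation. Using the pitch period detected from the lowband (0-4 kHz) signal to estimate the highband pitch period accurately improves the efficiency of coding and bit allocation. To quantize the spectral magnitude coefficients, a technique that combines a fixed-dimension discrete cosine transform (MDCT: Modified Discrete Cosine Transform) of the coefficients with a multi-stage structure is used, and the phase values are quantized using a weighting filter based on the spectral magnitudes and the $2{\pi}$ cyclic property of phase. The proposed quantization techniques and coding method are also applied to a speech analysis-by-synthesis system and verified by comparison with the target signal. Finally, the paper indicates how the method could be applied to future layered wideband speech coders built on a band-split structure.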
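The Python sketch below illustrates one way to combine the 2π periodicity of phase with magnitude weighting when selecting a phase codeword. It is an assumption for illustration, not the paper's quantizer: the distortion sum over k of A_k^2 * (1 - cos(phi_k - c_k)) is 2π-periodic in each phase error and equals half the squared error between the phasors A_k e^{j phi_k} and A_k e^{j c_k}, so phase errors on strong harmonics cost more than errors on weak ones. The codebook and the function names `weighted_phase_distortion` and `quantize_phases` are hypothetical.

```python
import math


def weighted_phase_distortion(phases, codeword, magnitudes):
    """Magnitude-weighted circular distortion: sum_k A_k^2 * (1 - cos(phi_k - c_k)).

    1 - cos(d) is 2*pi-periodic, so phases near +pi and -pi count as close.
    """
    return sum(
        (m * m) * (1.0 - math.cos(p - c))
        for p, c, m in zip(phases, codeword, magnitudes)
    )


def quantize_phases(phases, magnitudes, codebook):
    """Return the index of the codeword with the smallest weighted circular distortion."""
    return min(
        range(len(codebook)),
        key=lambda i: weighted_phase_distortion(phases, codebook[i], magnitudes),
    )
```

For example, `quantize_phases([3.1, 0.0], [1.0, 1.0], [[-3.1, 0.0], [3.0, 0.0]])` selects index 0: a phase of 3.1 rad is only about 0.083 rad from -3.1 rad around the circle, whereas a plain linear error would favor 3.0 rad.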
