Speech Quality Measure for VoIP Using Wavelet Based Bark Coherence Function

VoIP Speech Quality Assessment Using a Wavelet-Based Bark Coherence Function

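  • Sang-Wook Park (Media Communication Signal Processing Laboratory, Department of Electrical and Electronic Engineering, Yonsei University) ;
  • Young-Cheol Park (Division of Information Technology, Yonsei University) ;
  • Dae-Hee Youn (Media Communication Signal Processing Laboratory, Department of Electrical and Electronic Engineering, Yonsei University)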
  • Published : 2002.04.01

Abstract

The Bark Coherence Function (BCF) defines a coherence function in the perceptual domain as a new cognition module that is robust to linear distortions caused by the analog interface of digital mobile systems. Our previous experiments showed that the BCF outperforms current measures. In this paper, a new BCF suitable for VoIP is developed. The improved BCF is based on the wavelet series expansion, which provides good frequency resolution while keeping good time locality. The proposed Wavelet-based Bark Coherence Function (WBCF) is robust to the variable delay often observed in packet-based telephony such as Voice over Internet Protocol (VoIP). We also show that refining the time synchronization after signal decomposition can improve the performance of the WBCF. Regression analysis was performed with VoIP speech data. The correlation coefficients and the standard errors of estimate computed using the WBCF showed a noticeable improvement over the Perceptual Speech Quality Measure (PSQM) recommended by ITU-T.
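The two figures of merit named in the abstract, the correlation coefficient and the standard error of estimate, can be illustrated with a minimal sketch. It assumes a first-order (straight-line) least-squares fit of subjective scores on objective scores; the abstract does not state the regression order, and the function name `linear_regression_stats` is hypothetical.

```python
import math


def linear_regression_stats(objective, subjective):
    """Least-squares fit subjective ~= a + b * objective.

    Returns (a, b, r, see): intercept, slope, Pearson correlation
    coefficient, and standard error of estimate.
    Assumption: a first-order fit; the paper may use another order.
    """
    n = len(objective)
    if n != len(subjective) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")

    mean_x = sum(objective) / n
    mean_y = sum(subjective) / n
    sxx = sum((x - mean_x) ** 2 for x in objective)
    syy = sum((y - mean_y) ** 2 for y in subjective)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(objective, subjective))
    if sxx == 0 or syy == 0:
        raise ValueError("both sequences must vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)

    # Standard error of estimate: residual spread with n - 2 degrees of
    # freedom, because two parameters (intercept and slope) were fitted.
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(objective, subjective)
    )
    see = math.sqrt(residual_ss / (n - 2))
    return intercept, slope, r, see
```

Under these assumptions, a better objective measure yields a correlation magnitude closer to 1 and a smaller standard error of estimate against the subjective scores.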

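This paper proposes an improved Bark coherence function that uses the wavelet transform, the Wavelet-based Bark Coherence Function (WBCF), as an objective speech quality measure. The Bark Coherence Function (BCF) is an objective speech quality measure that, by defining a coherence function in the psychoacoustic domain, is robust to the linear distortions that can arise in the analog part of a speech communication system. Packet-based speech transmission systems such as VoIP (Voice over Internet Protocol) can introduce variable delay, which makes exact time alignment of the original and distorted speech impossible and degrades the performance of existing objective speech quality measures. The proposed WBCF evaluates objective speech quality in VoIP systems by computing the BCF after applying a wavelet transform, which has high time resolution at high frequencies and high frequency resolution at low frequencies. Subjective and objective speech quality evaluation experiments confirmed that the WBCF performs better than the Perceptual Speech Quality Measure (PSQM) recommended by ITU-T.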
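The time/frequency trade-off of a wavelet decomposition can be seen in a small sketch. It uses the Haar wavelet purely as an assumption for illustration; the abstract does not say which wavelet or how many levels the WBCF uses, and the function names `haar_step` and `haar_decompose` are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail): the low-pass and high-pass halves,
    each with half as many coefficients as the input.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Dyadic decomposition: [detail_1, ..., detail_levels, approx_levels].

    detail_1 is the highest-frequency band; each deeper level splits the
    remaining low-frequency band again.
    """
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands
```

In this sketch the highest-frequency band keeps half of the input samples, so each coefficient covers only two input samples (fine time resolution), while each deeper band covers a narrower frequency range with fewer, longer-spanning coefficients (finer frequency resolution at low frequencies).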
