On-line noise coherence estimation algorithm for a binaural speech enhancement system

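  • Yuna Ji (Division of Computer and Telecommunications Engineering, Yonsei University) ;
  • Yonghyun Baek (Division of Computer and Telecommunications Engineering, Yonsei University) ;
  • Young-Cheol Park (Division of Computer and Telecommunications Engineering, Yonsei University)
  • Received : 2016.03.03
  • Reviewed : 2016.04.19
  • Published : 2016.05.31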

Abstract

This paper proposes an on-line noise coherence estimation algorithm for binaural speech enhancement systems. Many binaural methods use the spatial coherence between the two microphones to estimate the noise Power Spectral Density (PSD) or to form the enhancement gain. These conventional algorithms typically characterize the noise coherence with a fixed, real-valued analytic model. In real environments, however, the noise coherence varies with the acoustic environment, and the resulting mismatch degrades the accuracy of the enhancement algorithm. The proposed algorithm therefore updates the noise coherence on-line to track its actual value as the acoustic environment changes. The noise coherence is updated only during speech absence, and simulation results show that the proposed algorithm improves speech enhancement performance over conventional algorithms based on the analytic model.
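
The idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the fixed analytic model is the free-field diffuse coherence sin(x)/x, that the on-line update is a first-order recursive average of the auto- and cross-spectra, and that a speech-absence flag is supplied by some external detector. The names `diffuse_coherence`, `OnlineNoiseCoherence`, `smoothing`, and `speech_absent` are hypothetical and chosen only for this illustration.

```python
import math


def diffuse_coherence(freq_hz, mic_distance_m, speed_of_sound=343.0):
    """Fixed real-valued model (assumed here): free-field diffuse coherence sin(x)/x."""
    x = 2.0 * math.pi * freq_hz * mic_distance_m / speed_of_sound
    return 1.0 if x == 0.0 else math.sin(x) / x


class OnlineNoiseCoherence:
    """Tracks a complex noise coherence per frequency bin (illustrative sketch)."""

    def __init__(self, initial_coherence, smoothing=0.9, eps=1e-12):
        self.smoothing = smoothing  # recursive averaging factor, 0 < smoothing < 1
        self.eps = eps              # guards against division by zero
        self.coh = [complex(c) for c in initial_coherence]
        # Assumption: start from unit-power spectra so that the initial
        # cross-spectrum equals the model coherence.
        self.psd_left = [1.0] * len(self.coh)
        self.psd_right = [1.0] * len(self.coh)
        self.cross = list(self.coh)

    def update(self, left_bins, right_bins, speech_absent):
        """Update with one STFT frame; the estimate is frozen while speech is present."""
        if not speech_absent:
            return self.coh
        a = self.smoothing
        for k, (l, r) in enumerate(zip(left_bins, right_bins)):
            self.psd_left[k] = a * self.psd_left[k] + (1.0 - a) * abs(l) ** 2
            self.psd_right[k] = a * self.psd_right[k] + (1.0 - a) * abs(r) ** 2
            self.cross[k] = a * self.cross[k] + (1.0 - a) * l * r.conjugate()
            denom = math.sqrt(self.psd_left[k] * self.psd_right[k]) + self.eps
            self.coh[k] = self.cross[k] / denom
        return self.coh
```

In such a sketch, the estimator would be initialized from the analytic model (for example `diffuse_coherence(f, 0.17)` for each bin center frequency `f`, with 0.17 m as an assumed microphone spacing) and then called once per frame, so that the coherence drifts away from the model only while noise alone is observed.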
