A Post-processing for Binary Mask Estimation Toward Improving Speech Intelligibility in Noise

  • Kim, Gibak (School of Electrical Engineering, Soongsil University)
  • Received : 2013.01.29
  • Accepted : 2013.02.25
  • Published : 2013.03.30

Abstract

This paper addresses noise reduction by binary masking in the time-frequency domain. To improve speech intelligibility in noise, the noise-masked speech is decomposed into time-frequency units, and mask "0" is assigned to masker-dominant units, that is, units with a low signal-to-noise ratio (SNR), so that they are removed. In previous research, Gaussian mixture models (GMMs) were used to classify each unit as speech-dominant (mask "1") or noise-dominant (mask "0"). For each frequency band, training data were collected to build the GMMs, and at test time each time-frequency unit is classified as belonging to the speech-dominant or the noise-dominant region. In this paper, we take into account the correlation of masks across frequency and propose a post-processing method for the estimated masks that exploits the Viterbi algorithm. The proposed post-processing is expected to reduce binary mask estimation errors and thereby improve speech intelligibility.
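The idea of post-processing a per-band mask estimate with the Viterbi algorithm can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `viterbi_smooth_mask`, the use of a per-band speech-dominance probability as the emission score, the uniform prior, and the symmetric transition probability `p_stay` are all assumptions made for illustration. The sketch treats the mask values of one time frame as a two-state chain along the frequency axis and decodes the most likely sequence.

```python
import numpy as np


def viterbi_smooth_mask(p_speech, p_stay=0.8, eps=1e-12):
    """Decode a binary mask along the frequency axis for one time frame.

    p_speech : 1-D sequence; p_speech[k] is a per-band probability that
               band k is speech-dominant (assumed to come from some
               classifier, e.g. a GMM; used here as the emission score).
    p_stay   : hypothetical probability that two adjacent bands share
               the same mask value (not taken from the paper).
    """
    p_speech = np.asarray(p_speech, dtype=float)
    n_bands = p_speech.shape[0]

    # Log emission scores: column 0 is mask "0", column 1 is mask "1".
    log_emit = np.log(np.stack([1.0 - p_speech, p_speech], axis=1) + eps)
    # log_trans[i, j]: log-probability of moving from state i to state j.
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))

    score = np.empty((n_bands, 2))
    backptr = np.zeros((n_bands, 2), dtype=int)
    score[0] = np.log(0.5) + log_emit[0]  # uniform prior (assumption)

    # Forward pass: best path score ending in each state at each band.
    for k in range(1, n_bands):
        cand = score[k - 1][:, None] + log_trans  # cand[prev, cur]
        backptr[k] = np.argmax(cand, axis=0)
        score[k] = np.max(cand, axis=0) + log_emit[k]

    # Backtrace from the best final state.
    mask = np.empty(n_bands, dtype=int)
    mask[-1] = int(np.argmax(score[-1]))
    for k in range(n_bands - 1, 0, -1):
        mask[k - 1] = backptr[k, mask[k]]
    return mask


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.45, 0.85, 0.9]
    print((np.asarray(probs) > 0.5).astype(int))  # independent per-band decision
    print(viterbi_smooth_mask(probs))             # decision with frequency correlation
```

With `p_stay=0.5` the transition term carries no information and the result reduces to independent per-band thresholding. Larger values of `p_stay` increasingly penalize isolated mask flips between neighboring bands.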
