http://dx.doi.org/10.5909/JBE.2012.17.6.1061

Binary Mask Estimation using Training-based SNR Estimation for Improving Speech Intelligibility  

Kim, Gibak (School of Electrical Engineering, Soongsil University)
Publication Information
Journal of Broadcast Engineering / v.17, no.6, 2012, pp. 1061-1068
Abstract
This paper presents a noise reduction algorithm that uses binary masking in the time-frequency domain to improve speech intelligibility. In binary masking, the noise-corrupted speech is decomposed into time-frequency units. Noise-dominant units are removed by setting the corresponding mask values to 0, and target-dominant units are retained unchanged by setting the mask values to 1. We propose estimating the binary mask by comparing the local signal-to-noise ratio (SNR) to a threshold. The local SNR is estimated with a training-based approach. An optimal threshold is proposed, obtained by observing the distribution of the training database. The proposed method is evaluated with normal-hearing listeners, and intelligibility scores are computed by counting the number of words correctly recognized.
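The masking step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the local SNR here is computed from known target and noise powers (an oracle), whereas the paper estimates it with a training-based approach, and the 0 dB threshold, the function names `binary_mask` and `apply_mask`, and the toy values are all assumptions made for illustration.

```python
import numpy as np


def binary_mask(local_snr_db, threshold_db=0.0):
    """Return 1 for units whose local SNR exceeds the threshold, else 0.

    threshold_db=0.0 is an illustrative default, not the paper's value.
    """
    return (np.asarray(local_snr_db) > threshold_db).astype(np.int8)


def apply_mask(noisy_tf, mask):
    """Zero out noise-dominant units; leave target-dominant units unchanged."""
    return np.asarray(noisy_tf) * mask


# Toy time-frequency grid: 2 frequency channels x 3 time frames.
# Powers are made-up numbers, assumed known here (oracle local SNR).
target_power = np.array([[4.0, 1.0, 0.1],
                         [0.5, 2.0, 8.0]])
noise_power = np.ones_like(target_power)

local_snr_db = 10.0 * np.log10(target_power / noise_power)
mask = binary_mask(local_snr_db, threshold_db=0.0)

# Stand-in for the noisy magnitudes of each time-frequency unit.
noisy_tf = np.sqrt(target_power + noise_power)
enhanced_tf = apply_mask(noisy_tf, mask)

print(mask)
```

In practice the mask would be applied to the output of a filterbank or short-time Fourier transform and the masked units resynthesized into a waveform; those stages are omitted here.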
Keywords
Binary mask; Noise reduction; Speech intelligibility