http://dx.doi.org/10.5909/JBE.2013.18.2.311

A Post-processing for Binary Mask Estimation Toward Improving Speech Intelligibility in Noise  

Kim, Gibak (School of Electrical Engineering, Soongsil University)
Publication Information
Journal of Broadcast Engineering, v.18, no.2, 2013, pp. 311-318
Abstract
This paper addresses a noise reduction algorithm based on binary masking in the time-frequency domain. To improve speech intelligibility in noise, the noise-masked speech is decomposed into time-frequency units, and mask "0" is assigned to masker-dominant units, which removes the units where noise dominates speech. In previous research, Gaussian mixture models were used to classify each unit as speech-dominant or noise-dominant, corresponding to mask "1" and mask "0", respectively. For each frequency band, training data were collected to build the Gaussian mixture models, and a detection procedure was then applied to the test data to decide whether each time-frequency unit belongs to the speech-dominant or the noise-dominant region. In this paper, we consider the correlation of masks in the frequency domain and propose a post-processing method that exploits the Viterbi algorithm.
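The post-processing idea can be illustrated with a minimal sketch, assuming a two-state model (0 = noise-dominant, 1 = speech-dominant) whose states are linked across adjacent frequency bands within one time frame. The abstract does not give the transition probabilities, the priors, or the exact form of the likelihoods, so the function name `viterbi_smooth_mask` and all numeric values below are hypothetical and serve only to show how a Viterbi pass can override an isolated, weakly supported mask decision.

```python
import math


def viterbi_smooth_mask(log_lik, log_trans, log_prior):
    """Return the most likely 0/1 mask sequence across frequency bands.

    log_lik[k][s]   : log-likelihood of band k's features under state s
                      (s = 0 noise-dominant, s = 1 speech-dominant),
                      e.g. taken from per-band Gaussian mixture models.
    log_trans[i][j] : log-probability of state j in band k given state i
                      in band k-1 (hypothetical values, not from the paper).
    log_prior[s]    : log-prior of state s in the lowest band.
    """
    # score[s] = best log-score of any path ending in state s at the current band
    score = [log_prior[s] + log_lik[0][s] for s in (0, 1)]
    back = []  # back[k-1][s] = best predecessor of state s at band k
    for k in range(1, len(log_lik)):
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[p] + log_trans[p][s] for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_lik[k][s])
        score = new_score
        back.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = 0 if score[0] >= score[1] else 1
    mask = [state]
    for pointers in reversed(back):
        state = pointers[state]
        mask.append(state)
    mask.reverse()
    return mask


log = math.log
# Five bands: four clearly speech-dominant, the middle one weakly noise-dominant.
log_lik = [
    [log(0.1), log(0.9)],
    [log(0.1), log(0.9)],
    [log(0.6), log(0.4)],
    [log(0.1), log(0.9)],
    [log(0.1), log(0.9)],
]
log_prior = [log(0.5), log(0.5)]
# "Sticky" transitions: neighbouring bands tend to share the same mask value.
log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]

independent = [0 if b[0] >= b[1] else 1 for b in log_lik]
smoothed = viterbi_smooth_mask(log_lik, log_trans, log_prior)
print(independent)  # [1, 1, 0, 1, 1]
print(smoothed)     # [1, 1, 1, 1, 1]
```

With uniform transition probabilities the same function reduces to the independent per-band decision, so the transition matrix is the only place where the correlation between neighbouring masks enters this sketch.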
Keywords
Binary mask; noise reduction; Viterbi algorithm