Performance Analysis of a Class of Single Channel Speech Enhancement Algorithms for Automatic Speech Recognition  

Song, Myung-Suk (Dept. of Electrical and Electronic Engineering, Yonsei University)
Lee, Chang-Heon (Dept. of Electrical and Electronic Engineering, Yonsei University)
Lee, Seok-Pil (Broadcasting-Communication Convergence Research Center, Korea Electronics Technology Institute)
Kang, Hong-Goo (Dept. of Electrical and Electronic Engineering, Yonsei University)
Abstract
This paper analyzes the performance of various single channel speech enhancement algorithms when they are applied as a preprocessor to automatic speech recognition (ASR) systems. The speech enhancement system is first divided into four major functional modules: a gain estimator, a noise power spectrum estimator, an a priori signal-to-noise ratio (SNR) estimator, and a speech absence probability (SAP) estimator. We then investigate the relationship between speech recognition accuracy and the role of each module. Simulation results show that the Wiener filter outperforms other gain functions, such as the minimum mean square error short-time spectral amplitude (MMSE-STSA) and minimum mean square error log-spectral amplitude (MMSE-LSA) estimators, when a perfect noise estimator is applied. When the performance of the noise estimator degrades, however, MMSE methods that include the decision-directed module for a priori SNR estimation and the SAP estimation module help to improve the performance of the enhancement algorithm for speech recognition systems.
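As a rough illustration of how two of the modules named in the abstract interact, the following sketch combines a decision-directed a priori SNR estimator with a Wiener gain for a single frequency bin. It uses the common textbook forms of both rules and is not taken from the paper; the function names, the smoothing factor `alpha = 0.98`, and the floor `xi_min` are assumptions chosen for illustration, and the noise power is assumed to be known and constant.

```python
def wiener_gain(xi):
    """Wiener gain for an a priori SNR xi (linear scale, not dB)."""
    return xi / (1.0 + xi)


def decision_directed_xi(prev_clean_power, noise_power, noisy_power,
                         alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR estimate for one frequency bin.

    prev_clean_power: squared clean-speech amplitude estimated in the previous frame
    noise_power:      noise power spectrum estimate for this bin
    noisy_power:      squared magnitude of the noisy spectrum in the current frame
    alpha, xi_min:    illustrative values, not the paper's settings
    """
    gamma = noisy_power / noise_power        # a posteriori SNR
    ml_term = max(gamma - 1.0, 0.0)          # instantaneous SNR estimate, clipped at 0
    xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
    return max(xi, xi_min)                   # floor avoids a gain of exactly zero


def enhance_bin(noisy_powers, noise_power, alpha=0.98):
    """Return the gain applied to one bin in each successive frame."""
    prev_clean_power = 0.0
    gains = []
    for noisy_power in noisy_powers:
        xi = decision_directed_xi(prev_clean_power, noise_power, noisy_power, alpha)
        gain = wiener_gain(xi)
        # The enhanced amplitude is gain * noisy amplitude, so its power scales by gain^2.
        prev_clean_power = (gain ** 2) * noisy_power
        gains.append(gain)
    return gains


if __name__ == "__main__":
    # A bin dominated by speech (power far above the noise level) across three frames.
    print(enhance_bin([100.0, 100.0, 100.0], noise_power=1.0))
    # A bin containing only noise.
    print(enhance_bin([1.0, 1.0, 1.0], noise_power=1.0))
```

In this sketch the gain for the speech-dominated bin rises over successive frames as the recursive term accumulates, while the noise-only bin stays near the floor. An MMSE-STSA or MMSE-LSA gain would replace `wiener_gain` while reusing the same a priori SNR estimate.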
Keywords
Single channel speech enhancement; Speech recognition; Performance analysis