Fig. 1. Configuration of the deep neural network used to estimate the noise corruption function in the proposed method.
Table 1. Recognition performance in “known” noisy environments, averaged over SNRs of 0, 5, 10, 15, and 20 dB (WER, %).
Table 2. Recognition performance in “unknown” noisy environments, averaged over SNRs of 0, 5, 10, 15, and 20 dB (WER, %).
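WER is conventionally defined as (S + D + I) / N, where S, D, and I are the numbers of word substitutions, deletions, and insertions in the recognizer output and N is the number of words in the reference transcription. The following is a minimal sketch of that computation, not the scoring tool used in the experiments. It assumes that each per-SNR figure is a corpus-level WER and that the reported average is an unweighted arithmetic mean over the five SNR conditions. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (S + D + I, N) for one utterance.

    The first value is the word-level Levenshtein distance, i.e. the minimum
    number of substitutions, deletions, and insertions that turn the
    reference into the hypothesis; N is the number of reference words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1], len(ref)


def wer_percent(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER (%) over (reference, hypothesis) pairs.

    Errors and reference words are summed over all utterances before
    dividing; raises ZeroDivisionError if there are no reference words.
    """
    errors = 0
    words = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        words += n
    return 100.0 * errors / words


def average_over_snrs(wer_by_snr: Dict[int, float]) -> float:
    """Unweighted mean of per-SNR WERs (assumed averaging scheme),
    e.g. with keys 0, 5, 10, 15, and 20 dB."""
    return mean(wer_by_snr.values())
```

For example, scoring the hypothesis "one too three" against the reference "one two three four" gives one substitution and one deletion, so word_errors returns (2, 4), which corresponds to a WER of 50%. Averaging per-SNR WERs, rather than pooling errors across all SNR conditions, gives each SNR equal weight regardless of how many words it contains.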