Fig. 1. Recognition performance in “Known” noisy environments at different SNRs as average over all environments: Subway, Babble, Car and Exhibition (WER, %).
Fig. 2. Recognition performance in “Unknown” noisy environments at different SNRs as average over all environments: Factory, Babble2, Car2 and Music (WER, %).
Table 1. Recognition performance in “Known” noisy environments as average over all SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB (WER, %).
Table 2, Recognition performance in “Unknown” noisy environments as average over all SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB (WER, %).
참고문헌
- S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," Proc. IEEE Trans. on Acoustics, Speech and Signal, 27, 113-120 (1979).
- Y. Ephraim and D. Malah, "Speech enhancement using minimum mean square error short time spectral amplitude estimator," Proc. IEEE Trans. on Acoustics, Speech and Signal, 32, 109-1121 (1984).
- J. H. L. Hansen and M. Clements, "Constrained iterative speech enhancement with application to speech recognition," Proc. IEEE Trans. on Signal, 39, 795-805 (1991).
- P. J. Moreno, B. Raj, and R. M. Stern, "Data-driven environmental compensation for speech recognition: a unified approach," Speech Communication, 24, 267-285 (1998). https://doi.org/10.1016/S0167-6393(98)00025-9
- W. Kim and J. H. L. Hansen, "Feature compensation in the cepstral domain employing model combination," Speech Communication, 51, 83-96 (2009). https://doi.org/10.1016/j.specom.2008.06.004
- J. L. Gauvain and C. H. Lee, "Maximum a posteriori estimation for multivariate gaussian mixture observations of markov chains," Proc. IEEE Trans. on Speech and Audio, 2, 291-298 (1994).
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density HMMs," Computer Speech and Language, 9, 171-185 (1995). https://doi.org/10.1006/csla.1995.0010
- M. J. F. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," Proc. IEEE Trans. on Speech and Audio, 4, 352-359 (1996).
- J. Du, L.-R. Dai, and Q. Huo, "Synthesized stereo mapping via deep neural networks for noisy speech recognition," ICASSP 2014, 1764-1768 (2014).
- K. Han, Y. He, D. Bagchi, E. Fosler-Lussier, and D. Wang, "Deep neural network based spectral feature mapping for robust speech recognition," Interspeech 2015, 2484-2488 (2015).
- H. G. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions," ISCA ITRW ASR2000, Sep. 2000.
- ETSI ES 201 108, ETSI standard document, v1.1.2(2000- 04), Feb. 2000.
- R. Martin, "Spectral Subtraction Based on Minimum Statistics," EUSIPCO-94, 1182-1185 (1994).