Fig. 1. Block diagram of the VMC-PCGMM-based feature compensation scheme [5].
Fig. 2. Block diagram of the proposed speech enhancement scheme employing the VMC-PCGMM-based feature compensation method.
Fig. 3. Recognition performance for music noise at 5 dB SNR as a function of the number of selected noise models (WER, %).
Table 1. Speech recognition performance with the matched ASR system condition (WER, %).
Table 2. Speech recognition performance with the mismatched ASR system condition (WER, %).
References
- S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. on Acoustics, Speech, and Signal Processing, 27, 113-120 (1979). https://doi.org/10.1109/TASSP.1979.1163209
- P. J. Moreno, B. Raj, and R. M. Stern, "Data-driven environmental compensation for speech recognition: a unified approach," Speech Communication, 24, 267-285 (1998). https://doi.org/10.1016/S0167-6393(98)00025-9
- W. Kim and J. H. L. Hansen, "Variational noise model composition through model perturbation for robust speech recognition with time-varying background noise," Speech Communication, 53, 451-464 (2011). https://doi.org/10.1016/j.specom.2010.12.001
- J. L. Gauvain and C. H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. on Speech and Audio Processing, 2, 291-298 (1994). https://doi.org/10.1109/89.279278
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density HMMs," Computer Speech and Language, 9, 171-185 (1995). https://doi.org/10.1006/csla.1995.0010
- M. J. F. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Trans. on Speech and Audio Processing, 4, 352-359 (1996). https://doi.org/10.1109/89.536929
- J. Du, L.-R. Dai, and Q. Huo, "Synthesized stereo mapping via deep neural networks for noisy speech recognition," ICASSP 2014, 1764-1768 (2014).
- K. Han, Y. He, D. Bagchi, E. Fosler-Lussier, and D. L. Wang, "Deep neural network based spectral feature mapping for robust speech recognition," Interspeech 2015, 2484-2488 (2015).
- H. G. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions," ISCA ITRW ASR2000 (2000).
- W. Kim, "Speech enhancement based on feature compensation for independently applying to different types of speech recognition systems" (in Korean), J. Korea Institute of Information and Communication Engineering, 18, 2367-2374 (2014). https://doi.org/10.6109/jkiice.2014.18.10.2367
- ETSI ES 201 108, ETSI Standard Document, v1.1.2 (2000-04), 2000.
- http://htk.eng.cam.ac.uk