Acoustic Signal Processing Considerations for Achieving Distant Speech Recognition Performance

  • Published: 2016.05.25

References

  1. K. Kumatani, J. McDonough, and B. Raj, "Microphone array processing for distant speech recognition," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 127-140, Nov. 2012. https://doi.org/10.1109/MSP.2012.2205285
  2. M. Wölfel and J. McDonough, Distant Speech Recognition, Hoboken, NJ: Wiley, 2009.
  3. Amazon Echo. [Online]. Available: http://www.amazon.com/oc/echo/
  4. M. L. Seltzer, D. Yu, and Y.-Q. Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 7398-7402, May 2013.
  5. O. Viikki and K. Laurila, "Cepstral domain segmental feature vector normalization for noise robust speech recognition," Speech Communication, vol. 25, pp. 133-147, Aug. 1998. https://doi.org/10.1016/S0167-6393(98)00033-8
  6. J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing, Berlin, Germany: Springer-Verlag, 2008.
  7. L. J. Griffiths and C. W. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Trans. Antennas Propagat., vol. AP-30, no. 1, pp. 27-34, Jan. 1982.
  8. S. Gannot, D. Burstein, and E. Weinstein, "Signal enhancement using beamforming and nonstationarity with applications to speech," IEEE Trans. Signal Process., vol. 49, no. 8, pp. 1614-1626, Aug. 2001. https://doi.org/10.1109/78.934132
  9. F. Nesta and M. Matassoni, "Blind source extraction for robust speech recognition in multisource noisy environments," Computer Speech & Language, vol. 27, no. 3, pp. 703-725, 2013. https://doi.org/10.1016/j.csl.2012.08.001
  10. M. Souden, J. Benesty, and S. Affes, "On optimal frequency-domain multichannel linear filtering for noise reduction," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 260-276, Feb. 2010. https://doi.org/10.1109/TASL.2009.2025790
  11. Y. G. Jin, J. W. Shin, and N. S. Kim, "Spectro-temporal filtering for multichannel speech enhancement in short-time Fourier transform domain," IEEE Signal Process. Lett., vol. 21, no. 3, pp. 352-355, Mar. 2014. https://doi.org/10.1109/LSP.2014.2302897
  12. T. Yoshioka, A. Sehr, M. Delcroix, K. Kinoshita, R. Maas, T. Nakatani, and W. Kellermann, "Making machines understand us in reverberant rooms: robustness against reverberation for automatic speech recognition," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 114-126, Nov. 2012. https://doi.org/10.1109/MSP.2012.2205029
  13. E. Habets, "Single- and multi-microphone speech dereverberation using spectral enhancement," Ph.D. dissertation, Eindhoven Univ. of Technology, Eindhoven, The Netherlands, 2006.
  14. E. Habets and J. Benesty, "A two-stage beamforming approach for noise reduction and dereverberation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 945-958, May 2013. https://doi.org/10.1109/TASL.2013.2239292
  15. T. Yoshioka, T. Nakatani, M. Miyoshi, and H. G. Okuno, "Blind separation and dereverberation of speech mixtures by joint optimization," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 1, pp. 69-84, Jan. 2011. https://doi.org/10.1109/TASL.2010.2045183