Landmark-Guided Segmental Speech Decoding for Continuous Mandarin Speech Recognition

  • Chao, Hao (College of Computer Science and Technology, Henan Polytechnic University)
  • Song, Cheng (College of Computer Science and Technology, Henan Polytechnic University)
  • Received : 2014.11.04
  • Accepted : 2015.06.17
  • Published : 2016.09.30

Abstract

In this paper, we propose a framework that incorporates landmarks into a segment-based Mandarin speech recognition system. In this method, landmarks provide boundary information and phonetic class information, which are used to direct the decoding process. To validate the method, two kinds of landmarks that can be reliably detected are used to direct the decoding process of a segment model (SM) based Mandarin large vocabulary continuous speech recognition (LVCSR) system. Experimental results show that decoding time is reduced by about 30% without an obvious decrease in recognition accuracy, which demonstrates the potential of the method.
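To illustrate how boundary information can shrink the search space of a segmental decoder, the following is a minimal sketch, not the authors' algorithm. It assumes a hypothetical pruning rule: a candidate segment is discarded if a detected boundary landmark falls strictly inside it. The function name `candidate_segments`, the frame counts, and the landmark positions are all illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def candidate_segments(
    num_frames: int,
    max_dur: int,
    boundary_landmarks: Iterable[int] = (),
) -> List[Tuple[int, int]]:
    """Enumerate half-open segments [start, end) of at most max_dur frames.

    Assumption for illustration: a landmark at frame b marks a boundary
    just before frame b, so a segment crosses it when start < b < end.
    Such segments are skipped.
    """
    landmarks = set(boundary_landmarks)
    segments = []
    for start in range(num_frames):
        last_end = min(start + max_dur, num_frames)
        for end in range(start + 1, last_end + 1):
            if any(start < b < end for b in landmarks):
                continue  # segment would span a detected boundary
            segments.append((start, end))
    return segments


# Toy utterance: 10 frames, segments of up to 4 frames.
full = candidate_segments(10, 4)
# Hypothetical boundary landmarks detected at frames 3 and 7.
pruned = candidate_segments(10, 4, boundary_landmarks={3, 7})
print(len(full), len(pruned))
```

In this toy setting the landmarks remove every segment hypothesis that straddles a detected boundary, so fewer segments need to be scored. The roughly 30% time saving reported in the abstract comes from the authors' full system and cannot be inferred from this sketch.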
