An Adaptive Utterance Verification Framework Using Minimum Verification Error Training

  • Received: 2010.08.16
  • Reviewed: 2010.12.22
  • Published: 2011.06.30

Abstract

This paper introduces an adaptive, integrated utterance verification (UV) framework based on minimum verification error (MVE) training, intended as a practical solution for real-world applications. UV has traditionally been treated as an add-on to automatic speech recognition (ASR), so it is designed separately from the ASR system's models. This conventional two-stage approach often fails to cope with a wide range of variations, such as a new speaker or a new acoustic environment that does not match the speaker population or the acoustic conditions on which the ASR system was trained. In this paper, we propose an integrated solution that improves overall UV performance in such applications. The integration is achieved by adapting the UV target model and merging it with the ASR acoustic model under a common MVE criterion at each iteration of the recognition stage. The proposed iterative adaptation procedure also revises the data segmentation and the decoded hypotheses. Under this framework, both recognition performance and verification performance improve substantially.
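The MVE criterion mentioned above can be illustrated with its common textbook form. Each token is scored with a log-likelihood ratio between a target model and an anti-model. A misverification measure is then defined against a decision threshold τ, and the resulting count of misses plus false alarms is smoothed with a sigmoid so it can be minimized by gradient descent. The sketch below is a minimal illustration under those assumptions. All function names and the sigmoid slope `gamma` are hypothetical. Updating only the threshold τ, instead of the acoustic-model parameters that the paper adapts, is a deliberate simplification and not the paper's method.

```python
import math
from typing import Sequence

# Hypothetical illustration of a generic MVE loss; not the paper's implementation.
# Simplification: only the threshold tau is trained, standing in for model parameters.


def llr_score(loglik_target: float, loglik_anti: float) -> float:
    """Verification score: log p(X | target model) - log p(X | anti-model)."""
    return loglik_target - loglik_anti


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def misverification(score: float, is_target: bool, tau: float) -> float:
    """Positive when the token is misverified at threshold tau.

    True target: a miss occurs when score < tau, so d = tau - score.
    Impostor: a false alarm occurs when score > tau, so d = score - tau.
    """
    return tau - score if is_target else score - tau


def mve_loss(scores: Sequence[float], labels: Sequence[bool],
             tau: float, gamma: float = 1.0) -> float:
    """Smoothed verification error: mean of sigmoid(gamma * d) over all tokens."""
    total = sum(sigmoid(gamma * misverification(s, y, tau))
                for s, y in zip(scores, labels))
    return total / len(scores)


def hard_error_rate(scores: Sequence[float], labels: Sequence[bool],
                    tau: float) -> float:
    """(Misses + false alarms) / number of tokens, i.e. the quantity MVE approximates."""
    errors = sum(misverification(s, y, tau) > 0 for s, y in zip(scores, labels))
    return errors / len(scores)


def mve_grad_tau(scores: Sequence[float], labels: Sequence[bool],
                 tau: float, gamma: float = 1.0) -> float:
    """Derivative of mve_loss with respect to tau (chain rule through the sigmoid)."""
    grad = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(gamma * misverification(s, y, tau))
        dd_dtau = 1.0 if y else -1.0  # sign of d(misverification)/d(tau)
        grad += gamma * p * (1.0 - p) * dd_dtau
    return grad / len(scores)


def train_threshold(scores: Sequence[float], labels: Sequence[bool], tau: float,
                    gamma: float = 1.0, lr: float = 0.5, steps: int = 20) -> float:
    """Plain gradient descent on the smoothed verification error."""
    for _ in range(steps):
        tau -= lr * mve_grad_tau(scores, labels, tau, gamma)
    return tau


if __name__ == "__main__":
    scores = [2.0, 1.5, -1.0, -2.0, 0.5]          # LLR scores per token
    labels = [True, True, False, False, False]    # True = correct (target) token
    print("hard error rate:", hard_error_rate(scores, labels, 0.0))
    print("MVE loss before:", mve_loss(scores, labels, 0.0))
    tau = train_threshold(scores, labels, 0.0)
    print("MVE loss after: ", mve_loss(scores, labels, tau), "tau =", tau)
```

As `gamma` grows, the smoothed loss approaches the hard count of misses plus false alarms. A smaller `gamma` gives a smoother surface that is easier to optimize with gradient steps. This trade-off is why the slope is treated as a tunable parameter in sigmoid-smoothed error criteria.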
