An Automatic Post-processing Method for Speech Recognition using CRFs and TBL

An Automated Post-processing Method for Speech Recognition Using CRFs and TBL

  • Received : 2010.01.19
  • Accepted : 2010.07.06
  • Published : 2010.09.15

Abstract

In human speech interface applications, reducing the recognition error rate is one of the main research issues. Many previous studies have attempted to correct errors through post-processing that depends on manually constructed corpora and correction patterns. We propose an automatically learnable post-processing method that is independent of the characteristics of both the application domain and the speech recognizer. We divide post-processing into two steps: error detection and error correction. We treat error detection as a sequential classification problem and solve it with conditional random fields (CRFs), and we use transformation-based learning (TBL) to learn the rules applied in the error correction step automatically. Our experimental results indicate that the proposed method corrects 25.85%, 3.57%, and 7.42% of the speech recognizer's insertion, deletion, and substitution errors, respectively.
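The error-correction step can be illustrated with a minimal Brill-style TBL learner. This is a sketch under simplifying assumptions, not the authors' implementation: hypotheses are assumed to be already aligned word-for-word with their references (substitutions only), the error flags are assumed to come from the detection step, and the single rule template (`wrong -> right`, optionally conditioned on the previous word) as well as the names `learn_rules` and `correct` are hypothetical. Each iteration greedily adopts the candidate rule with the largest net error reduction (errors fixed minus correct words broken) and stops when no rule helps.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule template: (wrong, right, prev) rewrites `wrong` to `right`
# at a position flagged by the error detector, optionally only when the
# previous token equals `prev` (None means "any context").
Rule = Tuple[str, str, Optional[str]]


def _matches(rule: Rule, toks: List[str], flags: List[bool], i: int) -> bool:
    wrong, _right, prev = rule
    if not flags[i] or toks[i] != wrong:
        return False
    return prev is None or (i > 0 and toks[i - 1] == prev)


def _apply(rule: Rule, toks: List[str], flags: List[bool]) -> List[str]:
    # Find all matches first, then rewrite, so one rewrite cannot trigger another.
    hits = [i for i in range(len(toks)) if _matches(rule, toks, flags, i)]
    out = list(toks)
    for i in hits:
        out[i] = rule[1]
    return out


def _gain(rule: Rule, hyps, refs, flags) -> int:
    """Net error reduction: errors fixed minus correct tokens broken."""
    gain = 0
    for toks, ref, fl in zip(hyps, refs, flags):
        for i in range(len(toks)):
            if _matches(rule, toks, fl, i):
                if ref[i] == rule[1]:
                    gain += 1
                elif ref[i] == toks[i]:
                    gain -= 1
    return gain


def learn_rules(hyps, refs, flags, max_rules: int = 10) -> List[Rule]:
    """Greedy TBL: repeatedly adopt the rule with the largest positive gain."""
    current = [list(h) for h in hyps]
    rules: List[Rule] = []
    while len(rules) < max_rules:
        # Propose candidates only from flagged positions that are still wrong.
        candidates: Dict[Rule, None] = {}
        for toks, ref, fl in zip(current, refs, flags):
            for i, (h, r) in enumerate(zip(toks, ref)):
                if fl[i] and h != r:
                    candidates.setdefault((h, r, None), None)
                    if i > 0:
                        candidates.setdefault((h, r, toks[i - 1]), None)
        if not candidates:
            break
        gains = {rule: _gain(rule, current, refs, flags) for rule in candidates}
        best = max(gains, key=gains.__getitem__)  # ties: first proposed wins
        if gains[best] <= 0:
            break
        current = [_apply(best, t, f) for t, f in zip(current, flags)]
        rules.append(best)
    return rules


def correct(toks: List[str], flags: List[bool], rules: List[Rule]) -> List[str]:
    """Apply the learned rules in the order they were learned."""
    for rule in rules:
        toks = _apply(rule, toks, flags)
    return toks


# Toy data (invented for illustration): word-aligned hypotheses, references,
# and detector flags; the last flag is a false alarm on a correct word.
HYPS = [["book", "a", "flight", "too", "seoul"], ["fly", "too", "busan"],
        ["one", "ticket", "too"]]
REFS = [["book", "a", "flight", "to", "seoul"], ["fly", "to", "busan"],
        ["one", "ticket", "too"]]
FLAGS = [[False, False, False, True, False], [False, True, False],
         [False, False, True]]

rules = learn_rules(HYPS, REFS, FLAGS)
print(rules)  # [('too', 'to', None), ('to', 'too', 'ticket')]
print(correct(["fly", "too", "seoul"], [False, True, False], rules))
```

In the toy data the detector raises a false alarm on a correct `too`, so the learner first adopts the general `too -> to` rule and then a context-specific rule that restores `too` after `ticket`. This general-rule-then-exception pattern is typical of TBL rule lists, which is why they must be applied in the order learned.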

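Because speech recognizer errors strongly affect the performance of speech-based applications, an effective method for reducing them is needed. Existing post-processing techniques generally rely on corpora or rules built through manual effort. In this paper, we propose a post-processing model that can be learned automatically, regardless of the task or the characteristics of the recognizer. We divide post-processing into error detection and error correction: error detection is treated as a sequential classification problem and handled with conditional random fields (CRFs), and error correction rules are generated automatically with transformation-based learning (TBL) and then applied. Applying the proposed method to a speech recognizer for the travel reservation domain corrected 25.85%, 3.57%, and 7.42% of insertion, deletion, and substitution errors, respectively, which reduced the recognizer's word error rate by 2%.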
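Because the results are reported as the share of insertion, deletion, and substitution errors corrected and as a change in word error rate, it may help to see how those counts are commonly obtained: a minimum-edit-distance (Levenshtein) alignment between the reference and the recognizer output. The sketch below is a standard alignment, not necessarily the scoring tool the authors used, and the function names are hypothetical. `error_reduction` assumes that "corrects 25.85% of insertion errors" means (errors before - errors after) / errors before; the abstract does not define the metric. Scoring tools such as NIST sclite may weight edits and break ties differently, so the insertion/deletion/substitution split can differ between alignments of equal cost.

```python
from typing import List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (insertions, deletions, substitutions) of a minimum edit alignment.

    Each DP cell holds (cost, ins, dels, subs); tuple comparison picks the
    lowest cost and breaks ties deterministically.
    """
    n, m = len(ref), len(hyp)
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # reference words with no hypothesis: deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, j, 0, 0)  # extra hypothesis words: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, a, d, s = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, a, d, s)  # match
            else:
                diag = (c + 1, a, d, s + 1)  # substitution
            c, a, d, s = dp[i][j - 1]
            ins = (c + 1, a + 1, d, s)  # insertion
            c, a, d, s = dp[i - 1][j]
            dele = (c + 1, a, d + 1, s)  # deletion
            dp[i][j] = min(diag, ins, dele)
    _, ins_n, del_n, sub_n = dp[n][m]
    return ins_n, del_n, sub_n


def word_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """(I + D + S) / N over a corpus, where N is the number of reference words."""
    n_ref = sum(len(r) for r in refs)
    n_err = sum(sum(align_counts(r, h)) for r, h in zip(refs, hyps))
    return n_err / n_ref


def error_reduction(before: int, after: int) -> float:
    """Fraction of one error type removed by post-processing (assumed metric)."""
    return (before - after) / before if before else 0.0


ref = "book a flight to seoul".split()
hyp = "book the a flight too".split()
print(align_counts(ref, hyp))         # (1, 1, 1)
print(word_error_rate([ref], [hyp]))  # 0.6
```

For example, if post-processing reduced a corpus's insertion count from 4 to 3, `error_reduction(4, 3)` would report 0.25, i.e. 25% of insertion errors corrected under this assumed definition.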
