Optimizing Multiple Pronunciation Dictionary Based on a Confusability Measure for Non-native Speech Recognition

A Confusability-Based Technique for Optimizing a Multiple Pronunciation Dictionary for Speech Recognition of Non-native Speakers

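  • 김민아 (MC Division, LG Electronics)
  • 오유리 (Human Computing Laboratory, Department of Information and Communications, Gwangju Institute of Science and Technology)
  • 김홍국 (Human Computing Laboratory, Department of Information and Communications, Gwangju Institute of Science and Technology)
  • 이연우 (Information and Communication Engineering, Division of Information Engineering, College of Engineering, Mokpo National University)
  • 조성의 (Department of Computer Education, College of Engineering, Mokpo National University)
  • 이성로 (Information and Electronics Engineering, Division of Information Engineering, College of Engineering, Mokpo National University)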
  • Published : 2008.03.30

Abstract

In this paper, we propose a method for optimizing a multiple pronunciation dictionary used to model the pronunciation variations of non-native speech. The proposed method removes some confusable pronunciation variants from the dictionary, which reduces both the dictionary size and the decoding time of automatic speech recognition (ASR). To this end, a confusability measure is first defined based on the Levenshtein distance between two different pronunciation variants. The number of phonemes in each pronunciation variant is then incorporated into the measure to compensate for ASR errors caused by shorter words. We investigate the effect of the proposed method on ASR performance, with Korean as the target language and Korean utterances spoken by native Chinese speakers as the non-native speech. The experiments show that an ASR system using the multiple pronunciation dictionary optimized by the proposed method achieves an average relative word error rate reduction of 6.25% and 11.67% less ASR decoding time, compared with a system using the multiple pronunciation dictionary without the optimization.
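
The pruning idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact confusability formula, so everything beyond the Levenshtein distance itself is an assumption made for illustration: the function names `normalized_distance` and `prune_lexicon`, the choice to divide the distance by the longer phoneme count, the rule that the first variant of each word is treated as canonical and always kept, and the threshold value. The phoneme symbols in the toy lexicon are likewise hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pron = Tuple[str, ...]  # one pronunciation variant as a phoneme sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two phoneme sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed length compensation: distance divided by the longer length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def prune_lexicon(lexicon: Dict[str, List[Pron]],
                  threshold: float) -> Dict[str, List[Pron]]:
    """Drop non-canonical variants that are too close to another word.

    Assumption: the first variant of each word is canonical and is kept.
    """
    pruned: Dict[str, List[Pron]] = {}
    for word, variants in lexicon.items():
        # Pronunciations of all other words, taken from the original lexicon
        # so that the result does not depend on iteration order.
        others = [p for w, ps in lexicon.items() if w != word for p in ps]
        kept = list(variants[:1])
        for variant in variants[1:]:
            if all(normalized_distance(variant, o) >= threshold for o in others):
                kept.append(variant)
        pruned[word] = kept
    return pruned


# Toy lexicon with hypothetical words and phoneme symbols.
toy_lexicon = {
    "word_a": [("k", "a", "m"), ("k", "a", "n")],
    "word_b": [("k", "a", "n", "a")],
}
pruned_lexicon = prune_lexicon(toy_lexicon, threshold=0.3)
```

In this toy case the second variant of `word_a` differs from the pronunciation of `word_b` by a single phoneme, so it falls below the assumed threshold and is removed, while both canonical pronunciations remain. A real system would tune the threshold on development data and might weight phoneme substitutions differently; neither detail is specified in the abstract.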

Keywords