Analysis of Korean Spontaneous Speech Characteristics for Spoken Dialogue Recognition  

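박영희 (Spoken Language Processing Laboratory, Department of Computer Science, Sogang University)
정민화 (Spoken Language Processing Laboratory, Department of Computer Science, Sogang University)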
Abstract
Spontaneous speech is often ungrammatical and exhibits serious phonological variations, which make it far harder to recognize than read speech. In this paper, we analyze transcriptions of real conversational speech and classify its characteristics from the standpoint of speech recognition. Reflecting these characteristics, we build a baseline system for conversational speech recognition. The classification comprises three categories: long silences, disfluencies, and phonological variations; each category groups phenomena with similar features. To deal with these characteristics, we first update the silence model and add a filled pause model and a garbage model; second, we add multiple phonetic transcriptions to the lexicon for the most frequent phonological variations. In our experiments, the initial morpheme error rate (MER) is 31.65%; we obtain MER reductions of 2.08% from the silence and garbage models, 0.73% from the filled pause model, and 0.73% from the phonological variations. Finally, we obtain an MER of 27.92% for conversational speech recognition, which will be used as a baseline for further study.
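The abstract reports its results as a morpheme error rate but does not define the metric. As a rough illustration only, the sketch below assumes it follows the usual word-error-rate convention applied to morpheme tokens: the minimum number of substitutions, deletions, and insertions needed to turn the recognized sequence into the reference, divided by the reference length. The function names and the sample tokens are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def morpheme_error_rate(ref, hyp):
    """Error rate in percent, assuming the WER-style definition (S + D + I) / N."""
    if not ref:
        raise ValueError("reference must contain at least one morpheme")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical morpheme sequences: one substitution and one deletion.
reference = ["m1", "m2", "m3", "m4"]
hypothesis = ["m1", "mX", "m3"]
print(morpheme_error_rate(reference, hypothesis))  # 50.0
```

Under this assumed definition, a recognizer scored against a morpheme-segmented reference transcript would be evaluated by summing the edit distances over all utterances and dividing by the total number of reference morphemes.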
Keywords
Conversational speech recognition; Spontaneous speech recognition; Disfluencies; Noise; Pronunciation variations; Filled pauses; Garbage model