http://dx.doi.org/10.7776/ASK.2006.25.2.088

Improvements on Speech Recognition for Fast Speech  

Lee Ki-Seung (School of Electronic Engineering, College of Information and Communication, Konkuk University)
Abstract
In this paper, a method for improving the performance of an automatic speech recognition (ASR) system for conversational speech is proposed, which mainly focuses on increasing robustness against rapidly spoken utterances. The proposed method does not require an additional speech recognition pass to represent the speaking rate quantitatively. The energy distribution in specific bands is used to detect vowel regions, and the number of vowels per second is then computed as the speaking rate. To improve performance on fast speech, previous methods expand the sequence of feature vectors by a scaling factor computed as the ratio between the standard phoneme duration and the measured one. In the method proposed here, however, utterances are classified by their speaking rates, and the scaling factor is determined individually for each class using a maximum likelihood criterion. ASR experiments on 10-digit mobile phone numbers confirmed that the overall error rate was reduced by $17.8\%$ when the proposed method was employed.
Keywords
Automatic Speech Recognition; Maximum Likelihood; Speaking Rate
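The pipeline outlined in the abstract (speaking rate as vowels per second, classification of utterances by rate, a per-class scaling factor chosen by maximum likelihood, and expansion of the feature-vector sequence) might be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the vowel detector is assumed to exist already and to return vowel positions, linear interpolation is assumed as the expansion rule, and every name and number here (`BOUNDARIES`, `SCALE_PER_CLASS`, `log_likelihood`, and so on) is hypothetical.

```python
from bisect import bisect_right


def speaking_rate(vowel_positions_sec, duration_sec):
    """Vowels per second, given already-detected vowel positions (assumed input)."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    return len(vowel_positions_sec) / duration_sec


def rate_class(rate, boundaries):
    """Index of the speaking-rate class; boundaries are ascending thresholds."""
    return bisect_right(boundaries, rate)


def expand_features(frames, scale):
    """Stretch a feature-vector sequence along time by linear interpolation.

    Linear interpolation is an assumption of this sketch; the abstract only
    says the sequence is expanded by a scaling factor.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("frames must not be empty")
    if scale <= 0:
        raise ValueError("scale must be positive")
    m = max(1, round(n * scale))  # number of output frames
    if n == 1 or m == 1:
        return [list(frames[0]) for _ in range(m)]
    out = []
    for j in range(m):
        pos = j * (n - 1) / (m - 1)  # fractional index into the source frames
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def select_scale(utterances, candidates, log_likelihood):
    """Pick the candidate scale that maximizes the summed log-likelihood.

    `log_likelihood` is a caller-supplied scorer (for example, an acoustic
    model's score for a feature sequence); it is a placeholder here.
    """
    return max(
        candidates,
        key=lambda s: sum(log_likelihood(expand_features(u, s)) for u in utterances),
    )


# Hypothetical values for illustration only; none are taken from the paper.
BOUNDARIES = [4.0, 6.0]            # vowels/sec thresholds: slow | normal | fast
SCALE_PER_CLASS = [1.0, 1.0, 1.5]  # one scaling factor per class


def normalize_utterance(frames, vowel_positions_sec, duration_sec):
    """Look up the class-specific scale and expand the feature sequence."""
    rate = speaking_rate(vowel_positions_sec, duration_sec)
    scale = SCALE_PER_CLASS[rate_class(rate, BOUNDARIES)]
    return expand_features(frames, scale)
```

In a setup like this, `select_scale` would be run once per speaking-rate class on training utterances to fill in `SCALE_PER_CLASS`, and `normalize_utterance` would then be applied to each test utterance before recognition.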