The Effect of the Number of Training Data on Speech Recognition  

Lee, Chang-Young (Div. of Information System Engineering, Dongseo University)
Abstract
In practical applications of speech recognition, a fundamental question is how much training data should be provided for a specific task. Although more training data would undoubtedly improve system performance, collecting it is costly. It is therefore crucial to determine the smallest number of training data that affords a given level of accuracy. To this end, we investigate the effect of the number of training data on speaker-independent recognition of isolated words using fuzzy vector quantization with hidden Markov models (FVQ/HMM). The results show that the error rate is roughly inversely proportional to the number of training data and grows linearly with the vocabulary size.
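As a rough illustration of how the reported trend could be used for planning, the two proportionalities can be combined into a single model, E ≈ c · V / N, where E is the error rate, V the vocabulary size, and N the number of training data. This combined form, the constant `c`, the function names, and the pilot numbers below are all assumptions made for the sketch; none of them are taken from the paper. Under that assumption, the following Python sketch estimates `c` from a few pilot measurements and then solves for the smallest N that meets a target error rate.

```python
import math


def fit_constant(samples):
    """Least-squares estimate of c in the assumed model E = c * V / N.

    samples: iterable of (n_train, vocab_size, error_rate) triples.
    """
    # With x = V / N, minimising sum((E - c*x)^2) gives c = sum(x*E) / sum(x*x).
    num = sum((v / n) * e for n, v, e in samples)
    den = sum((v / n) ** 2 for n, v, e in samples)
    return num / den


def min_training_size(target_error, vocab_size, c):
    """Smallest integer N such that c * V / N <= target_error."""
    if target_error <= 0:
        raise ValueError("target_error must be positive")
    return math.ceil(c * vocab_size / target_error)


# Synthetic pilot points generated from the assumed model itself (c = 2.0).
# These are NOT measurements from the paper; they only exercise the code.
pilot = [(n, v, 2.0 * v / n) for n in (100, 200, 400) for v in (50, 100)]

c_hat = fit_constant(pilot)
n_needed = min_training_size(target_error=0.05, vocab_size=100, c=c_hat)
print(f"estimated c = {c_hat:.3f}, training data needed = {n_needed}")
```

Because the abstract describes the relationship only as "roughly" inversely proportional, any N obtained this way is a first estimate to be checked against held-out data, not a guarantee.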
Keywords
Speech recognition; Number of training data; FVQ; HMM