The Effect of the Number of Training Data on Speech Recognition

Lee, Chang-Young (Div. of Information System Engineering, Dongseo University)

Publication Information

The Journal of the Acoustical Society of Korea, v.28, no.2E, 2009, pp. 66-71

Abstract

In practical applications of speech recognition, a fundamental question is how much training data should be provided for a specific task. Although more training data would undoubtedly improve system performance, collecting it is costly. It is therefore crucial to determine the smallest amount of training data that affords a given level of accuracy. To this end, we investigate the effect of the number of training data on speaker-independent recognition of isolated words using fuzzy vector quantization with hidden Markov models (FVQ/HMM). The results show that the error rate is roughly inversely proportional to the number of training data and grows linearly with the vocabulary size.
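The scaling stated in the abstract can be read as a rough rule of the form E ≈ c · V / N, where E is the error rate, V the vocabulary size, and N the number of training data. The sketch below is only an illustration of that reading, not the author's procedure: the proportionality constant c, the function names, and the idea of fitting c by least squares from pilot measurements are all assumptions introduced here, and no numerical value from the paper is used.

```python
import math


def predicted_error_rate(n_train, vocab_size, c):
    """Error rate under the assumed model E = c * V / N.

    c is a hypothetical, task-dependent constant; the abstract
    only states the proportionality, not its value.
    """
    if n_train <= 0 or vocab_size <= 0:
        raise ValueError("n_train and vocab_size must be positive")
    return c * vocab_size / n_train


def fit_constant(observations):
    """Least-squares estimate of c (line through the origin).

    observations: iterable of (n_train, vocab_size, error_rate)
    triples from pilot experiments (assumed to be available).
    """
    points = [(v / n, e) for n, v, e in observations]
    numerator = sum(x * e for x, e in points)
    denominator = sum(x * x for x, _ in points)
    if denominator == 0:
        raise ValueError("need at least one observation with V/N > 0")
    return numerator / denominator


def min_training_size(target_error, vocab_size, c):
    """Smallest N for which the assumed model predicts E <= target_error."""
    if target_error <= 0:
        raise ValueError("target_error must be positive")
    return math.ceil(c * vocab_size / target_error)
```

Under these assumptions, one would measure the error rate at a few small training-set sizes, call fit_constant on those measurements, and pass the result to min_training_size to estimate how much data a target accuracy requires. Because the abstract describes the relation only as rough, such an estimate is a starting point for planning data collection rather than a guarantee.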

Keywords

Speech recognition; Number of training data; FVQ; HMM
