The Effect of the Number of Training Data on Speech Recognition

  • Published : 2009.06.30

Abstract

In practical applications of speech recognition, a fundamental question is how much training data should be provided for a specific task. Although plentiful training data would undoubtedly improve system performance, collecting it is costly. It is therefore crucial to determine the smallest amount of training data that affords a given level of accuracy. To this end, we investigate the effect of the amount of training data on speaker-independent recognition of isolated words using fuzzy vector quantization with hidden Markov models (FVQ/HMM). The results show that the error rate is roughly inversely proportional to the amount of training data and grows linearly with the vocabulary size.
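
The reported trend can be turned into a rough sizing rule. The sketch below is illustrative only: it assumes the two proportionalities combine into a single model E = k * V / N, where N is the amount of training data, V is the vocabulary size, and k is a hypothetical constant that would have to be estimated from measured error rates. The abstract states neither this combined form nor any value of k, and the function names are invented for this example.

```python
import math


def predicted_error_rate(n_train: int, vocab_size: int, k: float) -> float:
    """Error rate under the ASSUMED model E = k * V / N.

    k is a hypothetical task-dependent constant, not a value from the paper.
    """
    if n_train <= 0:
        raise ValueError("n_train must be positive")
    return k * vocab_size / n_train


def min_training_count(target_error: float, vocab_size: int, k: float) -> int:
    """Smallest N satisfying k * V / N <= target_error under the same model."""
    if target_error <= 0:
        raise ValueError("target_error must be positive")
    return math.ceil(k * vocab_size / target_error)
```

Under this assumed model, doubling the amount of training data halves the predicted error rate, and doubling the vocabulary size doubles the amount of data needed to hold the error rate fixed. Because the abstract describes the relationship as only roughly proportional, any such estimate should be treated as a first approximation.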
