http://dx.doi.org/10.13067/JKIECS.2014.9.11.1221

The Effect of the Number of Phoneme Clusters on Speech Recognition  

Lee, Chang-Young (Div. of Information Systems Engineering, Dongseo University)
Publication Information
The Journal of the Korea Institute of Electronic Communication Sciences, vol. 9, no. 11, 2014, pp. 1221-1226
Abstract
In an effort to improve the efficiency of speech recognition, we investigate the effect of the number of phoneme clusters. For this purpose, codebooks with varying numbers of phoneme clusters are prepared using a modified k-means clustering algorithm. These codebooks are then used in fuzzy vector quantization (FVQ), followed by hidden Markov models (HMMs), for the speech recognition test. The results show two distinct regimes. When the number of phoneme clusters is large, recognition performance is roughly independent of it. When the number is small, however, the recognition error rate increases nonlinearly as the number decreases. Numerical calculation suggests that this nonlinear regime may be modeled by a power-law function. The results also indicate that about 166 phoneme clusters would be optimal for recognizing 300 isolated words, which amounts to roughly 3 variations per phoneme.
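The abstract names the processing steps but not their details. The following standard-library Python sketch illustrates three of the numeric pieces under stated assumptions: building a codebook with a k-means variant that never leaves a codeword empty, computing the soft (fuzzy) memberships of a feature vector in that codebook as used in FVQ, and fitting a power law E(N) = a·N^(-b) to error rate versus number of clusters. The empty-cluster rule (re-seeding with the worst-represented vector), the fuzzy-c-means-style membership formula with fuzzifier m = 2, the power-law form, and all function names are assumptions for illustration; the paper's own k-means modification, FVQ formulation, and fitted function may differ. The HMM stage is not shown, and the demo data are synthetic, not speech features.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def modified_kmeans(vectors, k, iters=20, seed=0):
    """Build a k-codeword codebook with k-means (hypothetical helper).

    Assumed modification (illustrative, not necessarily the paper's): a
    cluster that receives no vectors is re-seeded with the vector farthest
    from its nearest centroid, so every codeword stays in use.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # assign each vector to its nearest codeword (ties go to the lower index)
            j = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            clusters[j].append(v)
        for j, members in enumerate(clusters):
            if members:
                # ordinary k-means update: centroid = mean of its members
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
            else:
                # empty cluster: re-seed with the worst-represented vector
                far = max(vectors, key=lambda w: min(sq_dist(w, c) for c in centroids))
                centroids[j] = list(far)
    return centroids


def fuzzy_memberships(v, codebook, m=2.0):
    """Soft memberships of v in each codeword (fuzzy-c-means form, fuzzifier m).

    Memberships are non-negative and sum to 1; m = 2 is an assumed default.
    """
    d = [math.sqrt(sq_dist(v, c)) for c in codebook]
    zeros = d.count(0.0)
    if zeros:
        # v coincides with one or more codewords: share membership among them
        return [1.0 / zeros if x == 0.0 else 0.0 for x in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]


def fit_power_law(ns, errors):
    """Least-squares fit of E(N) = a * N**(-b) on a log-log scale; returns (a, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope


if __name__ == "__main__":
    # Toy 2-D "feature vectors" (synthetic, not speech data)
    data = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]]
    codebook = modified_kmeans(data, k=2)
    print("codebook:", codebook)
    print("memberships of [2, 2]:", fuzzy_memberships([2, 2], codebook))
    # Synthetic error rates generated from a = 50, b = 0.8 (illustrative only)
    ns = [8, 16, 32, 64]
    a, b = fit_power_law(ns, [50 * n ** -0.8 for n in ns])
    print(f"fitted a = {a:.3f}, b = {b:.3f}")
```

In FVQ-based HMM recognizers, a membership vector like the one returned above typically replaces the single hard codeword index when observation probabilities are evaluated, which is what distinguishes FVQ from ordinary VQ. The log-log regression is only one simple way to estimate a power-law exponent; a nonlinear fit with an additive floor term would be an equally plausible choice.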
Keywords
speech recognition; number of phoneme clusters; fuzzy vector quantization; hidden Markov model