http://dx.doi.org/10.7776/ASK.2014.33.4.268

Speech Recognition Using MSVQ/TDRNN  

Kim, Sung-Suk (Department of Computer Science, Yongin University)
Abstract
This paper presents a speech recognition method that combines multi-section vector quantization (MSVQ) with a time-delay recurrent neural network (TDRNN). The MSVQ builds its codebook from normalized, uniform sections of the voice signal, and the TDRNN performs recognition using the MSVQ codebook. The TDRNN is a classifier with two representations of dynamic context: its time-delayed input nodes capture local dynamic context, while its recursive nodes can capture the long-term dynamic context of the voice signal. Cepstral PLP (perceptual linear predictive) coefficients were used as speech features. In the speech recognition experiments, the MSVQ/TDRNN recognizer achieved a 97.9% word recognition rate for speaker-independent recognition.
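The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the function names, the use of a per-section mean as a stand-in for a trained section codebook, and the Elman-style recurrence with a tanh activation are all assumptions made for illustration only.

```python
import math


def split_into_sections(frames, num_sections):
    """Split a list of feature frames into contiguous, nearly equal sections.

    This mimics the 'normalized uniform sections' idea: utterances of any
    length are mapped onto the same number of sections.
    """
    n = len(frames)
    if num_sections <= 0 or n < num_sections:
        raise ValueError("need at least one frame per section")
    # Integer boundaries guarantee every section has at least one frame.
    bounds = [i * n // num_sections for i in range(num_sections + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(num_sections)]


def section_centroids(frames, num_sections):
    """Return one mean vector per section (assumed stand-in for a codebook)."""
    dim = len(frames[0])
    return [
        [sum(frame[d] for frame in section) / len(section) for d in range(dim)]
        for section in split_into_sections(frames, num_sections)
    ]


def tdrnn_forward(inputs, w_in, w_rec, num_delays):
    """Run a toy time-delay recurrent layer over a sequence of vectors.

    w_in:  hidden_size rows, each of length dim * (num_delays + 1)
    w_rec: hidden_size rows, each of length hidden_size
    The delay window supplies local context; the fed-back hidden state
    supplies longer-term context. Both weight shapes are assumptions.
    """
    dim = len(inputs[0])
    hidden = [0.0] * len(w_rec)
    states = []
    for t in range(len(inputs)):
        # Concatenate the current input with its delayed copies (zero-padded).
        window = []
        for d in range(num_delays + 1):
            window.extend(inputs[t - d] if t - d >= 0 else [0.0] * dim)
        new_hidden = []
        for row_in, row_rec in zip(w_in, w_rec):
            total = sum(w * x for w, x in zip(row_in, window))
            total += sum(w * h for w, h in zip(row_rec, hidden))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
        states.append(hidden)
    return states
```

In a pipeline of the kind the abstract describes, the per-section vectors would presumably be fed to the network as its input sequence; how the paper actually couples the two stages is not stated in the abstract.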
Keywords
Speech recognition; MSVQ; TDRNN