[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7776/ASK.2014.33.4.268

Speech Recognition Using MSVQ/TDRNN

Kim, Sung-Suk (Department of Computer Science, Yongin University)

Publication Information

The Journal of the Acoustical Society of Korea / v.33, no.4, 2014, pp. 268-272

Abstract

This paper presents a speech recognition method that combines multi-section vector quantization (MSVQ) with a time-delay recurrent neural network (TDRNN). The MSVQ generates a codebook from normalized, uniform sections of the voice signal, and the TDRNN performs recognition using that codebook. The TDRNN classifier represents dynamic context in two ways: time-delayed input nodes capture local dynamic context, while recursive nodes capture the long-term dynamic context of the voice signal. Cepstral PLP (perceptual linear predictive) coefficients were used as speech features. In speaker-independent recognition experiments, the MSVQ/TDRNN recognizer achieved a word recognition rate of 97.9%.
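The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the use of per-section mean vectors in place of a trained codebook, the tanh activation, and the weight layout are all assumptions made for illustration. The first two functions split an utterance's feature frames into a fixed number of uniform sections and summarize each one; the third runs a toy layer whose input is a window of delayed frames (local context) plus its own previous hidden state (long-term context).

```python
import math


def split_uniform_sections(frames, n_sections):
    """Split a list of feature frames into n_sections contiguous, near-equal parts."""
    n = len(frames)
    if n < n_sections:
        raise ValueError("need at least one frame per section")
    # Integer boundaries guarantee every section gets at least one frame.
    bounds = [(i * n) // n_sections for i in range(n_sections + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(n_sections)]


def section_centroids(frames, n_sections):
    """Return one mean vector per section (a stand-in for a trained per-section codebook)."""
    centroids = []
    for section in split_uniform_sections(frames, n_sections):
        dim = len(section[0])
        centroids.append(
            [sum(frame[d] for frame in section) / len(section) for d in range(dim)]
        )
    return centroids


def tdrnn_forward(frames, w_in, w_rec, n_delays):
    """Run a toy time-delay recurrent layer and return its final hidden state.

    w_in  : n_hidden rows, each of length n_delays * dim (weights on the delayed inputs)
    w_rec : n_hidden rows, each of length n_hidden (weights on the fed-back hidden state)
    """
    dim = len(frames[0])
    hidden = [0.0] * len(w_rec)
    # Zero-pad the start so the first frames still see a full delay window.
    padded = [[0.0] * dim for _ in range(n_delays - 1)] + list(frames)
    for t in range(len(frames)):
        # Local context: the current frame and the n_delays - 1 frames before it.
        window = [x for frame in padded[t:t + n_delays] for x in frame]
        # Long-term context: the previous hidden state is fed back into every unit.
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], window))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return hidden
```

In a real recognizer the per-section codebooks would be trained over many utterances, and the network weights would be learned rather than supplied by hand; the sketch only shows how the data flows.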

Keywords

Speech recognition; MSVQ; TDRNN
