http://dx.doi.org/10.13067/JKIECS.2012.7.3.475

A Study on the Redundancy Reduction in Speech Recognition  

Lee, Chang-Young (Dept. of Systems Management Engineering, Dongseo University)
Publication Information
The Journal of the Korea Institute of Electronic Communication Sciences, Vol. 7, No. 3, 2012, pp. 475-483
Abstract
The characteristic features of a speech signal do not vary significantly from frame to frame, so it is advisable to reduce the redundancy among such similar feature vectors. The objective of this paper is to search for the optimal condition of minimum redundancy and maximum relevancy of the speech feature vectors in speech recognition. For this purpose, we realize redundancy reduction by way of a vigilance parameter and investigate its effect on speaker-independent recognition of isolated words using FVQ/HMM. Experimental results showed that the number of feature vectors could be reduced by 30% without degrading speech recognition accuracy.
Keywords
Speech Recognition; Redundancy Reduction; MFCC; Hidden Markov Model
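The vigilance-based pruning described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the paper's exact method: the function name, the cosine-similarity criterion, the threshold value, and the synthetic "MFCC-like" data are all assumptions for demonstration purposes.

```python
import numpy as np

def prune_redundant_frames(features, vigilance=0.999):
    """Keep a frame only if its cosine similarity to the most recently
    kept frame falls below the vigilance threshold; otherwise drop it
    as redundant. (Hypothetical sketch of vigilance-based pruning.)"""
    kept = [0]  # always keep the first frame
    for i in range(1, len(features)):
        a, b = features[kept[-1]], features[i]
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if sim < vigilance:
            kept.append(i)
    return features[kept]

# Toy data: 100 frames of 13-dim vectors that drift slowly,
# so consecutive frames are highly similar (as in real speech).
rng = np.random.default_rng(0)
frames = np.cumsum(rng.normal(0, 0.01, size=(100, 13)), axis=0) + 1.0
pruned = prune_redundant_frames(frames, vigilance=0.999)
print(len(frames), len(pruned))
```

A higher vigilance keeps more frames (stricter notion of "similar"); lowering it prunes more aggressively, which is the trade-off the paper explores against recognition accuracy.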
Citations & Related Records
Times Cited By KSCI: 2
1 S. Alizadeh, R. Boostani, & V. Asadpour, "Lip feature extraction and reduction for HMM-based visual speech recognition systems", 9th International Conference on Signal Processing (ICSP), pp. 561-564. 2008.
2 V. Estellers, M. Gurban, & J. P. Thiran, "Selecting relevant visual features for speechreading", IEEE International Conference on Image Processing (ICIP), pp. 1433-1436. 2009.
3 Z. Tan, P. Dalsgaard, & B. Lindberg, "Adaptive Multi-Frame-Rate Scheme for Distributed Speech Recognition Based on a Half Frame-Rate Front-End", IEEE 7th Workshop on Multimedia Signal Processing, pp. 1-4. 2005.
4 V. Sanchez, A. M. Peinado, & J. L. Perez-Cordoba, "Low complexity channel error mitigation for distributed speech recognition over wireless channels", IEEE International Conference on Communications, Vol. 5, pp. 3619-3623. 2003.
5 S. M. Lajevardi & Z. M. Hussain, "Contourlet structural similarity for facial expression recognition", IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 1118-1121. 2010.
6 T. Kim, H. Kim, W. Hwang, S. Kee, & J. Kittler, "Independent component analysis in a facial local residue space", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 579-586. 2003.
7 S. van Vuuren, "Comparison of text-independent speaker recognition methods on telephone speech with acoustic mismatch", Fourth International Conference on Spoken Language Processing (ICSLP), Vol. 3, pp. 1788-1791. 1996.
8 C. Jung, M. Kim, & H. Kang, "Normalized minimum-redundancy and maximum-relevancy based feature selection for speaker verification systems", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4549-4552. 2009.
9 L. Granai, T. Vlachos, M. Hamouz, J. R. Tena, & T. Davies, "Model-Based Coding of 3D Head Sequences", 3DTV Conference, pp. 1-4. 2007.
10 T. S. Tabatabaei & S. Krishnan, "Towards robust speech-based emotion recognition", IEEE International Conference on Systems Man and Cybernetics (SMC), pp. 608-611. 2010.
11 L. Xu, M. Xu, & D. Yang, "Factor Analysis and Majority Voting Based Speech Emotion Recognition", International Conference on Intelligent System Design and Engineering Application (ISDEA), Vol. 1, pp. 716-720. 2010.
12 M. D. Emmerson, & R. I. Damper, "Relations between fault tolerance and internal representations for multi-layer perceptrons", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, pp. 281-284. 1992.
13 J. Choi, "A Speech and Noise Recognition System Using Neural Networks", The Journal of the Korea Institute of Electronic Communication Sciences, Vol. 5, No. 4, pp. 357-362, 2010.
14 P. Nguyen, L. Rigazio, C. Wellekens, & J.-C. Junqua, "Construction of model-space constraints", IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 69-72, 2001.
15 J. Weng & X. Jia, "A Memory-Efficient Graph Structured Composite-State Network for Embedded Speech Recognition", Fifth International Conference on Natural Computation (ICNC), Vol. 3, pp. 570-573. 2009.
16 M. Bouallegue, D. Matrouf, & G. Linares, "A simplified Subspace Gaussian Mixture to compact acoustic models for speech recognition", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4896-4899. 2011.
17 P. Min & S. Yihe, "ASIC design of Gabor transform for speech processing", 4th International Conference on ASIC, pp. 401-404. 2001.
18 Y. D. Liu, Y. C. Lee, H. H. Chen, & G. Z. Sun, "Nonlinear resampling transformation for automatic speech recognition", Neural Networks for Signal Processing, pp. 319-326. 1991.
19 G. Sarkar & G. Saha, "Efficient prequantization techniques based on probability density for speaker recognition system", IEEE Region 10 Conference (TENCON), pp. 1-6. 2009.
20 H. Hsieh, J. Chien, K. Shinoda, & S. Furui, "Independent component analysis for noisy speech recognition", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4369-4372. 2009.
21 T. Lee & G. Jang, "The statistical structures of male and female speech signals", International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 1, pp. 105-108. 2001.
22 X. Zhao, P. Yang, & L. Zhang, "Research on the low rate representations for speech signals", 11th IEEE Singapore International Conference on Communication Systems (ICCS), pp. 188-192. 2008.
23 Y. Yangrui, Y. Hongzhi, & L. Yonghong, "The Design of Continuous Speech Corpus Based on Half-Syllable Tibetan", International Conference on Computational Intelligence and Software Engineering, pp. 1-4. 2009.
24 D. Dimitriadis, P. Maragos, & A. Potamianos, "On the Effects of Filterbank Design and Energy Computation on Robust Speech Recognition", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 6, pp. 1504-1516. 2011.
25 C. Jung, M. Kim, & H. Kang, "Selecting Feature Frames for Automatic Speaker Recognition Using Mutual Information", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 18, No. 6, pp. 1332-1340. 2010.
26 J. Song, M. Lyu, J. Hwang, & M. Cai, "PVCAIS: a personal videoconference archive indexing system", International Conference on Multimedia and Expo (ICME), Vol. 2, pp. 673-676. 2003.
27 S. Sadjadi & J. Hansen, "Hilbert envelope based features for robust speaker identification under reverberant mismatched conditions", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5448-5451. 2011.
28 L. Fausett, "Fundamentals of Neural Networks", Prentice-Hall, New Jersey, p. 298. 1994.
29 M. Dehghan, K. Faez, M. Ahmadi, & M. Shridhar, "Unconstrained Farsi Handwritten Word Recognition Using Fuzzy Vector Quantization and Hidden Markov Models," Pattern Recognition Letters, Vol. 22, pp. 209-214. 2001.
30 T. Drugman, M. Gurban, & J.-P. Thiran, "Relevant Feature Selection for Audio-Visual Speech Recognition", IEEE 9th Workshop on Multimedia Signal Processing (MMSP), pp. 179-182. 2007.
31 W. Sun, Z. Wu, H. Hu, & Y. Zeng, "Multi-band maximum a posteriori multi-transformation algorithm based on the discriminative combination", International Conference on Machine Learning and Cybernetics, Vol. 8, pp. 4876-4880. 2005.
32 Y. Chang, S. Hung, N. Wang, & B. Lin, "CSR: A Cloud-assisted speech recognition service for personal mobile device", International Conference on Parallel Processing (ICPP), pp. 305-314. 2011.
33 B. Kim, "Establishing Measurement-Based Quality Criteria for Voice Service over WiBro Networks", The Journal of the Korea Institute of Electronic Communication Sciences, Vol. 6, No. 6, pp. 823-829, 2011.
34 I. Spiro, G. Taylor, G. Williams, & C. Bregler, "Hands by hand: Crowd-sourced motion tracking for gesture annotation", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 17-24. 2010.
35 H. R. Tohidypour, S. A. Seyyedsalehi, H. Roshandel, & H. Behbood, "Speech recognition using three channel redundant wavelet filterbank", 2nd International Conference on Industrial Mechatronics and Automation (ICIMA), Vol. 2, pp. 325-328. 2010.
36 M. Paulik & A. Waibel, "Spoken language translation from parallel speech audio: Simultaneous interpretation as SLT training data", IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 5210-5213. 2010.
37 D. B. Pisoni, H. C. Nusbaum, & B. G. Greene, "Perception of synthetic speech generated by rule", Proceedings of the IEEE, Vol. 73, No. 11, pp. 1665-1676. 1985.