
The Effect of FIR Filtering and Spectral Tilt on Speech Recognition with MFCC  

Lee, Chang-Young (Division of Information Systems Engineering, Dongseo University)
Publication Information
The Journal of the Korea institute of electronic communication sciences, v.5, no.4, 2010, pp. 363-371
Abstract
To improve feature vector classification and thereby reduce the error rate in speaker-independent speech recognition, we study the effect of applying a spectral tilt to the Fourier magnitude spectrum during MFCC extraction. In parallel, we investigate the effect of FIR filtering of the speech signal on recognition. The proposed methods are evaluated in two independent ways: with the Fisher discriminant objective function and with a speech recognition test using a hidden Markov model with fuzzy vector quantization. In the experiments, an appropriate choice of the tilt factor reduces the recognition error rate by about 10% relative to the conventional method.
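The abstract does not state which form of the Fisher discriminant objective is used. As a rough illustration only, the sketch below scores class separability with one common form, the trace of S_w^{-1} S_b, where S_w and S_b are the within-class and between-class scatter matrices. The function name `fisher_criterion`, the input layout (one feature vector per row), and the use of a pseudo-inverse are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def fisher_criterion(features, labels):
    """Return trace(pinv(S_w) @ S_b) for labeled feature vectors.

    Assumed form of the Fisher objective; larger values mean the
    classes are better separated relative to their internal spread.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for cls in np.unique(labels):
        members = features[labels == cls]
        class_mean = members.mean(axis=0)
        # Within-class scatter: spread of members around their class mean.
        centered = members - class_mean
        s_within += centered.T @ centered
        # Between-class scatter: class mean offset, weighted by class size.
        offset = (class_mean - overall_mean)[:, None]
        s_between += len(members) * (offset @ offset.T)
    # Pseudo-inverse guards against a singular within-class matrix.
    return float(np.trace(np.linalg.pinv(s_within) @ s_between))
```

Under this assumed form, a feature extraction variant (for example, a particular tilt factor) could be compared against another by computing the score on the same labeled data for each and preferring the higher value.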
Keywords
Speech Recognition; Spectral Tilt; MFCC; HMM; FVQ