http://dx.doi.org/10.13067/JKIECS.2015.10.9.993

Utilization of Phase Information for Speech Recognition  

Lee, Chang-Young (Div. of Mechatronics Engineering, Dongseo University)
Publication Information
The Journal of the Korea Institute of Electronic Communication Sciences / v.10, no.9, 2015, pp. 993-1000
Abstract
Mel-frequency cepstral coefficients (MFCC) are one of the notable feature vectors for speech signal processing. An evident drawback of MFCC is that the phase information is lost when the magnitude of the Fourier transform is taken. In this paper, we consider a method of utilizing the phase information by treating the magnitudes of the real and imaginary components of the FFT separately. When this method is applied to speech recognition with FVQ/HMM (fuzzy vector quantization with hidden Markov models), the recognition error rate is found to decrease compared to conventional MFCC. By numerical analysis, we also show that the optimal number of MFCC components is 12, with 6 derived from the real components of the FFT and 6 from the imaginary components.
Keywords
Complex Cepstrum; Phase Information; MFCC; Speech Recognition
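
The idea described in the abstract, keeping the magnitudes of the real and imaginary FFT components apart instead of merging them into a single magnitude, can be illustrated with a minimal sketch. The abstract does not give the processing details, so everything beyond that split is an assumption here: the Hamming window, the 20-filter triangular mel filterbank, the log compression, the DCT-II, the 16 kHz sample rate, and all function names (`split_phase_features` and its helpers are hypothetical). The FVQ/HMM recognition stage is not shown.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def dft(frame):
    """Naive O(N^2) DFT; returns bins 0..N/2 (enough for a real signal)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # Filter edges expressed as (fractional) FFT bin positions.
    edges = [mel_to_hz(max_mel * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def cepstrum(spectrum, bank, n_coeffs):
    """Log filterbank energies followed by a DCT-II; c0 is skipped (assumption)."""
    log_e = [math.log(max(sum(w * s for w, s in zip(row, spectrum)), 1e-10))
             for row in bank]
    m = len(log_e)
    return [sum(e * math.cos(math.pi * k * (j + 0.5) / m)
                for j, e in enumerate(log_e))
            for k in range(1, n_coeffs + 1)]

def split_phase_features(frame, sample_rate=16000, n_filters=20, n_each=6):
    """Return n_each coefficients from |Re(FFT)| followed by n_each from |Im(FFT)|."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spec = dft(windowed)
    bank = mel_filterbank(n_filters, len(spec), sample_rate)
    re_part = cepstrum([abs(c.real) for c in spec], bank, n_each)
    im_part = cepstrum([abs(c.imag) for c in spec], bank, n_each)
    return re_part + im_part

# Usage: one 256-sample frame of a 1 kHz tone sampled at 16 kHz.
frame = [math.cos(2.0 * math.pi * 1000.0 * t / 16000.0) for t in range(256)]
features = split_phase_features(frame)
print(len(features))  # 6 real-derived + 6 imaginary-derived coefficients
```

Because the real and imaginary parts are processed separately, two frames that differ only in phase (for example a cosine and a sine of the same frequency) generally yield different feature vectors, which is the information that a magnitude-only MFCC discards.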