http://dx.doi.org/10.4218/etrij.13.0112.0074

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems  

Shekofteh, Yasser (Biomedical Engineering Department, Amirkabir University of Technology)
Almasganj, Farshad (Biomedical Engineering Department, Amirkabir University of Technology)
Publication Information
ETRI Journal, vol. 35, no. 1, 2013, pp. 100-108
Abstract
In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments on a speech recognition task using a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. With the proposed FE method, we gain a 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.
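As a rough illustration of the pipeline the abstract describes, the following Python sketch (not the authors' implementation) embeds a speech frame into the RPS by time-delay embedding, fits one Gaussian mixture model per phoneme attractor, and computes a posterior-probability feature vector for a new frame. The embedding dimension, time delay, number of mixture components, the scikit-learn GaussianMixture API, and the uniform prior over attractors are illustrative assumptions, not details taken from the paper.

# Minimal sketch, assuming a time-delay (Takens) embedding, per-phoneme GMM
# attractor models, and a uniform prior over attractors; parameter values are
# illustrative, not those reported in the paper.
import numpy as np
from sklearn.mixture import GaussianMixture

def embed_rps(frame, dim=3, delay=6):
    """Time-delay embedding: map a 1-D speech frame to points in R^dim."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame) - (dim - 1) * delay
    return np.stack([frame[i * delay : i * delay + n] for i in range(dim)], axis=1)

def train_attractor_models(frames_by_phoneme, n_components=8):
    """Fit one GMM per phoneme over the pooled embedded points of its frames."""
    models = {}
    for phoneme, frames in frames_by_phoneme.items():
        points = np.vstack([embed_rps(f) for f in frames])
        models[phoneme] = GaussianMixture(
            n_components=n_components, covariance_type="diag").fit(points)
    return models

def posterior_feature_vector(frame, models):
    """Posterior-probability feature vector: one entry per attractor model,
    derived from the average log-likelihood of the frame's embedded points."""
    points = embed_rps(frame)
    log_liks = np.array([gmm.score(points) for gmm in models.values()])
    log_post = log_liks - np.logaddexp.reduce(log_liks)  # softmax, uniform prior
    return np.exp(log_post)

In this reading, the resulting per-frame posterior vectors would replace or augment MFCCs as the observation features of an HMM-based recognizer (for example, HTK), matching the experimental setup the abstract describes.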
Keywords
Reconstructed phase space; phoneme attractor; feature extraction; speech recognition
References
1 A. Errity, J. McKenna, and B. Kirkpatrick, "Dimensionality Reduction Methods Applied to Both Magnitude and Phase Derived Features," Proc. Interspeech, 2007, pp. 1957-1960.
2 X. Liu, Discriminative Complexity Control and Linear Projections for Large Vocabulary Speech Recognition, doctoral dissertation, Cambridge University Engineering Department, Cambridge, England, UK, 2005.
3 Y. Tang and R. Rose, "A Study of Using Locality Preserving Projections for Feature Extraction in Speech Recognition," Proc. ICASSP, 2008, pp. 1569-1572.
4 H. Hermansky, "Perceptual Linear Predictive (PLP) Analysis of Speech," J. Acoustical Soc. America, vol. 87, no. 4, 1990, pp. 1738-1752.
5 I. Kokkinos and P. Maragos, "Nonlinear Speech Analysis Using Models for Chaotic Systems," IEEE Trans. Speech Audio Process., vol. 13, no. 6, 2005, pp. 1098-1109.
6 J.J. Jiang, Y. Zhang, and C. McGilligan, "Chaos in Voice, from Modeling to Measurement," J. Voice, vol. 20, 2006, pp. 2-17.
7 H. Whitney, "Differentiable Manifolds," Annals Math., 2nd series, vol. 37, 1936, pp. 645-680.
8 F. Takens, "Detecting Strange Attractors in Turbulence," Proc. Dynamical Syst. Turbulence, 1980, pp. 366-381.
9 H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Cambridge, England, UK: Cambridge University Press, 1997.
10 A. Ezeiza et al., "Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition," Proc. NOLISP, 2011, pp. 183-189.
11 V. Pitsikalis, I. Kokkinos, and P. Maragos, "Nonlinear Analysis of Speech Signals: Generalized Dimensions and Lyapunov Exponents," Proc. Eurospeech, 2003.
12 S. Prasad et al., "Nonlinear Dynamical Invariants for Speech Recognition," Proc. Int. Conf. Spoken Language Process., 2006, pp. 2518-2521.
13 S. Yu, D. Zheng, and X. Feng, "A New Time-Domain Feature Parameter for Phoneme Classification," Proc. WESPAC IX, 2006.
14 J. Sun, N. Zheng, and X. Wang, "Enhancement of Chinese Speech Based on Nonlinear Dynamics," Signal Process., vol. 87, no. 1, 2007, pp. 2431-2445.
15 M.T. Johnson et al., "Time-Domain Isolated Phoneme Classification Using Reconstructed Phase Spaces," IEEE Trans. Speech Audio Process., vol. 13, no. 4, 2005, pp. 458-466.
16 A.C. Lindgren, M.T. Johnson, and R.J. Povinelli, "Speech Recognition Using Reconstructed Phase Space Features," Proc. IEEE Int. Conf. Acoustics Speech Signal Process., 2003, pp. 61-63.
17 R.J. Povinelli et al., "Statistical Models of Reconstructed Phase Spaces for Signal Classification," IEEE Trans. Signal Process., vol. 54, no. 6, 2006, pp. 2178-2186.
18 A. Jafari, F. Almasganj, and M. NabiBidhendi, "Statistical Modeling of Speech Poincaré Sections in Combination of Frequency Analysis to Improve Speech Recognition Performance," Chaos, vol. 20, 2010, pp. 033106:1-11.
19 Y. Shekofteh and F. Almasganj, "Using Phase Space Based Processing to Extract Proper Features for ASR Systems," Proc. 5th Int. Symp. Telecommun., 2010, pp. 596-599.
20 A.C. Lindgren, M.T. Johnson, and R.J. Povinelli, "Joint Frequency Domain and Reconstructed Phase Space Features for Speech Recognition," Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., 2004, pp. 533-536.
21 J. Ye, M.T. Johnson, and R.J. Povinelli, "Phoneme Classification over Reconstructed Phase Space Using Principal Component Analysis," Proc. NOLISP, 2003, pp. 11-16.
22 FARSDAT (Farsi Speech Database). Available: http://catalog.elra.info/product_info.php?products_id=18
23 S. Young et al., The HTK Book, Version 3.4, Cambridge University Engineering Department, Cambridge, England, UK, 2006. Available: http://htk.eng.cam.ac.uk
24 Y. Shekofteh, F. Almasganj, and M.M. Goodarzi, "Comparison of Linear Based Feature Transformations to Improve Speech Recognition Performance," Proc. ICEE, 2011, pp. 1-4.
25 C.C. Chang and C.J. Lin, "LIBSVM: A Library for Support Vector Machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, Apr. 2011, article 27.
26 C.W. Hsu and C.J. Lin, "A Comparison of Methods for Multiclass Support Vector Machines," IEEE Trans. Neural Netw., vol. 13, no. 2, 2002, pp. 415-425.
27 F. Grezl and M. Karafiat, "Integrating Recent MLP Feature Extraction Techniques into TRAP Architecture," Proc. Interspeech, 2011, pp. 1229-1232.