Selecting Good Speech Features for Recognition

  • Published : 1996.04.30

Abstract

This paper describes a method for selecting suitable features for speech recognition using an information-theoretic measure. Conventional speech recognition systems heuristically choose a subset of frequency components, cepstral coefficients, mel-cepstral coefficients, energy, and their time differences of the speech waveform as speech features. However, such systems cannot perform well if the selected features are not suitable for speech recognition. Since the recognition rate is the only performance measure of a speech recognition system, it is hard to judge how suitable the selected features are. To solve this problem, it is essential to analyze the features themselves and measure directly how good they are. Good speech features should contain all of the class-related information and as little class-irrelevant variation as possible. In this paper, we suggest a method to measure the class-related information and the amount of class-irrelevant variation based on Shannon's information theory. Using this method, we compare the mel-scaled FFT, cepstrum, mel-cepstrum, and wavelet features of the TIMIT speech data. The results show that, among these features, the mel-scaled FFT is the best feature for speech recognition according to the proposed measure.
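The abstract does not spell out the exact formulation of the measure, but the idea of quantifying class-related information and class-irrelevant variation with Shannon's information theory can be illustrated with a simple histogram-based estimate: the mutual information I(X; C) between a discretized feature and the class label captures the class-related information, while the conditional entropy H(X | C) captures the remaining class-irrelevant variation. The sketch below is illustrative only; the synthetic data, bin counts, and function names are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of an information-theoretic feature quality measure.
# Assumption: class-related information ~ I(X; C) = H(X) - H(X|C),
#             class-irrelevant variation ~ H(X|C),
# estimated from histograms of one discretized feature dimension.

import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zero bins."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def feature_quality(x, labels, n_bins=32):
    """Return (I(X;C), H(X|C)) for one scalar feature dimension.

    x      : 1-D array of feature values (one per frame)
    labels : 1-D array of integer class labels (e.g., phoneme indices)
    """
    # Discretize the feature into equal-width bins.
    edges = np.histogram_bin_edges(x, bins=n_bins)
    xq = np.digitize(x, edges[1:-1])          # bin index in 0..n_bins-1

    # Marginal entropy H(X).
    px = np.bincount(xq, minlength=n_bins) / len(xq)
    h_x = entropy(px)

    # Conditional entropy H(X|C) = sum_c P(c) H(X | C=c).
    h_x_given_c = 0.0
    for c in np.unique(labels):
        mask = labels == c
        pxc = np.bincount(xq[mask], minlength=n_bins) / mask.sum()
        h_x_given_c += mask.mean() * entropy(pxc)

    mutual_info = h_x - h_x_given_c           # class-related information
    return mutual_info, h_x_given_c           # want: large, small

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, size=5000)
    # A "good" feature: its value depends strongly on the class label.
    good = labels + 0.3 * rng.standard_normal(5000)
    # A "bad" feature: pure class-irrelevant variation.
    bad = rng.standard_normal(5000)
    print("good feature (I, H(X|C)):", feature_quality(good, labels))
    print("bad  feature (I, H(X|C)):", feature_quality(bad, labels))
```

Under this reading, a feature set such as the mel-scaled FFT would be preferred when it yields high class-related information together with low class-irrelevant variation across its dimensions.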
