Implementation of HMM Based Speech Recognizer with Medium Vocabulary Size Using TMS320C6201 DSP

TMS320C6201 DSP를 이용한 HMM 기반의 음성인식기 구현

  • Jung, Sung-Yun (Telecom Examination Div., The Korean Intellectual Property Office) ;
  • Son, Jong-Mok (Application Technology Research Department, National Security Research Institute) ;
  • Bae, Keun-Sung (School of Electronic and Electrical Engineering, Kyungpook National University)
  • Published : 2006.03.31

Abstract

This paper addresses the real-time implementation of a medium-vocabulary speech recognition system intended for use in mobile phones. We first developed a PC-based variable-vocabulary word recognizer, keeping the program memory and the total size of the acoustic models as small as possible. To reduce the memory required by the acoustic models, linear discriminant analysis was applied in feature selection and phonetic tied mixtures were used in HMM training. In addition, state-based Gaussian selection combined with real-time cepstral normalization was used to reduce the computational load and make recognition more robust. We then verified real-time operation of the recognizer on a TMS320C6201 EVM board. The implemented system uses about 610 kbytes of memory, including both program and data memory. The recognition rate was 95.86% on the ETRI 445DB, and 96.4%, 97.92%, and 87.04% on three name databases collected through mobile phones.
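The abstract does not say how the real-time cepstral normalization is computed. As a minimal sketch of one common approach, the Python code below subtracts a recursively updated mean from each cepstral frame, so no look-ahead over the whole utterance is needed. The function name `running_cmn` and the forgetting factor `alpha` are illustrative assumptions, not values or names taken from the paper.

```python
from typing import Iterable, Iterator, List


def running_cmn(frames: Iterable[List[float]],
                alpha: float = 0.995) -> Iterator[List[float]]:
    """Yield cepstral frames with a running mean subtracted.

    Assumption: the mean is tracked with a first-order recursive
    (exponential) average; alpha is a hypothetical forgetting factor.
    """
    mean = None
    for frame in frames:
        if mean is None:
            # Initialize the running mean with the first frame.
            mean = list(frame)
        else:
            # Recursive update: old mean weighted by alpha,
            # new frame weighted by (1 - alpha).
            mean = [alpha * m + (1.0 - alpha) * c
                    for m, c in zip(mean, frame)]
        # Normalize the current frame with the current mean estimate.
        yield [c - m for c, m in zip(frame, mean)]


if __name__ == "__main__":
    # Two one-dimensional "cepstral" frames, with alpha chosen for easy arithmetic.
    for out in running_cmn([[0.0], [2.0]], alpha=0.5):
        print(out)
```

Because each frame is normalized as soon as it arrives, this form suits frame-synchronous decoding on a DSP. The trade-off is that the mean estimate is poor for the first few frames of an utterance.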
