Intra- and Inter-frame Features for Automatic Speech Recognition

Lee, Sung Joo; Kang, Byung Ok; Chung, Hoon; Lee, Yunkeun

doi:10.4218/etrij.14.0213.0181

ETRI Journal / Volume 36, Issue 3 / Pages 514-517 / 2014 / 1225-6463 (pISSN) / 2233-7326 (eISSN)

Electronics and Telecommunications Research Institute (한국전자통신연구원)



Lee, Sung Joo (SW.Content Research Laboratory, ETRI) ;
Kang, Byung Ok (SW.Content Research Laboratory, ETRI) ;
Chung, Hoon (SW.Content Research Laboratory, ETRI) ;
Lee, Yunkeun (SW.Content Research Laboratory, ETRI)

Received : 2013.04.16
Accepted : 2013.11.05
Published : 2014.06.01

https://doi.org/10.4218/etrij.14.0213.0181


Abstract

This paper proposes alternative dynamic features for speech recognition. The goal is to improve recognition accuracy by deriving a representation of the distinctive dynamic characteristics of the speech spectrum. The work is motivated by two temporal properties of a speech signal: its highly non-stationary nature and the inter-frame change of its spectrum. We use a sub-frame spectrum analyzer to capture very rapid spectral changes within a single speech analysis frame. In addition, we attempt to measure spectral fluctuations in a more complex manner than traditional dynamic features such as delta or double-delta coefficients do. To evaluate the proposed features, speech recognition tests were conducted in smartphone environments. The experimental results show that feature streams simply combined with the proposed features improve the recognition accuracy of a hidden Markov model-based speech recognizer.
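The abstract does not give formulas for the proposed features, so the following Python sketch is only an illustration of the two ideas it mentions, not the authors' method. `delta` implements the conventional regression-based delta coefficient that the abstract names as the traditional baseline; `subframe_spectra` shows one plausible reading of sub-frame analysis, splitting a single analysis frame into equal pieces and taking a magnitude spectrum of each. The function names, the regression window of 2, the choice of 4 sub-frames, and the naive DFT are all assumptions made for this example.

```python
import cmath
import math


def delta(frames, window=2):
    """Conventional regression-based delta over per-frame feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames replicated at the boundaries.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        vec = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            vec.append(acc / denom)
        out.append(vec)
    return out


def subframe_spectra(frame, num_subframes=4):
    """Split one analysis frame into equal sub-frames (an assumed scheme)
    and return the magnitude spectrum of each, bins 0..L/2, via a naive DFT."""
    length = len(frame) // num_subframes
    spectra = []
    for s in range(num_subframes):
        chunk = frame[s * length:(s + 1) * length]
        mags = []
        for k in range(length // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / length)
                        for i, x in enumerate(chunk))
            mags.append(abs(coeff))
        spectra.append(mags)
    return spectra


# Usage: a frame that is silent in its first half and tonal in its second half.
frame = [0.0] * 8 + [math.cos(2 * math.pi * i / 4) for i in range(8)]
per_subframe = subframe_spectra(frame, num_subframes=4)
ramp_deltas = delta([[0.0], [1.0], [2.0], [3.0], [4.0]])
```

In the usage lines, a single spectrum over the whole frame would blend the silent and tonal halves, whereas the per-sub-frame spectra keep them apart; this is the kind of within-frame change the abstract refers to.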



Cited by

Multilingual speech-to-speech translation system for mobile consumer devices vol.60, pp.3, 2014, https://doi.org/10.1109/tce.2014.6937337
Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic vol.36, pp.5, 2014, https://doi.org/10.4218/etrij.14.2214.0030
Speech Enhancement Using Phase-Dependent A Priori SNR Estimator in Log-Mel Spectral Domain vol.36, pp.5, 2014, https://doi.org/10.4218/etrij.14.2214.0039
Online Blind Channel Normalization Using BPF-Based Modulation Frequency Filtering vol.38, pp.6, 2016, https://doi.org/10.4218/etrij.16.0115.0994
Deep Neural Network-Based Acoustic Modeling for Speech Recognition of Native and Foreign Speakers vol.9, pp.2, 2017, https://doi.org/10.13064/ksss.2017.9.2.095
Multimodal Unsupervised Speech Translation for Recognizing and Evaluating Second Language Speech vol.11, pp.6, 2021, https://doi.org/10.3390/app11062642
