Voice Recognition Performance Improvement using the Convergence of Voice signal Feature and Silence Feature Normalization in Cepstrum Feature Distribution

Hwang, Jae-Cheon

doi:10.15207/JKCS.2017.8.5.013

Journal of the Korea Convergence Society (한국융합학회논문지)

Volume 8 Issue 5 / Pages 13-17 / 2017 / 2233-4890 (pISSN) / 2713-6353 (eISSN)

Korea Convergence Society (한국융합학회)

음성 신호 특징과 셉스트럽 특징 분포에서 묵음 특징 정규화를 융합한 음성 인식 성능 향상

Hwang, Jae-Cheon (Department of Computer Engineering, Gachon University)

황재천 (가천대학교 컴퓨터공학과)

Published: 2017.05.28

https://doi.org/10.15207/JKCS.2017.8.5.013

Abstract

Existing methods for extracting speech features from a speech signal suffer from recognition errors because the threshold that separates speech from non-speech is not clearly defined. This article proposes a modeling method that improves speech recognition performance by combining speech feature extraction with silence feature normalization applied to the non-speech portions. The proposed method minimizes the effect of noise: the recognition model is built from speech signal features extracted from each speech frame, converged with silence feature normalization. The method also represents the energy spectrum in a form similar to entropy in order to reconstruct the original speech signal, so that the speech features are less affected by noise. A fixed reference value for classifying speech and non-speech is set in the cepstrum domain, and silence feature normalization improves performance on signals with a low signal-to-noise ratio. The performance of the method was analyzed by comparing the results with HMM and CHMM; the recognition rate improved by 2.7%p in the speech-dependent case and by 0.7%p in the speech-independent case.

음성 인식에서 기존의 음성 특징 추출 방법은 명확하지 않은 스레숄드 값으로 인해 부정확한 음성 인식률을 가진다. 본 연구에서는 음성과 비음성에 대한 특징 추출을 묵음 특징 정규화를 융합한 음성 인식 성능 향상을 위한 방법을 모델링 한다. 제안한 방법에서는 잡음의 영향을 최소화하여 모델을 구성하였고, 각 음성 프레임에 대해 음성 신호 특징을 추출하여 음성 인식 모델을 구성하였고, 이를 묵음 특징 정규화를 융합하여 에너지 스펙트럼을 엔트로피와 유사하게 표현하여 원래의 음성 신호를 생성하고 음성의 특징이 잡음을 적게 받도록 하였다. 셉스트럼에서 음성과 비음성 분류의 기준 값을 정하여 신호 대 잡음 비율이 낮은 신호에서 묵음 특징 정규화로 성능을 향상하였다. 논문에서 제시하는 방법의 성능 분석은 HMM과 CHMM을 비교하여 결과를 보였으며, 기존의 HMM과 CHMM을 비교한 결과 음성 종속 단계에서는 2.1%p의 인식률 향상이 있었으며, 음성 독립 단계에서는 0.7%p 만큼의 인식률 향상이 있었다.
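The abstract gives no equations or parameter values, so the following is only a minimal Python sketch of the general idea it describes: frame the signal, compute cepstral features per frame, classify each frame as speech or non-speech against a fixed reference value, and normalize the non-speech (silence) frames. Everything specific here is an assumption rather than the paper's procedure: the function names `frame_signal`, `real_cepstrum`, and `normalize_silence`, the use of the zeroth cepstral coefficient as the classification statistic, the choice to replace silence frames with their mean vector, and the threshold of `-1.5`, which was picked only to suit the synthetic test signal.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def real_cepstrum(frames, n_coeffs=13):
    """Real cepstrum per frame: inverse FFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    log_spec = np.log(spectrum + 1e-10)  # small offset avoids log(0)
    cep = np.fft.irfft(log_spec, n=frames.shape[1], axis=1)
    return cep[:, :n_coeffs]

def normalize_silence(cep, threshold):
    """Classify frames with a fixed threshold and normalize silence frames.

    Assumption: c0 (the mean log spectrum) serves as the speech/non-speech
    statistic, and silence frames are replaced by their mean vector.
    """
    is_speech = cep[:, 0] > threshold
    out = cep.copy()
    silence = ~is_speech
    if silence.any():
        out[silence] = cep[silence].mean(axis=0)
    return out, is_speech

# Synthetic example: 0.5 s of faint noise ("silence") followed by 0.5 s of
# loud noise standing in for speech, sampled at 16 kHz.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0, 0.001, 8000), rng.normal(0, 0.5, 8000)])

cepstra = real_cepstrum(frame_signal(signal))
normalized, speech_mask = normalize_silence(cepstra, threshold=-1.5)  # hypothetical threshold
print("speech frames:", int(speech_mask.sum()), "of", len(speech_mask))
```

With a fixed threshold, the silence frames collapse to a single feature vector while the speech frames pass through unchanged, which is one simple way to keep low-energy background noise from contributing variable features to a downstream HMM or CHMM recognizer.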
