
Speech Recognition based on Environment Adaptation using SNR Mapping

  • Yongjoo Chung (Department of Electronic Engineering, Keimyung University)
  • Received: 2014.03.05
  • Reviewed: 2014.05.15
  • Published: 2014.05.31

Abstract

The multiple-model based speech recognition framework (MMSR) has been known to be very successful in noisy speech recognition. Since it uses multiple hidden Markov model (HMM) sets that correspond to various noise types and signal-to-noise ratio (SNR) values, the selected acoustic model can closely match the test noisy speech. However, since the number of HMM sets is limited in practical use, acoustic mismatch still remains a problem. In this study, we experimentally determined the optimal SNR mapping between the test noisy speech and the HMM sets to mitigate the mismatch between them. Improved performance was obtained by employing this SNR mapping instead of using the SNR estimated directly from the test noisy speech. When the proposed method was applied to the MMSR, experimental results on the Aurora 2 database showed relative word error rate reductions of 6.3% and 9.4% compared to the conventional MMSR and multi-condition training (MTR), respectively.
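The selection step the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the bucket values, the mapping table, and all function names (`SNR_BUCKETS`, `SNR_MAPPING`, `select_hmm_set`) are assumptions for illustration; the paper determines its mapping experimentally on Aurora 2.

```python
import math

# SNR values (dB) at which the per-noise-type HMM sets were trained
# (illustrative choice; the actual training SNRs are defined by the MMSR setup).
SNR_BUCKETS = [0, 5, 10, 15, 20]

# Experimentally determined mapping: quantized test SNR -> training SNR of the
# HMM set to actually use. The values here are placeholders, not the paper's table.
SNR_MAPPING = {0: 5, 5: 5, 10: 10, 15: 15, 20: 15}

def estimate_snr(speech_power, noise_power):
    """Estimate the SNR (dB) of the test utterance from power estimates."""
    return 10.0 * math.log10(speech_power / noise_power)

def nearest_bucket(snr):
    """Quantize an estimated SNR to the closest trained SNR value."""
    return min(SNR_BUCKETS, key=lambda b: abs(b - snr))

def select_hmm_set(speech_power, noise_power, use_mapping=True):
    """Pick the HMM set's training SNR: the nearest bucket directly
    (baseline MMSR), or the bucket remapped through the optimal SNR
    mapping (the proposed method)."""
    bucket = nearest_bucket(estimate_snr(speech_power, noise_power))
    return SNR_MAPPING[bucket] if use_mapping else bucket
```

The point of the mapping is that the HMM set trained at the estimated SNR is not always the best match for the test speech; remapping the estimate to a (possibly different) training SNR reduces the residual acoustic mismatch.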

References

  1. S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, 1979, pp. 113-120. https://doi.org/10.1109/TASSP.1979.1163209
  2. M. J. F. Gales, "Model based techniques for noise-robust speech recognition," Ph.D. Dissertation, University of Cambridge, 1996.
  3. P. J. Moreno, "Speech Recognition in noisy environments," Ph.D. Dissertation, Carnegie Mellon University, 1996.
  4. J. H. L. Hansen and M. A. Clements, "Constrained iterative speech enhancement with application to speech recognition," IEEE Trans. on Signal Processing, vol. 39, no. 4, 1991, pp. 795-805. https://doi.org/10.1109/78.80901
  5. J. Choi, "Speech and noise recognition system by neural network," J. of the Korea Institute of Electronic Communication Sciences, vol. 5, no. 4, 2010, pp. 357-362.
  6. C. Lee and D. Kim, "Adaptive noise reduction of speech using wavelet transform," J. of the Korea Institute of Electronic Communication Sciences, vol. 4, no. 3, 2009, pp. 190-196.
  7. J.-S. Choi, "Noise reduction algorithm in speech by Wiener filter," J. of the Korea Institute of Electronic Communication Sciences, vol. 8, no. 9, 2013, pp. 1293-1298. https://doi.org/10.13067/JKIECS.2013.8.9.1293
  8. H. G. Hirsch and D. Pearce, "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," In Proc. the Int. Conf. on Spoken Language Processing, Beijing, China, 2000, pp. 18-20.
  9. H. Xu, Z. H. Tan, P. Dalsgaard, and B. Lindberg, "Robust speech recognition on noise and SNR classification-a multiple-model framework," In Proc. INTERSPEECH, Lisboa, Portugal, 2005, pp. 977-980.
  10. H. Xu, Z. H. Tan, P. Dalsgaard, and B. Lindberg, "Noise condition dependent training based on noise classification and SNR estimation," IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 8, 2007, pp. 2431-2443. https://doi.org/10.1109/TASL.2007.906188
  11. L. Lamel, L. Rabiner, A. Rosenberg, and J. Wilpon, "An improved endpoint detector for isolated word recognition," IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 4, 1981, pp. 777-785. https://doi.org/10.1109/TASSP.1981.1163642