Estimation and Weighting of Sub-band Reliability for Multi-band Speech Recognition

다중대역 음성인식을 위한 부대역 신뢰도의 추정 및 가중

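  • 조훈영 (Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology)
  • 지상문 (School of Information Science, Kyungsung University)
  • 오영환 (Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology)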
  • Published : 2002.08.01

Abstract

Multi-band speech recognition, which builds on Fletcher's model of human speech recognition (HSR), has recently been studied intensively. As an automatic speech recognition (ASR) technique, it splits the frequency domain into several sub-bands and recognizes each sub-band independently. The sub-band likelihood scores are then weighted according to the reliability of each sub-band and recombined to make a final decision. This approach is known to be robust in noisy environments. When the noise is stationary, the sub-band signal-to-noise ratio (SNR) can be estimated from noise information in non-speech intervals. When the noise is non-stationary, however, estimating the sub-band SNR is not feasible. This paper proposes inverse sub-band distance (ISD) weighting, in which a distance is calculated for each sub-band by stochastic matching of the input feature vectors against hidden Markov models, and the inverse of that distance is used as the sub-band weight. Experiments on 1500 to 1800 Hz band-limited white noise and classical guitar sound showed that the proposed method represents sub-band reliability effectively and improves recognition performance under both stationary and non-stationary band-limited noise.

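Multi-band speech recognition based on Fletcher's human speech recognition (HSR) theory has recently been an active research topic. Multi-band speech recognition is a new approach that divides the frequency domain into several sub-bands, recognizes each one separately, and then weights and combines the sub-band recognition results by sub-band reliability to reach a final decision; it is known to be particularly robust in noisy environments. For stationary noise, the sub-band signal-to-noise ratio (SNR) has been estimated from noise information in silent intervals and used as the weight, but non-stationary noise changes its characteristics over time, which makes the sub-band SNR hard to estimate. This paper proposes ISD (inverse sub-band distance) weighting, which estimates the distance between the model and the noisy speech in each sub-band through stochastic matching of hidden Markov models trained on clean speech against the noisy speech, and uses the inverse of this distance as the sub-band weight. In recognition experiments with white noise and classical guitar sound band-limited to 1500 to 1800 Hz, the proposed method represented sub-band reliability effectively and improved recognition performance for both stationary and non-stationary band-limited noise.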
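
The weighting and recombination step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the per-band distances are taken as given inputs (in the paper they come from stochastic matching against hidden Markov models), and the normalization of weights to sum to one, the linear combination of log-likelihoods, the function names, and all numbers are assumptions made for illustration.

```python
def inverse_distance_weights(distances, eps=1e-8):
    """Map per-sub-band distances to weights proportional to 1/distance.

    Normalizing the weights to sum to 1 is an assumption of this sketch.
    """
    inverse = [1.0 / (d + eps) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine_scores(band_log_likelihoods, weights):
    """Weighted sum of sub-band log-likelihoods for each candidate word.

    band_log_likelihoods[b][k] is the score of word k in sub-band b.
    """
    n_words = len(band_log_likelihoods[0])
    return [
        sum(w * band[k] for w, band in zip(weights, band_log_likelihoods))
        for k in range(n_words)
    ]


# Hypothetical data: four sub-bands, two candidate words.
# Sub-band 1 is assumed to be hit by band-limited noise, so its
# distance to the clean-speech models is large and its scores mislead.
distances = [1.0, 4.0, 1.0, 2.0]
band_log_likelihoods = [
    [-10.0, -12.0],
    [-30.0, -15.0],  # corrupted band favors the wrong word
    [-11.0, -14.0],
    [-9.0, -13.0],
]

isd_weights = inverse_distance_weights(distances)
isd_scores = combine_scores(band_log_likelihoods, isd_weights)
isd_choice = max(range(len(isd_scores)), key=isd_scores.__getitem__)

equal_weights = [0.25] * 4
equal_scores = combine_scores(band_log_likelihoods, equal_weights)
equal_choice = max(range(len(equal_scores)), key=equal_scores.__getitem__)
```

In this made-up case the corrupted band gets the smallest weight, so the inverse-distance combination selects word 0, while equal weighting lets the corrupted band pull the decision to word 1.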
