Mask Estimation Based on Band-Independent Bayesian Classifier for Missing-Feature Reconstruction

  • Kim Wooil (Dept. of Electrical Engineering, University of Texas at Dallas) ;
  • Stern Richard M. (Dept. of Electrical and Computer Engineering, Carnegie Mellon University) ;
  • Ko Hanseok
  • Published : 2006.02.01

Abstract

In this paper, we propose an effective mask estimation scheme for missing-feature reconstruction to achieve robust speech recognition in unknown noise environments. In previous work, the mask classifier was trained on colored noise generated by partitioning the entire frequency range. However, that approach gives limited performance when the training database is small. To reflect the spectral characteristics of a wider variety of background noise while improving mask estimation performance, we propose a new Bayesian classifier for mask estimation that operates independently in each frequency band, and we modify the generation of the colored training noise accordingly. The proposed method uses colored noise obtained by combining colored noises generated from each frequency band, which reflects more varied noise environments and mitigates the 'sparse' training database problem. The proposed mask estimation method is combined with cluster-based missing-feature reconstruction and evaluated on a noisy speech recognition task. The results show that the proposed method outperforms the previous method under white noise, car noise, and background music conditions.
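
The idea of a band-independent Bayesian mask classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the single scalar feature per band, the Gaussian class-conditional densities, and the names `train_band_classifier`, `classify_band`, and `estimate_mask` are all assumptions made for illustration. Each band gets its own two-class model (reliable vs. unreliable), and the decision in a band uses only that band's feature.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def train_band_classifier(features, labels):
    """Fit a two-class Gaussian Bayes model for ONE frequency band.

    features: list of floats (hypothetical per-band feature values)
    labels:   list of bools (True = reliable, False = unreliable)
    Assumes both classes appear at least once in the training data.
    """
    model = {}
    for cls in (True, False):
        xs = [x for x, y in zip(features, labels) if y == cls]
        mean = sum(xs) / len(xs)
        # Small floor keeps the variance strictly positive.
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-6
        model[cls] = {"prior": len(xs) / len(features), "mean": mean, "var": var}
    return model

def classify_band(model, x):
    """Return True (reliable) if the reliable class has the higher posterior."""
    score = {
        cls: math.log(p["prior"]) + gaussian_logpdf(x, p["mean"], p["var"])
        for cls, p in model.items()
    }
    return score[True] >= score[False]

def estimate_mask(models, frame):
    """Classify each band of one frame using only that band's own model."""
    return [classify_band(m, x) for m, x in zip(models, frame)]

# Toy training data for two bands (illustrative values, not from the paper).
train = {
    0: ([9.0, 10.0, 11.0, -1.0, 0.0, 1.0], [True, True, True, False, False, False]),
    1: ([4.0, 5.0, 6.0, -6.0, -5.0, -4.0], [True, True, True, False, False, False]),
}
models = [train_band_classifier(*train[b]) for b in sorted(train)]
mask = estimate_mask(models, [8.0, -3.0])
print(mask)  # [True, False]
```

Because each band's model is trained and applied separately, training noise for one band does not need to be paired with every possible noise pattern in the other bands, which is the intuition behind easing the 'sparse' database problem described in the abstract.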
