CASA Based Approach to Estimate Acoustic Transfer Function Ratios

  • 신민규 (School of Electrical Engineering, Korea University);
  • 고한석 (School of Electrical Engineering, Korea University)
  • Received: 2013.09.24
  • Reviewed: 2013.10.25
  • Published: 2014.01.31

Abstract

Identification of the RTF (Relative Transfer Function) between sensors is essential to multichannel speech enhancement, which is widely used in speech-enabled devices. In this paper, we present an approach for estimating the RTF of a speech signal between microphones in nonstationary noise. The method adapts a CASA (Computational Auditory Scene Analysis) technique to the conventional OM-LSA (Optimally-Modified Log-Spectral Amplitude) based estimation approach. The proposed approach is evaluated in RTF estimation experiments using utterances from 10 speakers under simulated stationary and nonstationary WGN (White Gaussian Noise). When the noise level rises and falls by 8 dB per second, the proposed approach improves the SBF (Signal Blocking Factor) by 2.65 dB on average.
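The abstract does not spell out the estimator, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. For a single STFT frequency bin, it estimates the RTF of microphone 2 relative to microphone 1 as a least-squares ratio of the cross-spectrum to the auto-spectrum, accumulated only over frames selected by a binary mask. The mask is a hypothetical stand-in for the speech-dominated time-frequency units a CASA front end would supply. The optional noise-PSD subtraction assumes noise that is uncorrelated across microphones and with speech. The SBF is computed with an assumed per-bin form of its usual meaning (how strongly the blocking branch of a GSC-type beamformer suppresses the desired speech): speech energy at microphone 1 divided by the speech energy leaking through the blocking output S2 - ĥ·S1, in dB, so higher is better. The names `estimate_rtf` and `signal_blocking_factor_db` are hypothetical.

```python
import cmath
import math
import random


def estimate_rtf(x1, x2, mask, noise_psd1=None):
    """Masked least-squares RTF estimate H = sum(X2 X1*) / sum(|X1|^2) for one bin.

    x1, x2     : complex STFT coefficients of mic 1 and mic 2, one per frame.
    mask       : 0/1 per frame; 1 marks frames assumed speech-dominated
                 (hypothetical stand-in for a CASA time-frequency mask).
    noise_psd1 : optional noise PSD at mic 1; subtracting it removes the
                 denominator bias, assuming noise uncorrelated across mics.
    """
    cross = 0j   # running sum of X2 * conj(X1) over selected frames
    auto = 0.0   # running sum of |X1|^2 over selected frames
    count = 0
    for a, b, m in zip(x1, x2, mask):
        if m:
            cross += b * a.conjugate()
            auto += abs(a) ** 2
            count += 1
    if count == 0:
        raise ValueError("mask selects no frames")
    if noise_psd1 is not None:
        auto -= count * noise_psd1
    if auto <= 0:
        raise ValueError("non-positive auto-spectrum after noise subtraction")
    return cross / auto


def signal_blocking_factor_db(s1, s2, h_hat):
    """Assumed per-bin SBF: speech energy at mic 1 over the speech energy
    leaking through the blocking output S2 - h_hat * S1, in dB."""
    num = sum(abs(a) ** 2 for a in s1)
    den = sum(abs(b - h_hat * a) ** 2 for a, b in zip(s1, s2))
    if den == 0:
        return math.inf  # speech perfectly blocked
    return 10.0 * math.log10(num / den)


if __name__ == "__main__":
    rng = random.Random(0)
    h_true = 0.8 * cmath.exp(-0.6j)  # hypothetical RTF for one bin
    n = 400
    # Speech present in alternating blocks of 50 frames.
    active = [1 if (t // 50) % 2 == 0 else 0 for t in range(n)]
    s1 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) if on else 0j for on in active]
    s2 = [h_true * s for s in s1]

    def noise(std):
        # Independent complex Gaussian noise sample.
        return complex(rng.gauss(0, std), rng.gauss(0, std))

    x1 = [s + noise(0.3) for s in s1]
    x2 = [s + noise(0.3) for s in s2]

    for name, mask in (("all frames", [1] * n), ("masked", active)):
        h_hat = estimate_rtf(x1, x2, mask)
        print(name, h_hat, signal_blocking_factor_db(s1, s2, h_hat))
```

Restricting the sums to masked frames keeps noise-only frames, whose cross-spectrum carries no information about the speech path, from pulling the ratio toward zero. A practical system would derive the mask from a speech-presence or CASA stage rather than from a known activity pattern as in this toy demo.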

