An Optimally-Modified Multichannel Wiener Filter Using Speech Presence Probability

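  • 정상배 (Department of Electronic Engineering, Gyeongsang National University) ;
  • 김영일 (Department of Electronic Engineering, Gyeongsang National University)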
  • Received : 2017.11.20
  • Accepted : 2018.09.21
  • Published : 2018.09.30

Abstract

This paper proposes a method that optimally modifies the gain of the multichannel Wiener filter (MWF) using speech presence probabilities. Conventional MWF gain modifications based on speech presence probabilities are relatively heuristic, so reducing the residual noise tends to increase speech distortion. The proposed method instead incorporates the speech presence probability directly into the cost function used to derive the optimal filter and takes the solution of the resulting unconstrained minimization problem, which reduces residual noise and speech distortion simultaneously. An evaluation of the filtered waveforms and spectrograms confirms that, compared to the conventional MWF, the proposed method yields an improved SNR with less speech distortion.
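
To make the idea of a probability-involved cost function concrete, the following is a minimal sketch of one way a speech presence probability can enter the MWF derivation. It is not taken from the paper: the particular cost function and all symbols (p, y, x, v, u, h, Φ_xx, Φ_vv) are assumptions chosen for illustration, and the paper's actual formulation may differ.

```latex
% Illustrative assumptions (not from the paper), per time-frequency bin:
%   y = x + v        M-channel noisy observation (speech x, noise v, uncorrelated)
%   u = [1,0,...,0]^T  selector of the reference microphone
%   \Phi_{xx} = E[x x^H], \Phi_{vv} = E[v v^H]   speech and noise covariance matrices
%   p \in (0,1]      speech presence probability
%   h                filter to be found; the output is h^H y

% Conventional MWF (speech assumed always present):
\mathbf{h}_{\mathrm{MWF}}
  = \left(\boldsymbol{\Phi}_{xx} + \boldsymbol{\Phi}_{vv}\right)^{-1}
    \boldsymbol{\Phi}_{xx}\,\mathbf{u}

% Assumed probability-weighted cost: estimation error when speech is present,
% residual noise power when it is absent.
J(\mathbf{h})
  = p\,E\!\left[\left|\mathbf{u}^{H}\mathbf{x}
      - \mathbf{h}^{H}(\mathbf{x}+\mathbf{v})\right|^{2}\right]
  + (1-p)\,E\!\left[\left|\mathbf{h}^{H}\mathbf{v}\right|^{2}\right]
  = p\,(\mathbf{u}-\mathbf{h})^{H}\boldsymbol{\Phi}_{xx}(\mathbf{u}-\mathbf{h})
  + \mathbf{h}^{H}\boldsymbol{\Phi}_{vv}\mathbf{h}

% Unconstrained minimization: set the gradient with respect to h^* to zero.
\frac{\partial J}{\partial \mathbf{h}^{*}}
  = -\,p\,\boldsymbol{\Phi}_{xx}(\mathbf{u}-\mathbf{h})
    + \boldsymbol{\Phi}_{vv}\mathbf{h} = \mathbf{0}
\;\Longrightarrow\;
\mathbf{h}_{p}
  = \left(p\,\boldsymbol{\Phi}_{xx} + \boldsymbol{\Phi}_{vv}\right)^{-1}
    p\,\boldsymbol{\Phi}_{xx}\,\mathbf{u}
  = \left(\boldsymbol{\Phi}_{xx} + \tfrac{1}{p}\,\boldsymbol{\Phi}_{vv}\right)^{-1}
    \boldsymbol{\Phi}_{xx}\,\mathbf{u}
```

Under these assumptions, p = 1 recovers the conventional MWF, while p approaching 0 drives the filter toward zero so that noise-only bins are suppressed. The closed form has the same structure as a speech-distortion-weighted MWF with trade-off parameter 1/p, which illustrates how a probability inside the cost function can replace a separately applied heuristic gain.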
