Determinant-based two-channel noise reduction method using speech presence probability

  • Park, Jinuk (School of Electrical Engineering, Korea Advanced Institute of Science and Technology)
  • Hong, Jungpyo (Department of Information and Communication Engineering, Changwon National University)
  • Received: 2022.02.22
  • Reviewed: 2022.03.12
  • Published: 2022.05.31

Abstract

This paper proposes a two-channel noise reduction method that is based on the determinant of the correlation matrix of the two-channel input signal and that makes use of the speech presence probability (SPP). The proposed method improves on the conventional determinant-based two-channel noise reduction method in [7] by applying the SPP to the Wiener filter gain, so that the amount of noise reduction is controlled adaptively for speech and noise segments. For performance evaluation, the segmental signal-to-noise ratio (SNR), the perceptual evaluation of speech quality, the short-time objective intelligibility, and the log spectral distance were measured in simulated noisy environments covering various noise types, reverberation conditions, SNRs, and numbers and directions of noise sources. The experimental results show that the determinant-based methods outperform the phase difference-based methods in most of the tested environments. In particular, the proposed SPP-based method achieved the best noise reduction performance while keeping speech distortion to a minimum.
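The abstract does not give the gain rule itself, so the following is only a minimal sketch of how an SPP could modulate a Wiener filter gain in one time-frequency bin. It assumes the geometric weighting between a Wiener gain and a gain floor that is used in log-spectral amplitude estimators under signal presence uncertainty [11]; the paper's actual rule may differ. The function names, the parameter names, and the floor value `g_min = 0.1` are hypothetical and chosen for illustration.

```python
def wiener_gain(prior_snr: float) -> float:
    """Classical Wiener gain xi / (1 + xi) for an a priori SNR xi >= 0 (linear scale)."""
    return prior_snr / (1.0 + prior_snr)


def spp_weighted_gain(g_wiener: float, spp: float, g_min: float = 0.1) -> float:
    """Blend a Wiener gain with a gain floor according to the SPP.

    Assumed rule (not taken from the paper): G = G_w**p * G_min**(1 - p),
    where p is the speech presence probability of the bin.
    """
    p = min(max(spp, 0.0), 1.0)  # keep the probability inside [0, 1]
    return (g_wiener ** p) * (g_min ** (1.0 - p))


if __name__ == "__main__":
    g_w = wiener_gain(prior_snr=1.0)  # a priori SNR of 0 dB
    for p in (0.0, 0.5, 1.0):
        # p = 0 yields the floor g_min, p = 1 yields the plain Wiener gain
        print(p, spp_weighted_gain(g_w, p))
```

Under this assumed rule, bins judged to be noise-only (SPP near 0) are attenuated down to the floor, while bins judged to contain speech (SPP near 1) keep the unmodified Wiener gain, which is one way to reduce noise further without adding speech distortion.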

Funding

This research was supported by Changwon National University in 2021-2022.

References

  1. P. C. Loizou, Speech Enhancement, Boca Raton, FL, USA: CRC Press, 2007.
  2. O. Schwartz, S. Gannot, and E. A. P. Habets, "Multispeaker LCMV beamformer and postfilter for source separation and noise reduction," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 940-951, May 2017. https://doi.org/10.1109/TASLP.2017.2655258
  3. Y. Kubo, T. Nakatani, M. Delcroix, K. Kinoshita, and S. Araki, "Mask-based MVDR beamformer for noisy multisource environments: Introduction of time-varying spatial covariance model," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, pp. 6855-6859, 2019.
  4. P. Rakesh, S. S. Priyanka, and T. K. Kumar, "Performance evaluation of beamforming techniques for speech enhancement," in Fourth International Conference on Signal Processing, Communication and Networking, Chennai, India, pp. 1-5, 2017.
  5. J. Kim and M. Hahn, "Speech enhancement using a two-stage network for an efficient boosting strategy," IEEE Signal Processing Letters, vol. 26, no. 5, pp. 770-774, May 2019. https://doi.org/10.1109/lsp.2019.2905660
  6. J. Lee and H.-G. Kang, "A joint learning algorithm for complex-valued T-F masks in deep learning-based single-channel speech enhancement systems," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 6, pp. 1098-1108, Jun. 2019. https://doi.org/10.1109/taslp.2019.2910638
  7. D. Wang and J. Chen, "Supervised speech separation based on deep learning: An overview," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 10, pp. 1702-1726, Oct. 2018. https://doi.org/10.1109/taslp.2018.2842159
  8. P. Aarabi and G. Shi, "Phase-based dual-microphone robust speech enhancement," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 34, no. 4, pp. 1763-1773, Aug. 2004. https://doi.org/10.1109/TSMCB.2004.830345
  9. S. M. Kim and H. K. Kim, "Direction-of-arrival based SNR estimation for dual-microphone speech enhancement," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 2207-2217, Dec. 2014. https://doi.org/10.1109/TASLP.2014.2360646
  10. J. Hong, S. Park, S. Jeong, and M. Hahn, "Dual-microphone noise reduction in car environments with determinant analysis of input correlation matrix," IEEE Sensors Journal, vol. 16, no. 9, pp. 3131-3140, May 2016. https://doi.org/10.1109/JSEN.2016.2525811
  11. I. Cohen, "Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator," IEEE Signal Processing Letters, vol. 9, no. 4, pp. 113-116, Apr. 2002. https://doi.org/10.1109/97.1001645
  12. C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, pp. 4214-4217, 2010.
  13. S. Jeong and Y. Kim, "An optimally-modified multichannel Wiener filter using speech presence probability," Smart Media Journal, vol. 7, no. 3, pp. 9-15, Sep. 2018. https://doi.org/10.30693/SMJ.2018.7.3.9