Multi-channel Speech Enhancement Using Blind Source Separation and Cross-channel Wiener Filtering

  • Jang, Gil-Jin (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
  • Choi, Chang-Kyu (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
  • Lee, Yong-Beom (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
  • Kim, Jeong-Su (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
  • Kim, Sang-Ryong (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
  • Published : 2004.06.01

Abstract

Despite abundant research on blind source separation (BSS) in many types of simulated environments, its performance is still not satisfactory for application to real environments. The major obstacles appear to be the finite filter length of the assumed mixing model and nonlinear sensor noise. This paper presents a two-step speech enhancement method with multiple microphone inputs. The first step applies a frequency-domain BSS algorithm to produce multiple outputs without any prior knowledge of the mixed source signals. The second step removes the remaining cross-channel interference by a spectral cancellation approach that uses a probabilistic source absence/presence detection technique. The desired primary source is detected in every frame of the signal, and the secondary source is estimated in the power spectral domain using the other BSS output as a reference interfering source. The estimated secondary source is then subtracted to reduce the cross-channel interference. Our experimental results on real recordings of speech and music signals show improved separation performance compared to conventional BSS methods.
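
The second step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cross_channel_subtract`, the parameters `presence_threshold` and `floor`, the hard energy-ratio decision (used here in place of the probabilistic absence/presence detector), and the way the leakage gain is estimated are all assumptions made for illustration. The inputs are taken to be the power spectra (frames by frequency bins) of two BSS outputs.

```python
import numpy as np

def cross_channel_subtract(P1, P2, presence_threshold=2.0, floor=0.01):
    """Reduce cross-channel interference in BSS output 1.

    P1, P2: power spectra (frames x bins) of the primary and the
    reference (interfering) BSS outputs. All parameter names and
    default values are hypothetical.
    Returns the enhanced power spectrum and the per-frame presence flags.
    """
    P1 = np.asarray(P1, dtype=float)
    P2 = np.asarray(P2, dtype=float)
    eps = 1e-12

    # Assumption: a hard threshold on the frame energy ratio stands in
    # for the probabilistic presence/absence detection of the paper.
    ratio = P1.sum(axis=1) / (P2.sum(axis=1) + eps)
    present = ratio > presence_threshold
    absent = ~present

    # Assumption: in frames where the primary source is judged absent,
    # P1 contains only leaked interference, so the per-bin leakage gain
    # is the ratio of accumulated powers over those frames.
    if absent.any():
        gain = P1[absent].sum(axis=0) / (P2[absent].sum(axis=0) + eps)
    else:
        gain = np.zeros(P1.shape[1])

    # Estimate the secondary source in the power spectral domain and
    # subtract it, keeping a small spectral floor to avoid negative power.
    interference = gain * P2
    enhanced = np.maximum(P1 - interference, floor * P1)
    return enhanced, present
```

The spectral floor is a common guard in subtraction-based methods; its value here is arbitrary and would need tuning on real recordings.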
