Multi-channel Speech Enhancement Using Blind Source Separation and Cross-channel Wiener Filtering  

Jang, Gil-Jin (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
Choi, Chang-Kyu (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
Lee, Yong-Beom (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
Kim, Jeong-Su (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
Kim, Sang-Ryong (Human Computer Interaction Laboratory, Samsung Advanced Institute of Technology)
Abstract
Despite the abundant research on blind source separation (BSS) in many types of simulated environments, its performance is still not satisfactory for real environments. The major obstacles appear to be the finite filter length of the assumed mixing model and nonlinear sensor noise. This paper presents a two-step speech enhancement method with multiple microphone inputs. The first step applies a frequency-domain BSS algorithm to produce multiple outputs without any prior knowledge of the mixed source signals. The second step further removes the remaining cross-channel interference through a spectral cancellation approach that uses a probabilistic source absence/presence detection technique. The desired primary source is detected in every frame of the signal, and the secondary source is estimated in the power spectral domain using the other BSS output as a reference interfering source. The estimated secondary source is then subtracted to reduce the cross-channel interference. Experimental results on real recordings of speech and music signals show good separation and enhancement performance compared to conventional BSS methods.
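
As a rough illustration of the second step summarized above, the sketch below estimates the remaining cross-channel interference in one BSS output from the other output and cancels it in the power spectral domain with a Wiener-like gain. The function names (stft, cross_channel_wiener), the over-subtraction factor alpha, and the simple power-ratio presence estimate are illustrative assumptions, not the paper's actual formulation; the probabilistic absence/presence detector and the exact filter used by the authors are defined only in the full text.

import numpy as np

def stft(x, n_fft=512, hop=128):
    # Simple Hanning-windowed STFT; returns a (frames x bins) complex spectrogram.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def cross_channel_wiener(Y1, Y2, alpha=1.0, p_floor=0.05):
    # Y1, Y2: (frames x bins) spectrograms of the two BSS outputs.
    # Y2 serves as the reference for the interference remaining in Y1.
    P1 = np.abs(Y1) ** 2
    P2 = np.abs(Y2) ** 2
    # Crude per-frame "primary source present" probability from the power ratio
    # of the two outputs (a stand-in for the paper's probabilistic detector).
    ratio = P1.sum(axis=1) / (P1.sum(axis=1) + P2.sum(axis=1) + 1e-12)
    p_presence = np.clip(ratio, p_floor, 1.0)[:, None]
    # Estimated cross-channel interference power; subtraction is stronger
    # in frames where the primary source is judged absent.
    noise_est = alpha * (1.0 - p_presence) * P2
    # Wiener-like gain applied to the primary BSS output.
    gain = np.clip((P1 - noise_est) / (P1 + 1e-12), 0.0, 1.0)
    return gain * Y1

# Example: y1, y2 would be the two time-domain BSS outputs (numpy arrays).
# Y1, Y2 = stft(y1), stft(y2)
# Y1_enhanced = cross_channel_wiener(Y1, Y2)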
Keywords
Blind source separation (BSS); Spectral subtraction; Wiener filtering; Adaptive noise cancellation (ANC)