[KSCI] Korea Science Citation Index Service

Target Speaker Speech Restoration via Spectral bases Learning

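Park, Sun-Ho (Department of Computer Science and Engineering, Pohang University of Science and Technology)
Yoo, Ji-Ho (Department of Computer Science and Engineering, Pohang University of Science and Technology)
Choi, Seung-Jin (Department of Computer Science and Engineering, Pohang University of Science and Technology)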

Publication Information

Journal of KIISE: Software and Applications, v.36, no.3, 2009, pp. 179-186

Abstract

This paper proposes a target speech extraction method that restores the speech signal of a target speaker from a noisy convolutive mixture of speech and an interference source. We assume that the target speaker is known and that his or her utterances are available at training time. To incorporate the additional information extracted from the training utterances into the separation, we combine convolutive blind source separation (CBSS) with a non-negative decomposition technique, e.g., a probabilistic latent variable model. The non-negative decomposition is used to learn a set of bases from the spectrogram of the training utterances, where the bases represent the spectral information of the target speaker. Based on the learned spectral bases, our method provides two postprocessing steps for CBSS. The channel selection step finds the CBSS output channel that dominantly contains the target speech. The reconstruction step recovers the original spectrogram of the target speech from the selected output channel so that the remaining interference source and background noise are suppressed. Experimental results show that our method substantially improves the separation results of CBSS and, as a result, successfully recovers the target speech.
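
The two postprocessing steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the selection criterion or the exact decomposition, so this sketch assumes non-negative matrix factorization with the multiplicative KL-divergence updates of Lee and Seung (reference 3) as the non-negative decomposition, and it assumes that the channel with the lowest normalized reconstruction error under the learned bases is the one dominated by the target. All function names (learn_bases, fit_activations, select_and_reconstruct, toy_example) and the toy data are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_divergence(V, R):
    """Generalized KL divergence D(V || R) between non-negative arrays."""
    return float(np.sum(V * np.log((V + EPS) / (R + EPS)) - V + R))


def learn_bases(V, n_bases, n_iter=200, seed=0):
    """Learn spectral bases W (bins x n_bases) from a training spectrogram V."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_bases)) + EPS
    H = rng.random((n_bases, V.shape[1])) + EPS
    for _ in range(n_iter):
        # Multiplicative updates that do not increase the KL divergence.
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    # Normalize each basis to sum to one, like a distribution over frequency.
    return W / (W.sum(axis=0, keepdims=True) + EPS)


def fit_activations(V, W, n_iter=200, seed=0):
    """Fit H >= 0 so that V is approximated by W @ H, with W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    col_sums = W.sum(axis=0)[:, None] + EPS
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / col_sums
    return H


def select_and_reconstruct(channels, W):
    """Pick the channel best explained by W and rebuild it from the bases."""
    best = None
    for idx, V in enumerate(channels):
        R = W @ fit_activations(V, W)
        # Assumed criterion: fit error normalized by the channel's energy.
        score = kl_divergence(V, R) / (V.sum() + EPS)
        if best is None or score < best[1]:
            best = (idx, score, R)
    return best[0], best[2]


def toy_example(seed=0):
    """Synthetic check: target energy in low bins, interference in high bins."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = 20, 60
    W_target = np.zeros((n_bins, 3))
    W_target[:10] = rng.random((10, 3))
    W_interf = np.zeros((n_bins, 3))
    W_interf[10:] = rng.random((10, 3))
    train = W_target @ rng.random((3, n_frames))
    target = W_target @ rng.random((3, n_frames))
    interf = W_interf @ rng.random((3, n_frames))
    # Two stand-ins for CBSS outputs, each with residual cross-talk.
    channels = [0.9 * interf + 0.1 * target, 0.9 * target + 0.1 * interf]
    W = learn_bases(train, n_bases=3)
    return select_and_reconstruct(channels, W)
```

In the toy example the second channel (index 1) is dominated by the target, and the reconstruction W @ H carries no energy in the bins that only the interference occupies, because the learned bases are zero there. Real spectrograms overlap in frequency, so the suppression would be partial rather than exact.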

Keywords

target speech extraction; convolutive blind source separation (CBSS); training utterances; non-negative decomposition techniques; postprocessing steps for the CBSS

Citations & Related Records

Times Cited By KSCI: 1

References

1	H. Sawada, S. Araki, R. Mukai, S. Makino, 'Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking,' in: Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 5882-5885, 2007
2	S. Y. Low, R. Togneri, S. Nordholm, 'Spatiotemporal processing for distant speech recognition,' in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004
3	D. D. Lee, H. S. Seung, 'Algorithms for nonnegative matrix factorization,' in: Advances in Neural Information Processing Systems, Vol. 13, MIT Press, 2001
4	P. D. O'Grady, B. A. Pearlmutter, 'Convolutive non-negative matrix factorisation with sparseness constraint,' in: Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2006
5	M. Brand, 'Structure learning in conditional probability models via an entropic prior and parameter extinction,' Neural Computation 11(5), 1155-1182, 1999
6	A. Belouchrani, K. Abed-Meraim, J. F. Cardoso, E. Moulines, 'A blind source separation technique using second order statistics,' IEEE Trans. Signal Processing 45, 434-444, 1997
7	D. Pham, C. Serviere, H. Boumaraf, 'Blind separation of convolutive audio mixtures using nonstationarity,' in: Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 107-110, 2003
8	J. Yoo, S. Choi, 'Orthogonal nonnegative matrix factorization: Multiplicative updates on Stiefel manifolds,' in: Proceedings of the 9th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), 2008
9	P. O. Hoyer, 'Non-negative matrix factorization with sparseness constraints,' Journal of Machine Learning Research 5, 1457-1469, 2004
10	S. Choi, A. Cichocki, A. Belouchrani, 'Blind separation of second-order nonstationary and temporally colored sources,' in: Proceedings of IEEE Workshop on Statistical Signal Processing, Singapore, pp. 444-447, 2001
11	D. T. Pham, 'Joint approximate diagonalization of positive definite matrices,' 22(4), 1163-1152, 2001
12	C. Choi, G. Jang, Y. Lee, S. R. Kim, 'Adaptive cross-channel interference cancellation on blind source separation outputs,' in: Proceedings of International Conference on Independent Component Analysis and Blind Signal Separation, 2004
13	S. Amari, S. C. Douglas, A. Cichocki, H. H. Yang, 'Multichannel blind deconvolution and equalization using the natural gradient,' in: Proceedings of the IEEE International Conference on Signal Processing Advances in Wireless Communications, Paris, France, pp. 101-104, 1997
14	P. Smaragdis, B. Raj, M. Shashanka, 'Supervised and semi-supervised separation of sounds from single-channel mixtures,' in: Proceedings of International Conference on Independent Component Analysis and Signal Separation, 2007
15	extraction from interferences in real environment using bank of filters and blind source separation, in: Proceedings Third Australian Workshop on Signal Processing and Applications, 2000
16	M. V. S. Shashanka, P. Smaragdis, 'Sparse overcomplete decomposition for single channel speaker separation,' in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 641-644, 2007
17	E. Visser, M. Otsuka, Lee, 'A spatiotemporal speech enhancement scheme for robust speech recognition in noisy environments,' Speech Communication 41(15), 393-407, 2003
18	L. C. Parra, C. Spence, 'Convolutive blind source separation of non-stationary sources,' IEEE Trans. Speech and Audio Processing 320-327, 2000
19	B. Raj, P. Smaragdis, 'Latent variable decomposition of spectrograms for single channel speaker separation,' in: IEEE Workshop of Applications of Signal Processing to Audio and Acoustics, pp.17-20, 2005
20	J. F. Cardoso, A. Souloumiac, 'Blind beamforming for non Gaussian signals,' IEE Proceedings-F 140(6), 362-370, 1993
21	A. Ziehe, P. Laskov, G. Nolte, K. R. Muller, 'A fast algorithm for joint diagonalization with nonorthogonal transformations and its application to blind source separation,' Journal of Machine Learning Research 5, 777-800, 2004
22	J. Kocinski, 'Speech intelligibility improvement using convolutive blind source separation assisted by denoising algorithms,' Speech Communication 50, 29-37, 2008
23	M. V. S. Shashanka, 'Latent variable framework for modeling and separating single channel acoustic sources,' Ph.D. thesis, Department of Cognitive and Neural Systems, Boston University, 2007
24	P. Smaragdis, 'Information-theoretic approaches to source separation,' Master's thesis, Massachusetts Institute of Technology, 1997
25	T. Hofmann, 'Probabilistic latent semantic indexing,' in: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, 1999
26	D. R. Campbell, K. J. Palomaki, G. J. Brown, 'A MATLAB simulation of shoebox room acoustics for use in research and teaching,' Computing and Information Systems Journal 9(3), 1352-1404, 2005
27	K. Torkkola, 'Blind separation of convolved sources based on information maximization,' in: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, pp. 423-432, 1996
28	E. Vincent, C. Fevotte, R. Gribonval, 'Performance measurement in blind audio source separation,' IEEE Trans. on Audio, Speech and Language Processing 14(4), 1462-1469, 2006
