http://dx.doi.org/10.4218/etrij.17.0115.0122

Robust Non-negative Matrix Factorization with β-Divergence for Speech Separation  

Li, Yinan (Lab of Intelligent Information Processing, PLA University of Science and Technology)
Zhang, Xiongwei (Lab of Intelligent Information Processing, PLA University of Science and Technology)
Sun, Meng (Lab of Intelligent Information Processing, PLA University of Science and Technology)
Publication Information
ETRI Journal / v.39, no.1, 2017, pp. 21-29
Abstract
This paper addresses the problem of unsupervised speech separation based on robust non-negative matrix factorization (RNMF) with ${\beta}$-divergence, for the case in which neither speech nor noise training data is available beforehand. We propose a robust version of non-negative matrix factorization, inspired by the recently developed sparse and low-rank decomposition, in which the data matrix is decomposed into the sum of a low-rank matrix and a sparse matrix. Efficient multiplicative update rules that minimize the ${\beta}$-divergence-based cost function are derived. A convolutional extension of the algorithm, which accounts for the time dependency of the non-negative noise bases, is also presented. Experimental speech separation results show that the proposed convolutional RNMF successfully separates the repeating time-varying spectral structures from the magnitude spectrum of the mixture, and does so without any prior training.
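To make the decomposition described in the abstract concrete, the following is a minimal NumPy sketch, not the update rules derived in the paper. It assumes the model V ≈ WH + S, where WH is the low-rank part and S is the sparse part, an l1 penalty with weight `lam` on S, and the generic heuristic multiplicative updates commonly used for ${\beta}$-divergence NMF. The function names `beta_divergence` and `rnmf_sketch`, the parameter names, and the random initialization are all illustrative assumptions.

```python
import numpy as np


def beta_divergence(V, V_hat, beta, eps=1e-12):
    """Sum of element-wise beta-divergences d_beta(V | V_hat)."""
    V = np.maximum(np.asarray(V, dtype=float), eps)
    V_hat = np.maximum(np.asarray(V_hat, dtype=float), eps)
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / V_hat) - V + V_hat))
    if beta == 0:  # Itakura-Saito divergence
        ratio = V / V_hat
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    num = V**beta + (beta - 1.0) * V_hat**beta - beta * V * V_hat**(beta - 1.0)
    return float(np.sum(num / (beta * (beta - 1.0))))


def rnmf_sketch(V, rank, beta=1.0, lam=0.1, n_iter=200, eps=1e-12, seed=0):
    """Illustrative robust NMF: V ~= W @ H + S with an l1 penalty on S.

    Assumed generic multiplicative updates, not the paper's derivation.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps      # non-negative bases
    H = rng.random((rank, n_frames)) + eps    # non-negative activations
    S = rng.random((n_freq, n_frames)) + eps  # non-negative sparse part
    for _ in range(n_iter):
        V_hat = W @ H + S + eps
        H *= (W.T @ (V * V_hat**(beta - 2))) / (W.T @ V_hat**(beta - 1) + eps)
        V_hat = W @ H + S + eps
        W *= ((V * V_hat**(beta - 2)) @ H.T) / (V_hat**(beta - 1) @ H.T + eps)
        V_hat = W @ H + S + eps
        # The l1 weight enters the denominator and shrinks S toward zero.
        S *= (V * V_hat**(beta - 2)) / (V_hat**(beta - 1) + lam)
    return W, H, S
```

In a speech separation setting, V would be the magnitude spectrogram of the noisy mixture. Under this kind of model the low-rank term W @ H is typically taken to capture the repeating noise structure and S the sparser speech component, although the exact assignment and the convolutional extension follow the paper itself rather than this sketch.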
Keywords
Robust non-negative matrix factorization; Speech separation; Sparse and low-rank decomposition; ${\beta}$-divergence; Convolutional bases