Robust Non-negative Matrix Factorization with β-Divergence for Speech Separation

Li, Yinan; Zhang, Xiongwei; Sun, Meng

doi:10.4218/etrij.17.0115.0122

ETRI Journal, Volume 39, Issue 1, Pages 21-29, 2017
pISSN 1225-6463 / eISSN 2233-7326

Electronics and Telecommunications Research Institute (한국전자통신연구원)


Li, Yinan (Lab of Intelligent Information Processing, PLA University of Science and Technology) ;
Zhang, Xiongwei (Lab of Intelligent Information Processing, PLA University of Science and Technology) ;
Sun, Meng (Lab of Intelligent Information Processing, PLA University of Science and Technology)

Received : 2016.02.05
Accepted : 2016.11.03
Published : 2017.02.01


Abstract

This paper addresses unsupervised speech separation based on robust non-negative matrix factorization (RNMF) with β-divergence, for the case in which neither speech nor noise training data is available beforehand. We propose a robust version of non-negative matrix factorization inspired by the recently developed sparse and low-rank decomposition, in which the data matrix is decomposed into the sum of a low-rank matrix and a sparse matrix. Efficient multiplicative update rules that minimize the β-divergence-based cost function are derived. We also present a convolutional extension of the algorithm that accounts for the time dependency of the non-negative noise bases. Speech separation experiments show that the proposed convolutional RNMF successfully separates the repeating, time-varying spectral structures from the magnitude spectrum of the mixture without any prior training.
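The abstract describes the model only at a high level. As a rough illustration, and not the authors' derived algorithm, the sketch below assumes the mixture magnitude spectrogram V is approximated as W @ H + S, where the low-rank NMF term W @ H is assumed to capture the repeating noise and the non-negative matrix S, penalized by lam * sum(S), is assumed to capture the sparse speech. It applies the standard heuristic multiplicative updates for β-divergence NMF (each factor is multiplied by the negative part of its gradient divided by the positive part) and extends them to S. All function names, the L1 weight `lam`, the random initialization and the `eps` guard are hypothetical choices. The `conv_reconstruct` helper only shows how a convolutional model could replace W @ H by a sum of lagged bases applied to time-shifted activations; its update rules are not shown.

```python
import numpy as np


def beta_divergence(V, Lam, beta):
    """Summed elementwise beta-divergence d_beta(V | Lam).

    beta = 0 is Itakura-Saito, beta = 1 is generalized Kullback-Leibler,
    beta = 2 is half the squared Euclidean distance. The beta = 0 and
    beta = 1 branches need strictly positive V (log and ratio terms).
    """
    if beta == 0:
        R = V / Lam
        return float(np.sum(R - np.log(R) - 1))
    if beta == 1:
        return float(np.sum(V * np.log(V / Lam) - V + Lam))
    return float(np.sum(V**beta + (beta - 1) * Lam**beta
                        - beta * V * Lam**(beta - 1)) / (beta * (beta - 1)))


def rnmf_cost(V, W, H, S, beta, lam):
    """Hypothetical penalized objective: D_beta(V | W H + S) + lam * ||S||_1."""
    return beta_divergence(V, W @ H + S, beta) + lam * float(S.sum())


def rnmf_beta(V, rank, beta=1.0, lam=0.1, n_iter=200, seed=0, eps=1e-12):
    """Sketch of V ~ W @ H + S (low-rank plus sparse) via multiplicative updates.

    Each factor is multiplied by (negative gradient part) / (positive
    gradient part), so all entries stay non-negative.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    S = rng.random((F, N)) + eps
    for _ in range(n_iter):
        Lam = W @ H + S + eps
        H *= (W.T @ (V * Lam**(beta - 2))) / (W.T @ Lam**(beta - 1) + eps)
        Lam = W @ H + S + eps
        W *= ((V * Lam**(beta - 2)) @ H.T) / (Lam**(beta - 1) @ H.T + eps)
        Lam = W @ H + S + eps
        # The L1 penalty adds the constant lam to the positive gradient part.
        S *= (V * Lam**(beta - 2)) / (Lam**(beta - 1) + lam)
    return W, H, S


def shift_right(H, t):
    """Delay the columns of H by t frames, zero-filling the first t columns."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out


def conv_reconstruct(W_taps, H):
    """Convolutive model: sum over lags t of W_taps[t] @ shift_right(H, t)."""
    return sum(W_t @ shift_right(H, t) for t, W_t in enumerate(W_taps))
```

For beta = 1 the W, H and S updates above are the usual majorization-minimization steps for KL-NMF with an L1 penalty, so the penalized cost should not increase apart from the effect of `eps`; for other β values no monotonicity is claimed here. A speech estimate could then be formed, for example, with a soft mask S / (W @ H + S) applied to the mixture spectrogram, which is likewise an assumption rather than something the abstract specifies.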



Cited by

Orthogonal nonnegative matrix tri-factorization based on Tweedie distributions vol.13, pp.4, 2017, https://doi.org/10.1007/s11634-018-0348-8
