Robust Non-negative Matrix Factorization with β-Divergence for Speech Separation

  • Li, Yinan (Lab of Intelligent Information Processing, PLA University of Science and Technology) ;
  • Zhang, Xiongwei (Lab of Intelligent Information Processing, PLA University of Science and Technology) ;
  • Sun, Meng (Lab of Intelligent Information Processing, PLA University of Science and Technology)
  • Received : 2016.02.05
  • Accepted : 2016.11.03
  • Published : 2017.02.01

Abstract

This paper addresses unsupervised speech separation based on robust non-negative matrix factorization (RNMF) with β-divergence, for the case in which neither speech nor noise training data is available beforehand. Inspired by the recently developed sparse and low-rank decomposition, in which a data matrix is decomposed into the sum of a low-rank matrix and a sparse matrix, we propose a robust version of non-negative matrix factorization. Efficient multiplicative update rules that minimize the β-divergence-based cost function are derived. We also propose a convolutional extension of the algorithm that accounts for the time dependency of the non-negative noise bases. Speech separation experiments show that the proposed convolutional RNMF successfully separates the repeating time-varying spectral structures from the magnitude spectrum of the mixture, without any prior training.
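The abstract does not state the update rules, so the following is only a minimal NumPy sketch under stated assumptions. It assumes the magnitude spectrogram V is modeled as a low-rank part WH (taken here as the repeating noise) plus a non-negative sparse part S (taken here as speech), that the cost is the β-divergence D_β(V | WH + S) plus an L1 penalty λ·sum(S), and that each factor receives the common heuristic multiplicative update obtained by splitting the gradient into positive and negative parts. The names `beta_divergence` and `rnmf_beta`, the penalty weight `lam`, and the random initialization are hypothetical choices for illustration, not the paper's. The convolutional extension is not shown.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def beta_divergence(V, L, beta):
    """Sum of elementwise beta-divergences d_beta(V | L)."""
    V = V + EPS
    L = L + EPS
    if beta == 0:  # Itakura-Saito divergence
        R = V / L
        return float(np.sum(R - np.log(R) - 1))
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / L) - V + L))
    return float(np.sum((V**beta + (beta - 1) * L**beta
                         - beta * V * L**(beta - 1)) / (beta * (beta - 1))))


def rnmf_beta(V, rank, beta=1.0, lam=0.1, n_iter=200, seed=0):
    """Hypothetical RNMF sketch: V ~ W @ H + S with W, H, S >= 0.

    Minimizes D_beta(V | W @ H + S) + lam * sum(S) with multiplicative updates.
    W @ H is the low-rank (repeating) part; S is the sparse part.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + EPS
    H = rng.random((rank, T)) + EPS
    S = rng.random((F, T)) + EPS
    costs = []
    for _ in range(n_iter):
        # Update W: negative gradient part / positive gradient part.
        L = W @ H + S
        W *= ((V * L**(beta - 2)) @ H.T) / (L**(beta - 1) @ H.T + EPS)
        # Update H with the refreshed model.
        L = W @ H + S
        H *= (W.T @ (V * L**(beta - 2))) / (W.T @ L**(beta - 1) + EPS)
        # Update S; the L1 weight lam enters the positive gradient part.
        L = W @ H + S
        S *= (V * L**(beta - 2)) / (L**(beta - 1) + lam + EPS)
        L = W @ H + S
        costs.append(beta_divergence(V, L, beta) + lam * float(S.sum()))
    return W, H, S, costs


if __name__ == "__main__":
    # Synthetic "mixture": rank-4 non-negative background plus sparse spikes.
    rng = np.random.default_rng(1)
    V = rng.random((32, 4)) @ rng.random((4, 80))
    V += (rng.random((32, 80)) < 0.05) * 5.0
    W, H, S, costs = rnmf_beta(V, rank=4, beta=1.0, lam=0.05, n_iter=200)
    rel_err = np.linalg.norm(V - (W @ H + S)) / np.linalg.norm(V)
    print(f"cost {costs[0]:.3f} -> {costs[-1]:.3f}, relative error {rel_err:.3f}")
```

For β = 1 (generalized KL), each of the three updates is a majorize-minimize step with the other factors held fixed, so the penalized cost should not increase from one iteration to the next; for other β values, treat these updates as heuristics. In a separation setting one would apply this to a mixture's magnitude spectrogram and read W @ H as the noise estimate and S as the speech estimate, which again is an assumption drawn from the abstract's description of the noise bases.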
