
Audio Source Separation Based on Residual Reprojection

  • Cho, Choongsang (Department of Imaging Engineering, the Graduate School of Advanced Imaging Science, Multimedia & Film at Chung-Ang University) ;
  • Kim, Je Woo (Department of Multimedia IP Research Center, Korea Electronics Technology Institute) ;
  • Lee, Sangkeun (Department of Imaging Engineering, the Graduate School of Advanced Imaging Science, Multimedia & Film at Chung-Ang University)
  • Received : 2014.11.12
  • Accepted : 2015.05.11
  • Published : 2015.08.01

Abstract

This paper describes an audio source separation method based on nonnegative matrix factorization (NMF) and expectation maximization (EM). For stable, high-performance separation, an effective auxiliary source separation step is proposed that extracts source residuals and reprojects them onto the appropriate sources, taking into account both the ambiguous region among sources and the refinement of each source. Specifically, an additional NMF model is designed for the ambiguous region, whose elements are not easily represented by any of the existing or predefined NMF models of the sources. Inserting this model into the NMF-EM-based audio separation yields the residual signal, which is then refined using the weighted parameters of the separation and reprojected onto the separated sources. Experimental results demonstrate that the proposed scheme is more stable than existing algorithms and outperforms them by 4.4 dB on average in terms of the source-to-distortion ratio (SDR).
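The residual idea in the abstract can be illustrated with a small sketch. This is not the authors' method: the paper uses NMF within an EM framework, whereas this example swaps in single-channel NMF with Kullback-Leibler multiplicative (Lee-Seung) updates and Wiener-style soft masks. Each source is modeled by fixed, pretrained bases; a few extra free bases (the hypothetical parameter `n_resid`) absorb what no source model explains well, and that residual is then reprojected onto the sources. The reprojection weight used here (each source's share of the modeled source power) is an assumption standing in for the paper's "weighted parameters of the separation". The function names `kl_nmf_activations` and `separate_with_residual` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_nmf_activations(V, W_fixed, n_resid, n_iter=200, seed=0):
    """Fit V ~ [W_fixed | W_r] @ H with KL-divergence multiplicative updates.

    W_fixed holds pretrained source bases and stays constant; only the
    residual bases W_r (n_resid columns) and the activations H are learned.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    K = W_fixed.shape[1]
    W_r = rng.random((F, n_resid)) + EPS
    H = rng.random((K + n_resid, T)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        W = np.hstack([W_fixed, W_r])
        V_hat = W @ H + EPS
        # Multiplicative update of all activations under the KL divergence
        H *= (W.T @ (V / V_hat)) / (W.T @ ones + EPS)
        V_hat = np.hstack([W_fixed, W_r]) @ H + EPS
        # Update only the residual bases; the source bases are kept fixed
        H_r = H[K:]
        W_r *= ((V / V_hat) @ H_r.T) / (ones @ H_r.T + EPS)
    return W_r, H


def separate_with_residual(V, source_bases, n_resid=2, n_iter=200):
    """Separate a nonnegative spectrogram V (F x T) given per-source bases.

    Returns the refined source spectrograms and the extracted residual.
    """
    sizes = [B.shape[1] for B in source_bases]
    W_fixed = np.hstack(source_bases)
    W_r, H = kl_nmf_activations(V, W_fixed, n_resid, n_iter)

    # Model spectrogram of each source from its own bases and activations
    models, start = [], 0
    for B, k in zip(source_bases, sizes):
        models.append(B @ H[start:start + k])
        start += k
    resid_model = W_r @ H[start:]
    total = sum(models) + resid_model + EPS

    # Soft (Wiener-style) masks: initial source estimates plus the residual
    sources = [m / total * V for m in models]
    residual = resid_model / total * V

    # Reproject the residual onto each source in proportion to its share
    # of the modeled source power (assumed weighting, see lead-in)
    src_total = sum(models) + EPS
    refined = [s + (m / src_total) * residual for s, m in zip(sources, models)]
    return refined, residual


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    F, T = 64, 50
    bases_a, bases_b = rng.random((F, 4)), rng.random((F, 4))
    # Synthetic mixture: two sources plus a component neither model covers
    V = (bases_a @ rng.random((4, T)) + bases_b @ rng.random((4, T))
         + 0.1 * rng.random((F, T)))
    refined, residual = separate_with_residual(V, [bases_a, bases_b])
    print(np.allclose(sum(refined), V))  # True: reprojection conserves V
```

Because the masks and the reprojection weights each sum to one, the refined sources still add up to the mixture spectrogram; the residual is redistributed rather than discarded.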
