Underdetermined blind source separation using normalized spatial covariance matrix and multichannel nonnegative matrix factorization

  • Oh, Son-Mook (Department of Mechanical Design and Robot Engineering, Seoul National University of Science and Technology) ;
  • Kim, Jung-Han (Department of Mechanical Design and Robot Engineering, Seoul National University of Science and Technology)
  • Received : 2019.12.31
  • Accepted : 2020.02.07
  • Published : 2020.03.31

Abstract

This paper addresses underdetermined blind source separation of convolutive mixtures by remedying a weakness of multichannel nonnegative matrix factorization (MNMF), a technique widely used in blind source separation. In previous work based on the spatial covariance matrix (SCM), the matrix elements, which consist of quantities such as the power gain of a single channel and the correlation, have high variance and therefore tend to degrade the quality of the separated sources. To cluster the estimated sources effectively, we apply level and frequency normalization, which yields a novel SCM, and we propose an effective pairwise distance function between clusters. The proposed SCM is used to initialize the spatial model, which improves its estimation, and is then used for bottom-up hierarchical agglomerative clustering, which improves the quality of the separated sources. The algorithm was evaluated on the 'Signal Separation Evaluation Campaign 2008 development dataset'. Measured with the 'Blind Source Separation Eval toolbox', an objective tool for assessing source separation quality, it improved most performance indicators; in particular, the signal-to-distortion ratio (SDR), the representative metric, improved by about 1 dB to 3.5 dB.
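The abstract does not give the exact normalization or distance function, so the following Python sketch only illustrates the general idea under stated assumptions. Here level normalization is taken to mean scaling each per-frequency SCM to unit trace, frequency normalization is taken to mean rescaling the phase of each SCM entry to a reference frequency `f_ref`, and clustering uses average-linkage agglomerative clustering with Euclidean distance from SciPy. The names `normalized_scm`, `cluster_bins`, and `f_ref` are hypothetical, and none of these choices should be read as the paper's definitions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def normalized_scm(X, freqs, f_ref=1000.0, eps=1e-12):
    """X: complex STFT of shape (F, T, M); freqs: positive bin frequencies in Hz.

    Returns one normalized spatial covariance matrix per bin, shape (F, M, M).
    """
    # Time-averaged outer product x x^H for every frequency bin.
    scm = np.einsum("ftm,ftn->fmn", X, X.conj()) / X.shape[1]
    # Assumed level normalization: divide by the trace to remove source power.
    trace = np.trace(scm, axis1=1, axis2=2).real
    scm = scm / (trace[:, None, None] + eps)
    # Assumed frequency normalization: a pure delay gives a phase proportional
    # to frequency, so rescale every phase as if it were observed at f_ref.
    # This only holds while the phase has not wrapped (no spatial aliasing).
    scale = f_ref / freqs
    return np.abs(scm) * np.exp(1j * np.angle(scm) * scale[:, None, None])


def cluster_bins(scm, n_clusters):
    """Bottom-up (agglomerative) clustering of frequency bins by their SCM."""
    flat = scm.reshape(scm.shape[0], -1)
    features = np.concatenate([flat.real, flat.imag], axis=1)
    tree = linkage(features, method="average", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data: two channels, two sources with opposite inter-channel delays,
# and each frequency bin occupied by exactly one source.
rng = np.random.default_rng(0)
freqs = np.linspace(100.0, 2000.0, 20)
delays = np.array([1e-4, -1e-4])          # seconds, hypothetical values
owner = np.arange(freqs.size) % 2         # which source occupies each bin
s = rng.standard_normal(50) + 1j * rng.standard_normal(50)
steer = np.stack(
    [np.ones_like(freqs, dtype=complex),
     np.exp(-2j * np.pi * freqs * delays[owner])],
    axis=1,
)                                          # shape (F, 2)
X = steer[:, None, :] * s[None, :, None]   # shape (F, T, 2)

scm = normalized_scm(X, freqs)
labels = cluster_bins(scm, n_clusters=2)
```

In this toy setup the normalized SCMs of all bins belonging to one source coincide, so the two clusters recover the bin ownership. Real mixtures have overlapping sources and reverberation, which is where the choice of distance function matters.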
