Overlapping Sound Event Detection Using NMF with K-SVD Based Dictionary Learning

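  • Hyeonsik Choi (School of Electrical Engineering, Korea University) ;
  • Minseok Keum (School of Electrical Engineering, Korea University) ;
  • Hanseok Ko (School of Electrical Engineering, Korea University)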
  • Received : 2015.01.23
  • Accepted : 2015.04.30
  • Published : 2015.05.31

Abstract

Non-negative Matrix Factorization (NMF) estimates a dictionary matrix and the corresponding gains by updating the two in alternation. Because it is easy to implement and intuitive to interpret, NMF is widely used to separate and detect overlapping sound events. However, the non-negativity constraints give NMF a parts-based representation, which fragments the dictionary atoms that make up a single sound event and produces atoms that are shared with other sound events. These shared bases degrade performance in both the separation and the detection of overlapping sound events. In this paper, we propose to learn the dictionary with K-Singular Value Decomposition (K-SVD), which mitigates the parts-based representation problem at the dictionary learning step, and to compute the gains with conventional NMF at the sound event detection step. Our evaluation confirms that the proposed method detects overlapping sound events better than the conventional method that uses an NMF-based dictionary.
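The detection step described in the abstract, computing gains by NMF against a dictionary that is already learned, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary `W` is a hand-written, non-negative toy matrix standing in for a K-SVD-learned dictionary, the gains use the standard multiplicative update for the Euclidean cost, and the function names, the per-atom event labels, and the threshold of 0.5 are all hypothetical.

```python
import numpy as np

def estimate_gains(V, W, n_iter=200, eps=1e-12):
    """Estimate non-negative gains H with V ~= W @ H while W stays fixed.

    Multiplicative update for the Euclidean cost (assumed here; the
    abstract does not state which cost function is used).
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # strictly positive start
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H

def detect_events(H, atom_labels, n_events, threshold=0.5):
    """Sum the gains of each event's atoms per frame and threshold them."""
    activity = np.zeros((n_events, H.shape[1]))
    for atom, label in enumerate(atom_labels):
        activity[label] += H[atom]
    return activity > threshold  # threshold is a hypothetical choice

# Toy dictionary: 4 frequency bins, 2 atoms per event, 2 events.
# In the proposed method this matrix would come from dictionary learning.
W = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])
atom_labels = [0, 0, 1, 1]  # event index of each dictionary atom

# Four frames: event 0 only, event 1 only, both overlapping, silence.
H_true = np.array([[1.0, 0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0, 0.0]])
V = W @ H_true  # synthetic magnitude spectrogram

H = estimate_gains(V, W)
detections = detect_events(H, atom_labels, n_events=2)
print(detections)
```

In this toy case the atoms of the two events have disjoint frequency support, so the overlapping frame is resolved cleanly. When atoms are shared between events, which is the problem the abstract attributes to NMF-learned dictionaries, the gains of one event leak into the other and the thresholded activity becomes ambiguous.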
