Dual-Channel Acoustic Event Detection in Multisource Environments Using Nonnegative Tensor Factorization and Hidden Markov Model

  • Jeon, Kwang Myung (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology)
  • Kim, Hong Kook (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology)
  • Received: 2016.09.19
  • Reviewed: 2016.12.16
  • Published: 2017.01.25

Abstract

In this paper, we propose a dual-channel acoustic event detection (AED) method that uses nonnegative tensor factorization (NTF) and a hidden Markov model (HMM) to improve detection accuracy in multisource environments. The proposed method first detects multiple acoustic events by using the per-event channel gains obtained by applying NTF to the dual-channel input signals. An HMM-based likelihood ratio test, in which the channel gains serve as likelihood weights, then verifies whether each detected event actually occurred. Detection accuracy is measured by the F-measure under nine multisource conditions that vary the noise type and the degree of overlap between events, and it is compared with that of conventional AED methods based on the Gaussian mixture model (GMM) and nonnegative matrix factorization (NMF). The experiments show that the proposed method achieves higher accuracy than the conventional methods under all conditions.
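
The abstract reports accuracy as an F-measure. As a minimal sketch of how such a score can be computed, the Python below counts true positives, false positives, and false negatives over multi-label annotations, one set of event labels per frame. Frame-level scoring, the function name `frame_f_measure`, and the event labels are assumptions made for illustration; the abstract does not say whether the paper scores frames or whole events.

```python
def frame_f_measure(reference, detected):
    """Return (precision, recall, f_measure) for frame-level multi-label detections.

    reference, detected: equal-length lists with one set of event labels per frame.
    Frame-level counting is an assumption; the paper may score whole events instead.
    """
    if len(reference) != len(detected):
        raise ValueError("reference and detected must cover the same frames")

    tp = fp = fn = 0
    for ref, det in zip(reference, detected):
        tp += len(ref & det)   # events present and detected
        fp += len(det - ref)   # events detected but not present
        fn += len(ref - det)   # events present but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f


# Hypothetical annotations: two events overlap in the second frame.
reference = [{"scream"}, {"scream", "gunshot"}, {"gunshot"}, set()]
detected = [{"scream"}, {"scream"}, {"gunshot", "glass_break"}, set()]

precision, recall, f = frame_f_measure(reference, detected)
print(precision, recall, f)  # 0.75 0.75 0.75
```

Overlapping events are counted separately in this sketch, so the missed gunshot in the overlapped frame lowers recall even though the scream in that frame is found.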
