Comparison of Audio Event Detection Performance using DNN

Chung, Suk-Hwan; Chung, Yong-Joo

doi:10.13067/JKIECS.2018.13.3.571

The Journal of the Korea Institute of Electronic Communication Sciences (한국전자통신학회논문지)

Volume 13, Issue 3, pp. 571-578, 2018. pISSN: 1975-8170

Korea Institute of Electronic Communication Science (한국전자통신학회)



Chung, Suk-Hwan (Dept. Electrical and Electronic Convergence System Engineering, Keimyung University);
Chung, Yong-Joo (Dept. Electronic Engineering, Keimyung University)

Received : 2018.05.22
Accepted : 2018.06.15
Published : 2018.06.30




Abstract

Recently, deep learning techniques have shown superior performance in various kinds of pattern recognition. However, there has been some debate over whether a deep neural network (DNN) performs better than conventional machine learning techniques when classification experiments use only a small amount of training data. In this study, we compared the audio event detection performance of a DNN, one kind of deep learning technique, with that of the conventional Gaussian mixture model (GMM) and support vector machine (SVM). When tested on the same data, the DNN showed better overall performance, but the SVM outperformed the DNN in segment-based F-score.
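The comparison above is reported partly as a segment-based F-score. As a minimal sketch, assuming the usual definition from the sound event detection literature (reference and system output are compared in fixed-length segments, here 1 s, and counts are pooled over all classes before computing F = 2TP / (2TP + FP + FN)), the metric could be computed as follows. The event tuple format, the function names and the segment length are illustrative assumptions, not the authors' evaluation code.

```python
import math
from typing import List, Set, Tuple

# Hypothetical event format: (onset in seconds, offset in seconds, class label)
Event = Tuple[float, float, str]


def segment_activity(events: List[Event], n_segments: int, seg_len: float) -> List[Set[str]]:
    """For each fixed-length segment, collect the labels of events overlapping it."""
    activity: List[Set[str]] = [set() for _ in range(n_segments)]
    for onset, offset, label in events:
        for k in range(n_segments):
            start, end = k * seg_len, (k + 1) * seg_len
            # Any overlap with [start, end) marks the class as active in segment k.
            if onset < end and offset > start:
                activity[k].add(label)
    return activity


def segment_based_f_score(reference: List[Event], estimated: List[Event],
                          seg_len: float = 1.0) -> float:
    """Micro-averaged segment-based F-score, F = 2TP / (2TP + FP + FN)."""
    # Assumption: evaluate up to the last offset seen in either annotation
    # (a real evaluation would use the length of the audio file instead).
    duration = max((off for _, off, _ in reference + estimated), default=0.0)
    n_segments = math.ceil(duration / seg_len)
    ref_act = segment_activity(reference, n_segments, seg_len)
    est_act = segment_activity(estimated, n_segments, seg_len)

    tp = fp = fn = 0
    for ref, est in zip(ref_act, est_act):
        tp += len(ref & est)   # active in both reference and system output
        fp += len(est - ref)   # reported by the system but not in the reference
        fn += len(ref - est)   # in the reference but missed by the system
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


reference = [(0.0, 2.5, "dog"), (3.0, 4.0, "car")]
estimated = [(0.5, 2.0, "dog"), (2.2, 4.0, "car")]
print(segment_based_f_score(reference, estimated))  # 0.75
```

In this toy example the estimated "car" event starts too early and the "dog" event ends too early, so the segment [2, 3) yields one false positive and one false negative, giving TP = 3, FP = 1, FN = 1 and F = 0.75. Because the score only checks per-segment activity, it ignores onset and offset errors that stay within the same segment.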


