[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3837/tiis.2018.06.017

Convolutional Neural Network based Audio Event Classification

Lim, Minkyu (Dept. of Computer Science and Engineering, Sogang University)
Lee, Donghyun (Dept. of Computer Science and Engineering, Sogang University)
Park, Hosung (Dept. of Computer Science and Engineering, Sogang University)
Kang, Yoseb (Dept. of Computer Science and Engineering, Sogang University)
Oh, Junseok (Dept. of Computer Science and Engineering, Sogang University)
Park, Jeong-Sik (Dept. of English Linguistics & Language Technology, Hankuk University of Foreign Studies)
Jang, Gil-Jin (School of Electronics Engineering, Kyungpook National University)
Kim, Ji-Hwan (Dept. of Computer Science and Engineering, Sogang University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.12, no.6, 2018, pp. 2748-2760

Abstract

This paper proposes an audio event classification method based on convolutional neural networks (CNNs). CNNs are well suited to distinguishing complex shapes in images, so the proposed system presents audio features to the CNN as an input image. Mel-scale filter bank features are extracted from each frame and concatenated over 40 consecutive frames; the concatenated frames are treated as one input image. The output layer of the CNN generates a probability for each audio event (e.g., dog bark, siren, forest). The event probabilities for all images in an audio segment are accumulated, and the audio event with the highest accumulated probability is taken as the classification result. The proposed method classified thirty audio events with an accuracy of 81.5% on the UrbanSound8K, BBC Sound FX, DCASE2016, and FREESOUND datasets.
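
The two data-handling steps described in the abstract, stacking 40 consecutive frames into one input image and accumulating per-image event probabilities over a segment, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the hop size of one frame, and the use of a plain sum for accumulation are assumptions, and the CNN that would produce the per-image probabilities is omitted.

```python
from typing import List, Sequence

Frame = List[float]   # mel-scale filter bank features for one frame
Image = List[Frame]   # consecutive frames stacked into a 2-D CNN input

CONTEXT = 40  # frames per input image, as stated in the abstract


def stack_frames(frames: Sequence[Frame], context: int = CONTEXT,
                 hop: int = 1) -> List[Image]:
    """Slide a window of `context` frames over the sequence.

    The hop size is an assumption; the abstract does not state it.
    Returns an empty list if there are fewer than `context` frames.
    """
    last_start = len(frames) - context
    return [list(frames[i:i + context]) for i in range(0, last_start + 1, hop)]


def classify_segment(image_probs: Sequence[Sequence[float]],
                     labels: Sequence[str]) -> str:
    """Accumulate per-image class probabilities and pick the largest total.

    Accumulation is assumed to be a plain sum over all images in the segment.
    """
    totals = [0.0] * len(labels)
    for probs in image_probs:
        for k, p in enumerate(probs):
            totals[k] += p
    best = max(range(len(labels)), key=lambda k: totals[k])
    return labels[best]


if __name__ == "__main__":
    # 45 dummy frames with 3 filter bank values each (sizes are illustrative).
    frames = [[float(i)] * 3 for i in range(45)]
    images = stack_frames(frames)
    print(len(images), len(images[0]))

    # Hypothetical CNN outputs for three images of one segment.
    probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
    print(classify_segment(probs, ["dog bark", "siren", "forest"]))
```

Accumulating over all images means a single confident but wrong image does not decide the segment label on its own: in the usage example the first and third images favor the first class, yet the second class wins on the accumulated total.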

Keywords

Audio event classification; Convolutional neural networks; Deep learning
