Fig. 1. Network structure of the proposed model.
Fig. 2. Confusion matrix for four-class classification of the proposed model trained and tested with clean data (Table 3, clean train / clean test). Overall classification accuracy is 94.4 %.
Fig. 3. Confusion matrix for four-class classification of the proposed model trained and tested with noisy data at an SNR of 10 dB [Table 3, noise train / noise test (10 dB)]. Overall classification accuracy is 90.7 %.
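Figs. 2 and 3 each summarize a confusion matrix by one overall classification accuracy. As a minimal sketch, assuming the usual definition, overall accuracy is the number of correctly classified clips (the sum of the confusion-matrix diagonal) divided by the total number of test clips. The function name `overall_accuracy` and the example counts are hypothetical illustrations, not values taken from Fig. 2 or Fig. 3. The class order TS, CC, CH, NS follows Table 1.

```python
from typing import Sequence


def overall_accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal of a square confusion matrix.

    The result is the same whether rows or columns hold the true class,
    because the diagonal and the total count are unchanged by transposition.
    """
    n = len(confusion)
    if n == 0 or any(len(row) != n for row in confusion):
        raise ValueError("confusion matrix must be square and non-empty")
    correct = sum(confusion[i][i] for i in range(n))  # correctly classified clips
    total = sum(sum(row) for row in confusion)        # all test clips
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    return correct / total


if __name__ == "__main__":
    # Hypothetical counts (NOT the paper's data); class order TS, CC, CH, NS.
    cm = [
        [45, 3, 1, 1],
        [2, 46, 1, 1],
        [0, 1, 48, 1],
        [1, 0, 2, 47],
    ]
    print(f"overall accuracy: {100 * overall_accuracy(cm):.1f} %")  # prints 93.0 %
```

Overall accuracy does not depend on the matrix orientation, but per-class figures (for example, the recall for CH) would, and the captions do not state whether rows or columns hold the true class.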
Table 1. Data distribution of the four classes. TS, CC, CH, and NS denote tire skidding, car crash, car horn, and normal sounds, respectively.
Table 2. Results of two-class classification for the proposed model and the baseline (the classification accuracy for normal sounds is not reported in [2]).
Table 3. Classification results according to the composition of the training and test data. TOT denotes the overall classification accuracy.
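Table 3 and Fig. 3 compare training and test sets built from clean and noisy data, with the noisy data at an SNR of 10 dB. These captions do not say how the noise was added. A common approach, assumed here, scales a noise recording by a gain g so that 10·log10(P_s / (g²·P_n)) equals the target SNR, where P_s and P_n are the mean powers of the clean clip and of the noise; solving gives g = sqrt(P_s / (P_n · 10^(SNR/10))). The sketch uses plain Python lists. The helper names `mean_power`, `mix_at_snr`, and `measured_snr_db`, the choice to truncate the noise to the clip length, the 16 kHz sampling rate, and measuring power over the whole clip are all hypothetical, not the authors' procedure.

```python
import math
import random
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Mean squared amplitude of a non-empty signal."""
    if len(x) == 0:
        raise ValueError("signal must be non-empty")
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to signal, scaled so the mixture has the requested SNR in dB.

    Assumption: the noise is at least as long as the signal and is simply
    truncated to the signal length; power is measured over the whole clip.
    """
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[: len(signal)]
    p_s = mean_power(signal)
    p_n = mean_power(noise)
    if p_n == 0:
        raise ValueError("noise has zero power")
    # Solve 10*log10(p_s / (g**2 * p_n)) = snr_db for the noise gain g.
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(signal: Sequence[float], mixture: Sequence[float]) -> float:
    """SNR of a mixture relative to its clean signal, in dB."""
    residual = [m - s for m, s in zip(mixture, signal)]
    return 10 * math.log10(mean_power(signal) / mean_power(residual))


if __name__ == "__main__":
    fs = 16000  # hypothetical sampling rate (Hz)
    rng = random.Random(0)
    # 1 s, 1 kHz tone as a stand-in for a clean clip.
    clean = [0.5 * math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs)]
    # White Gaussian noise as a stand-in for a recorded noise clip.
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    noisy = mix_at_snr(clean, noise, snr_db=10.0)
    print(f"measured SNR: {measured_snr_db(clean, noisy):.2f} dB")
```

Because the gain is computed from each clip's own power, every noisy clip lands at the same SNR regardless of its loudness; a variant that measures power only over active (non-silent) frames would produce different mixtures.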
References
- R. Banerjee, A. Sinha, and A. Saha, "Participatory sensing based traffic condition monitoring using horn detection," Proc. the 28th Annual ACM Symposium on Applied Computing, 567-569 (2013).
- P. Foggia, N. Petkov, A. Saggese, N. Strisciuglio, and M. Vento, "Audio surveillance of roads: A system for detecting anomalous sounds," IEEE Trans. Intelligent Transportation Systems, 17, 279-288 (2016). https://doi.org/10.1109/TITS.2015.2470216
- M. Cristani, M. Bicego, and V. Murino, "Audio-visual event recognition in surveillance video sequences," IEEE Trans. Multimedia, 9, 257-267 (2007). https://doi.org/10.1109/TMM.2006.886263
- K. J. Piczak, "Environmental sound classification with convolutional neural networks," IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 1-6 (2015).
- J. Salamon and J. P. Bello, "Deep convolutional neural networks and data augmentation for environmental sound classification," IEEE Signal Processing Letters, 24, 279-283 (2017). https://doi.org/10.1109/LSP.2017.2657381
- J. Salamon, C. Jacoby, and J. P. Bello, "A dataset and taxonomy for urban sound research," Proc. the 22nd ACM International Conference on Multimedia, 1041-1044 (2014).
- http://www.freesound.org
- B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar, E. Battenberg, and O. Nieto, "librosa: Audio and music signal analysis in Python," Proc. the 14th Python in Science Conference, 18-25 (2015).
- M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: A system for large-scale machine learning," Proc. the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 265-283 (2016).