References
- P. K. Atrey, N. C. Maddage, and M. S. Kankanhalli, "Audio based event detection for multimedia surveillance," Proc. IEEE ICASSP. 5. V-V (2006).
- M. Janvier, X. Alameda-Pineda, L. Girin, and R. Horaud, "Sound representation and classification benchmark for domestic robots," Proc. IEEE ICRA. 6285-6292 (2014).
- D. Stowell, M. Wood, Y. Stylianou, and H. Glotin, "Bird detection in audio: a survey and a challenge," Proc. IEEE 26th MLSP. 1-6 (2016).
- D. Stowell and M. D. Plumbley, "Audio-only bird classification using unsupervised feature learning," Proc. CLEF. 673-684 (2014).
- K. Ko, J. Park, D. K. Han, and H. Ko, "Channel and frequency attention module for diverse animal sound classification," IEICE Trans. on Information and Systems, E102-D, 2615-2618 (2019). https://doi.org/10.1587/transinf.2019EDL8128
- S. Park, M. Elhilali, D. K. Han, and H. Ko, "Amphibian sounds generating network based on adversarial learning," IEEE Signal Processing Letters, 27, 640-644 (2020). https://doi.org/10.1109/LSP.2020.2988199
- K. Ko, S. Park, and H. Ko, "Convolutional neural network based amphibian sound classification using covariance and modulogram" (in Korean), J. Acoust. Soc. Kr. 37, 61-65 (2018).
- D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, "Detection and classification of acoustic scenes and events," IEEE Trans. Multimedia, 17, 1733-1746 (2015). https://doi.org/10.1109/TMM.2015.2428998
- G. Parascandolo, H. Huttunen, and T. Virtanen, "Recurrent neural networks for polyphonic sound event detection in real life recordings," Proc. IEEE ICASSP. 6440-6444 (2016).
- A. Mesaros, T. Heittola, and T. Virtanen, "TUT database for acoustic scene classification and sound event detection," Proc. 24th EUSIPCO. 1128-1132 (2016).
- S.-Y. Chou, J.-S. R. Jang, and Y.-H. Yang, "Frame CNN: A weakly-supervised learning framework for frame-wise acoustic event detection and classification," DCASE. Tech. Rep., 2017.
- A. Kumar and B. Raj, "Deep CNN framework for audio event recognition using weak labeled web data," arXiv:1707.02530 (2017).
- Y. Xu, Q. Kong, W. Wang, and M. D. Plumbley, "Large-scale weakly supervised audio classification using gated convolutional neural network," Proc. IEEE ICASSP. 121-125 (2018).
- Q. Kong, Y. Xu, I. Sobieraj, W. Wang, and M. D. Plumbley, "Sound event detection and time-frequency segmentation from weak labelled data," IEEE/ACM Trans. on Audio, Speech, and Lang. Processing, 27, 777-787 (2019). https://doi.org/10.1109/TASLP.2019.2895254
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556 (2014).
- K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proc. IEEE CVPR. 770-778 (2016).
- Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, "Language modeling with gated convolutional networks," Proc. PMLR. 70, 933-941 (2017).
- Y. Chen, Q. Guo, X. Liang, J. Wang, and Y. Qian, "Environmental sound classification with dilated convolutions," Applied Acoustics, 148, 123-132 (2019). https://doi.org/10.1016/j.apacoust.2018.12.019
- J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello, "Scaper: A library for soundscape synthesis and augmentation," Proc. IEEE WASPAA. 344-348 (2017).
- A. Kolesnikov and C. H. Lampert, "Seed, expand and constrain: Three principles for weakly-supervised image segmentation," Proc. ECCV. 695-711 (2016).
- Q. Kong, T. Iqbal, Y. Xu, W. Wang, and M. D. Plumbley, "DCASE 2018 challenge baseline with convolutional neural networks," DCASE. Tech. Rep., 2018.
- K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, and K. Takeda, "Weakly-supervised sound event detection with self-attention," Proc. IEEE ICASSP. 66-70 (2020).
- Y. Li, M. Liu, K. Drossos, and T. Virtanen, "Sound event detection via dilated convolutional recurrent neural networks," Proc. IEEE ICASSP. 286-290 (2020).
- D. Kingma and J. Ba, "Adam: a method for stochastic optimization," arXiv:1412.6980 (2015).
- S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," Proc. 32nd ICML. 448-456 (2015).
- J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, 143, 29-36 (1982).
- R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," Proc. IEEE CVPR. 580-587 (2014).