Time-domain Sound Event Detection Algorithm Using Deep Neural Network |
Kim, Bum-Jun (Yonsei University); Moon, Hyeongi (Yonsei University); Park, Sung-Wook (Gangneung-Wonju National University); Jeong, Youngho (ETRI); Park, Young-Cheol (Yonsei University) |
1 | Mesaros, A., Heittola, T., and Virtanen, T., "TUT database for acoustic scene classification and sound event detection," 2016 24th EUSIPCO, Budapest, Hungary, pp.1128-1132, August 2016. |
2 | E. Wold, T. Blum, D. Keislar, and J. Wheaton, "Content-based classification, search, and retrieval of audio," IEEE Multimedia, Vol.3, No.3, pp.27-36, 1996. |
3 | Deng, Li, et al. "Recent advances in deep learning for speech research at Microsoft," In ICASSP, Vol. 26, pp. 64, May 2013. |
4 | Mun, Seongkyu, et al. "Generative adversarial network based acoustic scene training set augmentation and selection using SVM hyper-plane," Proceedings of DCASE, pp.93-97, 2017. |
5 | Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, "Language modeling with gated convolutional networks," arXiv preprint arXiv:1612.08083, 2016. |
6 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Identity mappings in deep residual networks," In European Conference on Computer Vision (ECCV). Springer, pp.630-645, 2016. |
7 | J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," arXiv preprint arXiv:1709.01507, 2017. |
8 | Hyeongi Moon, Joon Byun, Bum-Jun Kim, Shin-hyuk Jeon, Youngho Jeong, Young-cheol Park and Sung-wook Park, "End-to-end CRNN Architectures for Weakly Supervised Sound Event Detection," DCASE 2018 Challenge, Sep. 2018. |
9 | Tara N. Sainath, Ron J. Weiss, Andrew Senior, Kevin W. Wilson, and Oriol Vinyals, "Learning the speech front-end with raw waveform CLDNNs," Proceedings of INTERSPEECH, Dresden, Germany, September 2015. |
10 | Justin Salamon and Juan Pablo Bello, "Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification," IEEE Signal Processing Letters, pp.279-283, 2017. |
11 | J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter, "Audio set: An ontology and human-labeled dataset for audio events," Proceedings of ICASSP, New Orleans, USA, pp.776-780, March 2017. |
12 | Mesaros, Annamaria, Toni Heittola, and Tuomas Virtanen, "Metrics for polyphonic sound event detection," Applied Sciences, Vol.6, No.6, 162, 2016. |
13 | Romain Serizel, Nicolas Turpault, Hamid Eghbal-Zadeh, and Ankit Parag Shah, "Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments," arXiv preprint arXiv:1807.10501, 2018. |
14 | Yong Xu, Qiuqiang Kong, Wenwu Wang, and Mark D. Plumbley, "Large-scale weakly supervised audio classification using gated convolutional neural network," Proceedings of ICASSP, Calgary, Canada, pp.121-125, April 2018. |