Audio Event Classification Using Deep Neural Networks
Lim, Minkyu (Sogang University); Lee, Donghyun (Sogang University); Kim, Kwang-Ho (Sogang University); Kim, Ji-Hwan (Sogang University)
1. Lu, L., Jiang, H., & Zhang, H. (2001). A robust audio classification and segmentation method, in Proc. ACM International Conference on Multimedia, 203-211.
2. Xu, M., et al. (2003). Creating audio keywords for event detection in soccer video, in Proc. IEEE International Conference on Multimedia and Expo, 281-284.
3. Cheng, W., Chu, W., & Wu, J. (2003). Semantic context detection based on hierarchical audio models, in Proc. ACM SIGMM International Workshop on Multimedia Information Retrieval, 109-115.
4. Elo, J. P., et al. (2009). Non-speech audio event detection, in Proc. International Conference on Acoustics, Speech and Signal Processing, 1973-1976.
5. Heittola, T., et al. (2013). Context-dependent sound event detection, EURASIP Journal on Audio, Speech, and Music Processing, 11-13.
6. Lee, H., Pham, P., Largman, Y., & Ng, A. Y. (2009). Unsupervised feature learning for audio classification using convolutional deep belief networks, in Proc. Advances in Neural Information Processing Systems, 1096-1104.
7. Kons, Z., & Toledo-Ronen, O. (2013). Audio event classification using deep neural networks, in Proc. INTERSPEECH, 1482-1486.
8. Ballan, L., et al. (2009). Deep networks for audio event classification in soccer videos, in Proc. International Conference on Multimedia and Expo, 474-477.
9. Bengio, Y., & LeCun, Y. (2007). Scaling learning algorithms towards AI, Large-Scale Kernel Machines, Vol. 34, No. 5, 321-360.
10. Barker, J., et al. (2012). The PASCAL CHiME speech separation and recognition challenge, Computer Speech & Language, Vol. 27, No. 3, 621-633.
11. Downie, S., et al. (2010). The Music Information Retrieval Evaluation eXchange: Some observations and insights, Advances in Music Information Retrieval, Springer, 93-115.
12. Malkin, R. G. (2007). Multimodal Technologies for Perception of Humans, Springer, 323-330.
13. Smeaton, F., et al. (2006). Evaluation campaigns and TRECVid, in Proc. ACM International Workshop on Multimedia Information Retrieval, 321-330.
14. Vincent, E., et al. (2012). The signal separation evaluation campaign (2007-2010): Achievements and remaining challenges, Signal Processing, Vol. 92, No. 8, 1928-1936.
15. Larochelle, H., et al. (2007). An empirical evaluation of deep architectures on problems with many factors of variation, in Proc. International Conference on Machine Learning, 473-480.
16. Young, S., et al. (1999). The HTK Book. Cambridge, U.K.: Entropic.
17. Dahl, G. E., Sainath, T. N., & Hinton, G. E. (2013). Improving deep neural networks for LVCSR using rectified linear units and dropout, in Proc. International Conference on Acoustics, Speech and Signal Processing, 8609-8613.
18. Bottou, L. (2004). Advanced Lectures on Machine Learning, Springer, 146-168.
19. Salamon, J., Jacoby, C., & Bello, J. P. (2014). A dataset and taxonomy for urban sound research, in Proc. ACM International Conference on Multimedia, 1041-1044.
20. Bergstra, J., et al. (2010). Theano: A CPU and GPU math expression compiler, in Proc. Python for Scientific Computing Conference, Vol. 4, p. 3.