[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7236/IJASC.2020.9.2.20

CNN based Sound Event Detection Method using NMF Preprocessing in Background Noise Environment

Jang, Bumsuk (BS SOFT Co., LTD.)
Lee, Sang-Hyun (Department of Computer Engineering, Honam University)

Publication Information

International Journal of Advanced Smart Convergence / v.9, no.2, 2020, pp. 20-27

Abstract

Sound event detection in real-world environments suffers from interference by non-stationary, time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF). We propose a deep learning model that integrates a Convolutional Neural Network (CNN) with NMF. To improve the separation quality of the NMF, the method includes a noise update technique that learns and adapts to the characteristics of the current noise in real time. The noise update technique analyzes the sparsity and activity of the noise basis at the present time and decides whether to perform update training based on the noise candidate group obtained every frame in the previous noise reduction stage. Noise basis ranks selected as candidates for update training are updated in real time with discriminative NMF training. This NMF was applied as preprocessing to a CNN and to a Hidden Markov Model (HMM) to improve sound event detection performance. Because the CNN shows the more pronounced performance improvement, the method can be widely used in CNN-based algorithms that operate on sound sources.
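
The NMF noise reduction stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard multiplicative-update NMF for the KL divergence (Lee and Seung), a noise dictionary (basis matrix) learned beforehand from a noise-only clip and held fixed, and a soft mask to recover the event spectrogram. The function names nmf_kl, denoise, and hoyer_sparsity, the ranks, and the synthetic data are all hypothetical. Hoyer sparsity is only one possible way to measure how sparse each noise activation is; the paper's actual update criterion and its real-time discriminative NMF training are not reproduced here.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def nmf_kl(V, rank, n_iter=100, W_fixed=None, seed=0):
    """Approximate V by W @ H using KL-divergence multiplicative updates.

    If W_fixed is given, it forms the leading columns of W and is never
    updated; only the remaining `rank` columns and all of H are learned.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W = rng.random((n_freq, n_fixed + rank)) + EPS
    if n_fixed:
        W[:, :n_fixed] = W_fixed
    H = rng.random((n_fixed + rank, n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.T @ ones + EPS)
        WH = W @ H + EPS
        W_new = W * ((V / WH) @ H.T) / (ones @ H.T + EPS)
        W[:, n_fixed:] = W_new[:, n_fixed:]  # leave the fixed columns untouched
    return W, H


def denoise(V_mix, W_noise, event_rank=4, n_iter=100):
    """Split V_mix into noise and event parts; return the masked event
    spectrogram and the noise activations."""
    W, H = nmf_kl(V_mix, event_rank, n_iter, W_fixed=W_noise)
    k = W_noise.shape[1]
    noise_est = W[:, :k] @ H[:k]
    event_est = W[:, k:] @ H[k:]
    mask = event_est / (event_est + noise_est + EPS)  # soft mask in [0, 1)
    return mask * V_mix, H[:k]


def hoyer_sparsity(x):
    """Hoyer sparsity of a vector with more than one element:
    close to 0 for a flat vector, close to 1 for a one-hot vector."""
    n = x.size
    l1_over_l2 = np.abs(x).sum() / (np.linalg.norm(x) + EPS)
    return (np.sqrt(n) - l1_over_l2) / (np.sqrt(n) - 1)


# Synthetic demo: random non-negative matrices stand in for magnitude spectrograms.
rng = np.random.default_rng(1)
true_noise_basis = rng.random((64, 4))
V_noise = true_noise_basis @ rng.random((4, 120))   # noise-only training clip
W_noise, _ = nmf_kl(V_noise, rank=4)                # learn the noise dictionary
V_mix = true_noise_basis @ rng.random((4, 80)) + rng.random((64, 2)) @ rng.random((2, 80))
V_event_hat, H_noise = denoise(V_mix, W_noise, event_rank=2)
noise_sparsity = [hoyer_sparsity(h) for h in H_noise]  # one value per noise basis vector
```

In a pipeline of the kind the abstract outlines, the masked spectrogram (V_event_hat here) would be the input to the CNN or HMM classifier, and statistics such as noise_sparsity could inform the decision to retrain noise basis vectors. Both uses are assumptions of this sketch.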

Keywords

Non-negative matrix; CNN; artificial neural networks; Sound Event Detection; Signal to Noise Ratio

Reference

1	Z. Md. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, "State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow's Intelligent Network Traffic Control Systems," IEEE Commun. Surveys Tutorials, vol. 19, no. 4, pp. 2432-2455, 2017. DOI: 10.1109/COMST.2017.2707140
2	Z. Liu, Z. Jia, C. Vong, S. Bu, J. Han, and X. Tang, "Capturing High-Discriminative Fault Features for Electronics-Rich Analog System via Deep Learning," IEEE Trans. Indust. Inform., vol. 13, no. 3, pp. 1213-1226, Jun. 2017. DOI: 10.1109/TII.2017.2690940
3	M. He and D. He, "Deep Learning Based Approach for Bearing Fault Diagnosis," IEEE Trans. Indust. Applications, vol. 53, no. 3, pp. 3057-3065, Jun. 2017. DOI: 10.1109/TIA.2017.2661250
4	T. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, "PCANet: A Simple Deep Learning Baseline for Image Classification?," IEEE Trans. Image Process., vol. 24, no. 12, pp. 5017-5032, Dec. 2015. DOI: 10.1109/TIP.2015.2475625
5	A. Mesaros, T. Heittola, E. Benetos, P. Foster, M. Lagrange, T. Virtanen, and M. D. Plumbley, "Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 26, no. 2, pp. 379-393, Feb. 2018. DOI: 10.1109/TASLP.2017.2778423
6	Q. Kong, Y. Cao, T. Iqbal, Y. Xu, W. Wang, and M. D. Plumbley, "Cross-task learning for audio-tagging, sound event detection and spatial localization: DCASE 2019 baseline systems," arXiv:1904.03476, pp. 1-5.
7	D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, Oct. 1999.
8	Y. Xie, Z. Liu, Z. Yao, and B. Dai, "Improved two-stage Wiener filter for robust speaker identification," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 310-313, Hong Kong, August 2006. DOI: 10.1109/ICPR.2006.696
9	E. Cakir, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen, "Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 25, no. 6, pp. 1291-1303, Jun. 2017. DOI: 10.1109/TASLP.2017.2690575
10	T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. L. Roux, and K. Takeda, "Duration-Controlled LSTM for Polyphonic Sound Event Detection," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 25, no. 11, pp. 2059-2070, Nov. 2017. DOI: 10.1109/TASLP.2017.2740002
11	M. Crocco, M. Cristani, A. Trucco, and V. Murino, "Audio surveillance: A systematic review," ACM Comput. Surv., vol. 48, art. 52, 2016. DOI: 10.1145/2871183
12	R. V. Sharan and T. J. Moir, "An overview of applications and advancements in automatic sound recognition," Neurocomputing, vol. 200, pp. 22-34, 2016. DOI: 10.1016/j.neucom.2016.03.020
13	J. Lu, "Mean Teacher Convolution System For DCASE 2018 Task 4," Detection and Classification of Acoustic Scenes and Events 2018, Shanghai, China, Jul. 2018, pp. 1-5.
14	E. Cakir, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen, "Convolutional recurrent neural networks for polyphonic sound event detection," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 25, pp. 1291-1303, 2017. DOI: 10.1109/TASLP.2017.2690575
15	B. McFee, J. Salamon, and J. P. Bello, "Adaptive Pooling Operators for Weakly Labeled Sound Event Detection," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 26, no. 11, pp. 2180-2193, Apr. 2018.
16	S. Adavanne, P. Pertila, and T. Virtanen, "Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network," Detection and Classification of Acoustic Scenes and Events 2017, Munich, Germany, Nov. 2017, pp. 1-5. DOI: 10.1109/ICASSP.2017.7952260
17	T. Komatsu, Y. Senda, and R. Kondo, "Acoustic Event Detection Based on Non-Negative Matrix Factorization With Mixtures of Local Dictionaries and Activation Aggregation," 2016 IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), Shanghai, China, Mar. 2016, pp. 2259-2263. DOI: 10.1109/ICASSP.2016.7472079
18	D. Su, X. Wu, and L. Xu, "GMM-HMM acoustic model training by a two level procedure with Gaussian components determined by automatic model selection," 2010 IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), Dallas, TX, USA, Mar. 2010, pp. 4890-4893. DOI: 10.1109/ICASSP.2010.5495122
19	A. Mesaros, T. Heittola, A. Eronen, and T. Virtanen, "Acoustic Event Detection in Real Life," 18th European Signal Process. Conf., Aalborg, Denmark, Aug. 2010, pp. 1267-1271.
20	V. Bisot, S. Essid, and G. Richard, "Overlapping Sound Event Detection with Supervised Nonnegative Matrix Factorization," 2017 IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), New Orleans, LA, USA, Mar. 2017, pp. 31-35. DOI: 10.1109/ICASSP.2017.7951792