http://dx.doi.org/10.7746/jkros.2022.17.3.273

A Real-Time Sound Recognition System with a Decision Logic of Random Forest for Robots  

Song, Ju-man (LG Electronics)
Kim, Changmin (LG Electronics)
Kim, Minook (LG Electronics)
Park, Yongjin (LG Electronics)
Lee, Seoyoung (LG Electronics)
Son, Jungkwan (LG Electronics)
Publication Information
The Journal of Korea Robotics Society, vol. 17, no. 3, 2022, pp. 273-281
Abstract
In this paper, we propose a sound recognition system that lets a robot detect various sound events in real time using its onboard microphone. To achieve real-time performance, we use a VGG11 model, a convolutional neural network built from stacked convolutional layers, combined with a real-time normalization scheme. The VGG11 model is trained on a database augmented over 24 acoustic environments (12 reverberation times combined with 2 signal-to-noise ratios). In addition, a decision logic based on the random forest algorithm is designed to generate event signals for robot applications; for specific classes of acoustic events, this logic performs better than using the network outputs alone. Experimental results demonstrate the performance of the proposed sound recognition system running in real time on a device for robots.
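The abstract states only the size of the augmentation grid: 12 reverberation times and 2 signal-to-noise ratios, giving 12 × 2 = 24 environments. The Python sketch below shows one way such a grid could be rendered from a clean clip and a noise clip. It is illustrative, not the authors' pipeline: the RT60 and SNR values are hypothetical, the room impulse response is assumed to be exponentially decaying white noise (a simplification, not a geometric room simulation), and the function names (`condition_grid`, `synthetic_rir`, `mix_at_snr`, `augment`) are invented for this example. The SNR scaling follows the standard definition SNR = 10*log10(P_signal / P_noise).

```python
import math
import random

# HYPOTHETICAL values: the abstract gives only the counts (12 RT60s x 2 SNRs).
RT60_SECONDS = [round(0.1 * (i + 1), 1) for i in range(12)]  # 0.1 s .. 1.2 s
SNR_DB = [5.0, 15.0]


def condition_grid():
    """Every (RT60, SNR) pair: 12 x 2 = 24 augmentation environments."""
    return [(rt60, snr) for rt60 in RT60_SECONDS for snr in SNR_DB]


def synthetic_rir(rt60, fs, rng):
    """Stand-in room impulse response (ASSUMPTION, not the paper's method):
    white noise whose amplitude decays by 60 dB after rt60 seconds."""
    n = max(1, int(round(rt60 * fs)))
    decay = 3.0 * math.log(10.0) / (rt60 * fs)  # exp(-decay*k) = 1e-3 at k = rt60*fs
    h = [1.0]  # direct-path impulse
    for k in range(1, n):
        h.append(rng.gauss(0.0, 1.0) * math.exp(-decay * k))
    return h


def convolve(x, h):
    """Direct-form convolution, truncated to len(x) so the clip length is kept."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y[n] = acc
    return y


def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal, noise, snr_db):
    """Add noise scaled so that 10*log10(P_signal / P_noise) == snr_db."""
    noise = noise[: len(signal)]  # assumes noise is at least as long as signal
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(signal, noise)]


def augment(clean, noise, fs, seed=0):
    """Yield (rt60, snr_db, augmented clip) for each of the 24 conditions."""
    rng = random.Random(seed)
    for rt60, snr_db in condition_grid():
        reverberant = convolve(clean, synthetic_rir(rt60, fs, rng))
        yield rt60, snr_db, mix_at_snr(reverberant, noise, snr_db)
```

In this sketch every clean clip is rendered under all 24 conditions, so each source clip yields 24 training examples. The pure-Python convolution is O(N·M) and only meant to make the arithmetic explicit; a practical pipeline would typically use FFT-based convolution.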
Keywords
Sound Event Detection; Deep Learning; Robot Implementation; Audio Signal Processing; Machine Learning; Real-Time Implementation