A Real-Time Sound Recognition System with a Decision Logic of Random Forest for Robots

Development of a Real-Time Sound Recognition System for Robots Using Random Forest as Decision Logic

  • Received : 2022.05.27
  • Accepted : 2022.07.06
  • Published : 2022.08.31

Abstract

In this paper, we propose a sound recognition system for robots that detects various sound events in real time using a microphone mounted on the robot. To achieve real-time performance, we use a VGG11 model, which consists of several convolutional layers, together with a real-time normalization scheme. The VGG11 model is trained on a database augmented across 24 acoustic environments (12 reverberation times × 2 signal-to-noise ratios). In addition, a decision logic based on the random forest algorithm is designed to generate event signals for robot applications. For specific classes of acoustic events, this logic performs better than using the outputs of the network model alone. Experimental results demonstrate the performance of the proposed sound recognition system on a real-time device for robots.
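The 24 augmentation environments can be pictured as the Cartesian product of 12 reverberation times and 2 signal-to-noise ratios. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the abstract gives only the counts, so the RT60 and SNR values are hypothetical placeholders, and the helper names (`augmentation_conditions`, `mix_at_snr`, `measured_snr_db`) are invented here. It covers only the additive-noise half of the augmentation; convolution with a room impulse response for each reverberation time is omitted.

```python
import itertools
import math
import random

# Hypothetical settings: the abstract states only "12 reverberation times"
# and "2 signal to noise ratios", not the actual values.
RT60_SECONDS = [round(0.1 * k, 1) for k in range(1, 13)]  # 0.1 s .. 1.2 s
SNR_DB = [0.0, 10.0]


def augmentation_conditions():
    """Return every (RT60, SNR) pair: 12 x 2 = 24 environments."""
    return list(itertools.product(RT60_SECONDS, SNR_DB))


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that 10*log10(P_signal / P_noise) == snr_db, then add it."""
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, mixture):
    """Recover the SNR of a mixture when the clean signal is known."""
    residual = [m - s for s, m in zip(signal, mixture)]
    return 10 * math.log10(power(signal) / power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for rt60, snr in augmentation_conditions():
        noisy = mix_at_snr(clean, noise, snr)
        print(f"RT60={rt60:.1f}s target SNR={snr:.1f} dB "
              f"measured={measured_snr_db(clean, noisy):.2f} dB")
```

Scaling the noise rather than the signal keeps the level of the target event constant across conditions, which is one common convention; the paper may use a different one.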

Acknowledgement

This project was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (No. 2020-0-00857).
