A Real-Time Sound Recognition System with a Decision Logic of Random Forest for Robots

Song, Ju-man; Kim, Changmin; Kim, Minook; Park, Yongjin; Lee, Seoyoung; Son, Jungkwan

doi:10.7746/jkros.2022.17.3.273

The Journal of Korea Robotics Society, Vol. 17, No. 3, pp. 273-281, 2022, pISSN 1975-6291, eISSN 2287-3961

Korea Robotics Society

Korean title: Random Forest를 결정로직으로 활용한 로봇의 실시간 음향인식 시스템 개발 (Development of a Real-Time Sound Recognition System for Robots Using Random Forest as Decision Logic)

Affiliation: LG Electronics (all authors)

Received: 2022.05.27
Reviewed: 2022.07.06
Published: 2022.08.31


Abstract

In this paper, we propose a sound recognition system for robots that detects various sound events in real time using a microphone mounted on the robot. To achieve real-time performance, we use a VGG11 model, a convolutional neural network, together with a real-time normalization scheme. The VGG11 model is trained on a database augmented across 24 acoustic environments (12 reverberation times × 2 signal-to-noise ratios). In addition, a decision logic based on the random forest algorithm is designed to generate event signals for robot applications. For specific classes of acoustic events, this logic performs better than using the outputs of the network model alone. Experimental results demonstrate the performance of the proposed sound recognition system on a real-time device for robots.
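The augmentation arithmetic in the abstract (12 reverberation times combined with 2 signal-to-noise ratios giving 24 environments) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract states only the counts, so the specific reverberation times and SNR values below are hypothetical placeholders, as are all function names. The sketch covers only the grid enumeration and the standard scaling of a noise signal to a target SNR; applying reverberation (for example by convolving with a room impulse response) is left out.

```python
import itertools
import math

# Hypothetical placeholder values: the abstract gives only the counts
# (12 reverberation times, 2 SNRs), not the values themselves.
REVERB_TIMES_S = [round(0.1 * k, 1) for k in range(1, 13)]  # 0.1 s .. 1.2 s
SNRS_DB = [10.0, 20.0]


def augmentation_grid(reverb_times, snrs_db):
    """Return every (reverberation time, SNR) pair as one training environment."""
    return list(itertools.product(reverb_times, snrs_db))


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be all zeros")
    # SNR_dB = 10 * log10(P_clean / P_noise)  =>  P_noise = P_clean / 10^(SNR_dB / 10)
    target_noise_power = power(clean) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, mixed):
    """Recover the SNR of a mixture, given the clean signal it was built from."""
    residual = [m - c for m, c in zip(mixed, clean)]
    return 10.0 * math.log10(power(clean) / power(residual))
```

Calling `augmentation_grid(REVERB_TIMES_S, SNRS_DB)` yields 24 pairs, matching the count in the abstract, and `mix_at_snr` can be applied once per pair after the reverberation step to produce the noisy training copy for that environment.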

Funding

This project was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (No. 2020-0-00857).
