Acoustic Event Detection in Multichannel Audio Using Gated Recurrent Neural Networks with High-Resolution Spectral Features

  • Kim, Hyoung-Gook (Department of Electronics Convergence Engineering, Kwangwoon University)
  • Kim, Jin Young (Department of Electronics and Computer Engineering, Chonnam National University)
  • Received : 2017.03.02
  • Accepted : 2017.06.07
  • Published : 2017.12.01

Abstract

Recently, deep recurrent neural networks have achieved great success in various machine learning tasks and have also been applied to sound event detection. Detecting temporally overlapping sound events in realistic environments is much more challenging than monophonic detection. In this paper, we present an approach that improves the accuracy of polyphonic sound event detection in multichannel audio by combining gated recurrent neural networks with auditory spectral features. In the proposed method, human-hearing-perception-based spatial and spectral-domain noise-reduced harmonic features are extracted from multichannel audio and used as high-resolution spectral inputs to train gated recurrent neural networks. This yields faster and more stable convergence than long short-term memory recurrent neural networks. Our evaluation shows that the proposed method outperforms conventional approaches.
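To make the idea of a gated recurrent network for polyphonic detection concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard gated recurrent unit formulation (update gate, reset gate, candidate state) and a sigmoid output per event class, so that several classes can be active in the same frame. The names `GRUCell`, `detect_events`, and `rand_matrix`, the layer sizes, the 0.5 threshold, and the random weights are all hypothetical; the spatial and harmonic feature extraction described in the abstract is not reproduced, and random vectors stand in for the per-frame features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    # Hypothetical helper: small random weights, untrained.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell (assumed formulation, not taken from the paper)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One input/recurrent weight pair per gate:
        # z = update gate, r = reset gate, c = candidate state.
        self.W = {g: rand_matrix(n_hidden, n_in, rng) for g in "zrc"}
        self.U = {g: rand_matrix(n_hidden, n_hidden, rng) for g in "zrc"}
        self.b = {g: [0.0] * n_hidden for g in "zrc"}

    def _pre(self, g, x, h):
        # Pre-activation W_g x + U_g h + b_g for gate g.
        wx, uh = matvec(self.W[g], x), matvec(self.U[g], h)
        return [a + b + c for a, b, c in zip(wx, uh, self.b[g])]

    def step(self, x, h):
        z = [sigmoid(a) for a in self._pre("z", x, h)]
        r = [sigmoid(a) for a in self._pre("r", x, h)]
        # The reset gate scales the previous state before the candidate is computed.
        rh = [ri * hi for ri, hi in zip(r, h)]
        c = [math.tanh(a) for a in self._pre("c", x, rh)]
        # The update gate interpolates between the old state and the candidate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]


def detect_events(frames, cell, W_out, b_out, threshold=0.5):
    """Run the GRU over feature frames; return per-frame class probabilities
    and the thresholded multi-label activity (several classes may be on)."""
    h = [0.0] * cell.n_hidden
    probs, active = [], []
    for x in frames:
        h = cell.step(x, h)
        p = [sigmoid(a + b) for a, b in zip(matvec(W_out, h), b_out)]
        probs.append(p)
        active.append([pi >= threshold for pi in p])
    return probs, active


# Usage with placeholder data: 5 frames, 4 features, 8 hidden units, 3 classes.
rng = random.Random(0)
cell = GRUCell(n_in=4, n_hidden=8, rng=rng)
W_out, b_out = rand_matrix(3, 8, rng), [0.0] * 3
frames = [[rng.random() for _ in range(4)] for _ in range(5)]
probs, active = detect_events(frames, cell, W_out, b_out)
```

Independent sigmoid outputs, rather than a softmax, are what make the output polyphonic: each class is thresholded on its own, so overlapping events do not compete for probability mass. A GRU has two gates and no separate memory cell, so it carries fewer parameters per unit than an LSTM, which is one commonly cited reason for quicker training.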
