Deep Learning-Based Sound Localization Using Stereo Signals Based on Synchronized ILD

  • Hwang, Hyeon Tae (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology) ;
  • Yun, Deokgyu (Dept. of Electronic Engineering, Seoul National University of Science and Technology) ;
  • Choi, Seung Ho (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
  • Received : 2019.07.24
  • Accepted : 2019.08.05
  • Published : 2019.08.31

Abstract

The interaural level difference (ILD), used for sound localization with stereo signals, measures the difference in energy with which a sound source reaches the two ears. The conventional ILD does not account for the time difference between the stereo signals, which lowers localization accuracy. In this paper, we propose a synchronized ILD, computed after compensating for this time difference. The method uses the cross-correlation function (CCF) to estimate the time difference between the two ears and uses it to obtain the synchronized ILD. To demonstrate the performance of the proposed method, we conducted two sound localization experiments: in one, both the synchronized ILD and the CCF were given as inputs to a deep neural network (DNN); in the other, only the synchronized ILD was used. We evaluate performance in terms of the mean error and the accuracy of sound localization. Experimental results show that the proposed method outperforms the conventional methods.
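The core idea of the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the dB formulation of the ILD, and the zero-lag handling are assumptions; the paper's exact normalization and windowing may differ.

```python
import numpy as np

def synchronized_ild(left, right):
    """Estimate the inter-channel time difference via the
    cross-correlation function (CCF), align the two channels,
    and compute the ILD on the synchronized signals."""
    # CCF between the channels; the peak index gives the lag in samples.
    ccf = np.correlate(left, right, mode="full")
    lag = int(np.argmax(ccf)) - (len(right) - 1)

    # Trim both channels so they are time-aligned at the estimated lag.
    if lag > 0:          # left lags behind right
        l, r = left[lag:], right[:len(right) - lag]
    elif lag < 0:        # right lags behind left
        l, r = left[:len(left) + lag], right[-lag:]
    else:
        l, r = left, right

    # ILD as the energy ratio of the aligned channels, in dB.
    eps = 1e-12
    ild_db = 10.0 * np.log10((np.sum(l ** 2) + eps) / (np.sum(r ** 2) + eps))
    return lag, ild_db
```

For example, a right channel that is an attenuated, advanced copy of the left channel yields the known delay as `lag` and a positive `ild_db` reflecting the level difference.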
