Deep Learning-Based Sound Localization Using Stereo Signals Based on Synchronized ILD

  • Hwang, Hyeon Tae (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology) ;
  • Yun, Deokgyu (Dept. of Electronic Engineering, Seoul National University of Science and Technology) ;
  • Choi, Seung Ho (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
  • Received : 2019.07.24
  • Accepted : 2019.08.05
  • Published : 2019.08.31

Abstract

The interaural level difference (ILD), used for sound localization with stereo signals, measures the difference in energy with which a sound source reaches the two ears. The conventional ILD does not account for the time difference between the stereo signals, which lowers localization accuracy. In this paper, we propose a synchronized ILD, computed after compensating for this time difference. The method uses the cross-correlation function (CCF) to estimate the time difference between the two ears and uses it to obtain the synchronized ILD. To demonstrate the performance of the proposed method, we conducted two sound localization experiments: in one, both the synchronized ILD and the CCF were given as inputs to a deep neural network (DNN); in the other, only the synchronized ILD was used. We evaluate performance in terms of the mean error and the accuracy of sound localization. Experimental results show that the proposed method outperforms the conventional methods.
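The core idea of the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the dB formulation of the ILD, and the zero-lag handling are assumptions; the paper's exact normalization and windowing may differ.

```python
import numpy as np

def synchronized_ild(left, right):
    """Estimate the inter-channel time difference via the
    cross-correlation function (CCF), align the two channels,
    and compute the ILD on the synchronized signals."""
    # CCF between the channels; the peak index gives the lag in samples.
    ccf = np.correlate(left, right, mode="full")
    lag = int(np.argmax(ccf)) - (len(right) - 1)

    # Trim both channels so they are time-aligned at the estimated lag.
    if lag > 0:          # left lags behind right
        l, r = left[lag:], right[:len(right) - lag]
    elif lag < 0:        # right lags behind left
        l, r = left[:len(left) + lag], right[-lag:]
    else:
        l, r = left, right

    # ILD as the energy ratio of the aligned channels, in dB.
    eps = 1e-12
    ild_db = 10.0 * np.log10((np.sum(l ** 2) + eps) / (np.sum(r ** 2) + eps))
    return lag, ild_db
```

For example, a right channel that is an attenuated, advanced copy of the left channel yields the known delay as `lag` and a positive `ild_db` reflecting the level difference.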
