Generalized cross correlation with phase transform sound source localization combined with steered response power method

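  • 김영준 (Department of Radio and Communications Engineering, Chungbuk National University)
  • 오민재 (Department of Radio and Communications Engineering, Chungbuk National University)
  • 이인성 (Department of Radio and Communications Engineering, Chungbuk National University)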
  • Received : 2017.06.19
  • Accepted : 2017.09.28
  • Published : 2017.09.30

Abstract

We propose a method that reduces the direction-estimation error of a sound source in reverberant and noisy environments. The proposed algorithm classifies the speech signal into voiced and unvoiced frames using VAD (Voice Activity Detection) and estimates the source direction only when the current frame is voiced. In such a frame, the TDOA (Time Difference of Arrival) between the microphones of the array is estimated using the GCC-PHAT (Generalized Cross Correlation with Phase Transform) method. Then, to improve the accuracy of the source location, we compare the cross-correlation value of the two signals at the estimated time delay with the values at the other time delays in a time-table. If, within successive voiced frames, the angle of the current frame differs greatly from the angles of the preceding and following frames, it is replaced with the mean of the angles estimated in those two frames.
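The GCC-PHAT delay estimate for one voiced frame can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sign convention (a positive lag means the second channel is delayed), the far-field angle formula with 0° at broadside, and the values for sampling rate, microphone spacing, and speed of sound are all assumptions made for illustration. A naive DFT is used only to keep the example dependency-free; an FFT library would normally be used instead.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) DFT, kept dependency-free (use an FFT in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(x1, x2, max_lag):
    """Return the lag in samples by which x2 is delayed relative to x1."""
    n = len(x1) + len(x2)  # zero-pad so the correlation does not wrap around
    X1 = dft(list(x1) + [0.0] * (n - len(x1)))
    X2 = dft(list(x2) + [0.0] * (n - len(x2)))
    cross = []
    for a, b in zip(X1, X2):
        c = b * a.conjugate()
        mag = abs(c)
        # PHAT weighting: keep only the phase of the cross-spectrum
        cross.append(c / mag if mag > 1e-12 else 0j)
    r = [v.real for v in dft(cross, inverse=True)]
    # Search only physically possible lags; negative lags live at the end of r
    return max(range(-max_lag, max_lag + 1), key=lambda lag: r[lag % n])


def delay_to_angle(lag, fs, mic_distance, c=343.0):
    """Far-field angle in degrees (0 = broadside) for a two-microphone pair."""
    s = c * lag / (fs * mic_distance)
    s = max(-1.0, min(1.0, s))  # clamp against rounding outside [-1, 1]
    return math.degrees(math.asin(s))


# Synthetic frame: the second channel is the first one delayed by 3 samples
rng = random.Random(0)
base = [rng.uniform(-1.0, 1.0) for _ in range(24)]
x1 = base + [0.0] * 8
x2 = [0.0] * 3 + base + [0.0] * 5

lag = gcc_phat_delay(x1, x2, max_lag=4)
angle = delay_to_angle(lag, fs=16000, mic_distance=0.1)
```

With the assumed 16 kHz sampling rate and 0.1 m spacing, the largest physically possible delay is about 4 samples, which is why the search is limited to `max_lag=4`. The coarse lag grid this implies is also the kind of resolution limit that up-sampling is meant to relax.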

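In this paper, we model a real environment with reverberation and noise and propose a method that improves the accuracy of sound source localization using two microphones. VAD (Voice Activity Detection) is applied to the input signal so that only speech segments are used and silent segments are excluded. For frames whose estimate falls outside the measurable range imposed by the limited sampling frequency, the time delay is re-estimated after up-sampling. The time delay of arrival computed in this way is then compared, with reference to a time-table, against the delay values of neighboring candidate positions, and the delay with the maximum power is selected, which increases the accuracy of the source location. In addition, exploiting the correlation between frames, frames with a large estimation difference within consecutive speech frames are identified and replaced with the mean of the neighboring frames, which improves localization performance.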
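The two post-processing steps, selecting the maximum-power delay among neighboring time-table candidates and replacing outlier frames, can be sketched as follows. The function names, the `neighbourhood` size, the 20° threshold, and the representation of the time-table as a sorted list of candidate delays are hypothetical choices for illustration; the abstract does not specify them.

```python
def refine_delay(power_at, estimated_delay, time_table, neighbourhood=1):
    """Among time-table delays near the estimate, return the one with maximum power.

    power_at:   callable mapping a candidate delay to its correlation power
    time_table: sorted list of candidate delays (hypothetical precomputed table)
    """
    nearest = min(range(len(time_table)),
                  key=lambda i: abs(time_table[i] - estimated_delay))
    lo = max(0, nearest - neighbourhood)
    hi = min(len(time_table), nearest + neighbourhood + 1)
    return max(time_table[lo:hi], key=power_at)


def smooth_angles(angles, voiced, threshold_deg=20.0):
    """Replace isolated outliers inside voiced runs with the mean of their neighbours."""
    out = list(angles)
    for i in range(1, len(angles) - 1):
        # Only frames surrounded by voiced frames are candidates for replacement
        if not (voiced[i - 1] and voiced[i] and voiced[i + 1]):
            continue
        prev_a, next_a = angles[i - 1], angles[i + 1]
        far_from_prev = abs(angles[i] - prev_a) > threshold_deg
        far_from_next = abs(angles[i] - next_a) > threshold_deg
        if far_from_prev and far_from_next:
            out[i] = (prev_a + next_a) / 2.0
    return out


# Candidate delays in samples and an illustrative power lookup
time_table = list(range(-4, 5))
power = {2: 0.4, 3: 0.9, 4: 0.1}
best = refine_delay(lambda d: power.get(d, 0.0), 2, time_table)

# Per-frame angle estimates; the last frame is unvoiced and left untouched
angles = [30.0, 32.0, 75.0, 31.0, 29.0, 80.0]
voiced = [True, True, True, True, True, False]
smoothed = smooth_angles(angles, voiced)
```

The smoothing compares each frame against the original neighboring estimates rather than already-smoothed ones, so a single replacement cannot propagate along a run of frames.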
