Nonnegative Matrix Factorization Based Direction-of-Arrival Estimation of Multiple Sound Sources Using Dual Microphone Array

  • Jeon, Kwang Myung (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology) ;
  • Kim, Hong Kook (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology) ;
  • Yu, Seung Woo (Service Laboratory, Institute of Convergence Technology, Korea Telecom)
  • Received : 2016.10.12
  • Accepted : 2017.01.19
  • Published : 2017.02.25

Abstract

This paper proposes a new nonnegative matrix factorization (NMF) based direction-of-arrival (DOA) estimation method for multiple sound sources using a dual microphone array. First, the sound signals from the dual microphone array are segmented into consecutive analysis frames, and a steered-response power phase transform (SRP-PHAT) beamformer is applied to each frame so that the stereo signals of each frame are represented in a time-direction domain. The time-direction outputs of SRP-PHAT are accumulated over a pre-defined number of frames to form what is referred to as a time-direction block. Next, to make the DOA estimation robust to noise, each time-direction block is normalized along the time axis using a block subtraction technique. An unsupervised NMF method is then applied to the normalized time-direction block to cluster the directions of each sound source in a multiple-sound-source environment. In particular, the activation and basis matrices are used to estimate the number of sound sources and their DOAs, respectively. The DOA estimation performance of the proposed method is evaluated by measuring the mean absolute error (MAE) and the standard deviation of the errors between the oracle and estimated DOAs under a three-source condition, where the sources are located at [$-35^{\circ}$, 5 m], [$12^{\circ}$, 4 m], and [$38^{\circ}$, 4.m] from the dual microphone array. The experiment shows that the proposed method reduces the MAE by 56.83% relative to a conventional SRP-PHAT based DOA estimation method.
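
The processing chain described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: all function names and parameter values (frame length, hop, microphone spacing, NMF rank, thresholds) are hypothetical. The abstract does not specify the block subtraction rule or the NMF cost function, so the sketch assumes a per-frame median subtraction and the Euclidean multiplicative updates of Lee and Seung as stand-ins. The source count is taken as the number of components with non-negligible activation, and each DOA as the peak of the corresponding basis vector.

```python
import numpy as np

def srp_phat_frame(x1, x2, fs, mic_dist, angles_deg, c=343.0):
    """SRP-PHAT of one stereo frame. With two microphones this reduces to
    GCC-PHAT evaluated at the steering delay of each candidate direction."""
    win = np.hanning(len(x1))
    X1, X2 = np.fft.rfft(win * x1), np.fft.rfft(win * x2)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12                # PHAT weighting: keep phase only
    freqs = np.fft.rfftfreq(len(x1), d=1.0 / fs)
    tau = mic_dist * np.sin(np.deg2rad(angles_deg)) / c   # steering delays in s
    steer = np.exp(-2j * np.pi * np.outer(tau, freqs))
    return np.real(steer @ cross)                 # one power value per direction

def time_direction_block(x1, x2, fs, mic_dist, angles_deg,
                         frame_len=1024, hop=512):
    """Stack per-frame SRP-PHAT outputs into a (directions x frames) block."""
    starts = range(0, len(x1) - frame_len + 1, hop)
    cols = [srp_phat_frame(x1[s:s + frame_len], x2[s:s + frame_len],
                           fs, mic_dist, angles_deg) for s in starts]
    return np.stack(cols, axis=1)

def block_subtract(block):
    """ASSUMPTION: stand-in for the paper's block subtraction. Subtracts each
    frame's median over directions and clips negatives to zero."""
    return np.maximum(block - np.median(block, axis=0, keepdims=True), 0.0)

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Unsupervised NMF, V ~= W @ H, using Euclidean multiplicative updates
    (ASSUMPTION: the paper may use a different cost function)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps      # basis: directions x rank
    H = rng.random((rank, V.shape[1])) + eps      # activation: rank x frames
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def estimate_doas(W, H, angles_deg, rel_thresh=0.1):
    """Count components with significant activation (hypothetical threshold)
    and read each DOA from the peak of the matching basis vector."""
    energy = H.sum(axis=1) * W.sum(axis=0)        # total mass of each component
    active = energy >= rel_thresh * energy.max()
    doas = angles_deg[np.argmax(W[:, active], axis=0)]
    return int(active.sum()), np.sort(doas)

def mean_absolute_error(estimated_deg, oracle_deg):
    """MAE between estimated and oracle DOAs, in degrees."""
    diff = np.asarray(estimated_deg, float) - np.asarray(oracle_deg, float)
    return float(np.mean(np.abs(diff)))
```

Under these assumptions, a recording would be processed as `block_subtract(time_direction_block(...))`, then `nmf`, then `estimate_doas`. The relative MAE reduction reported in the abstract would correspond to `100 * (mae_baseline - mae_proposed) / mae_baseline`. In this sketch a positive angle means that the second channel lags the first.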
