http://dx.doi.org/10.5573/ieie.2017.54.2.123

Nonnegative Matrix Factorization Based Direction-of-Arrival Estimation of Multiple Sound Sources Using Dual Microphone Array  

Jeon, Kwang Myung (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology)
Kim, Hong Kook (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology)
Yu, Seung Woo (Service Laboratory, Institute of Convergence Technology, Korea Telecom)
Publication Information
Journal of the Institute of Electronics and Information Engineers / v.54, no.2, 2017, pp. 123-129
Abstract
This paper proposes a new nonnegative matrix factorization (NMF) based direction-of-arrival (DOA) estimation method for multiple sound sources using a dual microphone array. First, the sound signals from the dual microphone array are segmented into consecutive analysis frames, and a steered-response power phase transform (SRP-PHAT) beamformer is applied to each frame so that the stereo signals of each frame are represented in a time-direction domain. The time-direction outputs of SRP-PHAT are stored for a pre-defined number of frames, which is referred to as a time-direction block. Next, to make the DOA estimates robust to noise, each time-direction block is normalized along the time axis using a block subtraction technique. After that, an unsupervised NMF method is applied to the normalized time-direction block to cluster the directions of each sound source in a multiple-sound-source environment. In particular, the activation and basis matrices are used to estimate the number of sound sources and their DOAs, respectively. The DOA estimation performance of the proposed method is evaluated by measuring the mean absolute error (MAE) and the standard deviation of the errors between the oracle and estimated DOAs under a three-source condition, where the sources are located at [$-35^{\circ}$, 5m], [$12^{\circ}$, 4m], and [$38^{\circ}$, 4.m] from the dual microphone array. The experiment shows that the proposed method reduces the MAE by a relative 56.83% compared to a conventional SRP-PHAT based DOA estimation method.
Keywords
Multiple DOAs; GCC-PHAT; SRP-PHAT; NMF
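
The back end of the pipeline described in the abstract (block normalization, unsupervised NMF, DOA readout from the basis matrix, and MAE scoring) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the SRP-PHAT stage is replaced by a synthetic time-direction block, `normalize_block` is only a simple stand-in for the paper's block subtraction (whose exact form the abstract does not give), the number of sources is passed in rather than estimated from the activation matrix, and all function names, the 1-degree grid, the bump width, and the noise levels are hypothetical. The NMF itself uses the standard Lee-Seung multiplicative updates for the Euclidean cost.

```python
import numpy as np

def normalize_block(block):
    """Assumed stand-in for block subtraction: remove each direction's
    minimum over time so a stationary noise floor is suppressed."""
    floor = block.min(axis=1, keepdims=True)
    return np.maximum(block - floor, 0.0)

def nmf(V, rank, n_iter=300, seed=0, eps=1e-9):
    """Unsupervised NMF, V ~ W @ H, with Lee-Seung multiplicative updates.
    W (directions x rank) is the basis, H (rank x frames) the activation."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    err = np.linalg.norm(V - W @ H)  # Frobenius reconstruction error
    return W, H, err

def estimate_doas(block, angles_deg, n_sources, n_restarts=5):
    """Normalize the block, factorize it, and read one DOA per basis column.
    Several random restarts are run and the lowest-error fit is kept."""
    V = normalize_block(block)
    fits = (nmf(V, n_sources, seed=s) for s in range(n_restarts))
    W, H, _ = min(fits, key=lambda fit: fit[2])
    doas = np.sort(angles_deg[np.argmax(W, axis=0)])
    return doas, W, H

# Synthetic time-direction block (hypothetical data, not from the paper):
# three sources at the abstract's angles, each active in its own 40 frames,
# on top of a constant floor plus a little random noise.
angles = np.arange(-90, 91)                # candidate directions in degrees
true_doas = np.array([-35.0, 12.0, 38.0])
n_frames = 120
rng = np.random.default_rng(1)
block = 0.3 + 0.01 * rng.random((angles.size, n_frames))
for k, (doa, gain) in enumerate(zip(true_doas, [1.0, 0.8, 0.9])):
    bump = gain * np.exp(-0.5 * ((angles - doa) / 3.0) ** 2)
    block[:, 40 * k:40 * (k + 1)] += bump[:, None]

est, W, H = estimate_doas(block, angles, n_sources=3)
mae = np.mean(np.abs(est - true_doas))     # mean absolute error in degrees
print("estimated DOAs:", est, "MAE:", mae)
```

In this toy setting the sources never overlap in time, which makes the factorization much easier than for real recordings; the sketch is meant only to show how the basis columns map to directions and how the MAE is computed against the oracle DOAs.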