http://dx.doi.org/10.4218/etrij.14.0113.0917

Two-Microphone Binary Mask Speech Enhancement in Diffuse and Directional Noise Fields  

Abdipour, Roohollah (School of Computer Engineering, Iran University of Science & Technology)
Akbari, Ahmad (School of Computer Engineering, Iran University of Science & Technology)
Rahmani, Mohsen (Department of Computer Engineering, Faculty of Engineering, Arak University)
Publication Information
ETRI Journal / v.36, no.5, 2014, pp. 772-782
Abstract
Two-microphone binary mask speech enhancement (2mBMSE) has been of particular interest in recent literature and has shown promising results. Current 2mBMSE systems rely on spatial cues of speech and noise sources. Although these cues are helpful for directional noise sources, they become less effective in diffuse noise fields. We propose a new system that is effective in both directional and diffuse noise conditions. The system exploits two features. The first determines whether a given time-frequency (T-F) unit of the input spectrum is dominated by a diffuse or a directional source. A diffuse signal is certainly noise, whereas a directional signal could come from either a noise or a speech source. The second feature therefore discriminates between T-F units dominated by speech and those dominated by directional noise. Speech enhancement is performed using a binary mask calculated from the proposed features. In both directional and diffuse noise fields, the proposed system segregates speech T-F units with hit rates above 85%. It outperforms previous solutions in terms of signal-to-noise ratio and perceptual evaluation of speech quality improvement, especially in diffuse noise conditions.
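The two-stage decision described above can be illustrated with a minimal sketch. The abstract does not define the two features or how they are thresholded, so everything concrete here is an assumption made for illustration: the per-unit scores `directional_score` and `speech_score`, the 0.5 thresholds, and the function names are hypothetical and are not the authors' method. The hit rate is computed in the usual binary-mask sense, as the fraction of speech-dominated T-F units that the estimated mask keeps. Plain Python lists are used to keep the example dependency-free.

```python
def binary_mask(directional_score, speech_score,
                dir_threshold=0.5, speech_threshold=0.5):
    """Build a binary mask over T-F units (rows = frames, cols = bins).

    A unit is kept (1) only if it looks directional AND speech-dominated.
    Both scores and both thresholds are hypothetical placeholders for the
    two features mentioned in the abstract.
    """
    mask = []
    for dir_row, sp_row in zip(directional_score, speech_score):
        mask.append([
            1 if (d > dir_threshold and s > speech_threshold) else 0
            for d, s in zip(dir_row, sp_row)
        ])
    return mask


def apply_mask(spectrum, mask):
    """Zero out the T-F units of a (possibly complex) spectrum where mask is 0."""
    return [[x * m for x, m in zip(row, m_row)]
            for row, m_row in zip(spectrum, mask)]


def hit_rate(estimated, ideal):
    """Fraction of speech-dominated units (ideal == 1) kept by the estimate."""
    hits = 0
    total = 0
    for e_row, i_row in zip(estimated, ideal):
        for e, i in zip(e_row, i_row):
            if i == 1:
                total += 1
                hits += e
    return hits / total if total else 0.0


# Toy 2x2 example with made-up scores.
directional = [[0.9, 0.2], [0.8, 0.7]]   # high = directional, low = diffuse
speech = [[0.8, 0.9], [0.3, 0.6]]        # high = speech, low = directional noise
mask = binary_mask(directional, speech)
enhanced = apply_mask([[1 + 2j, 3], [4, 5]], mask)
```

In this toy case the unit at row 0, column 1 is rejected as diffuse, and the unit at row 1, column 0 is rejected as directional noise, which mirrors the order of the two decisions in the abstract.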
Keywords
Two-microphone speech enhancement; source separation; binary mask; diffuse noise; directional noise;