Speech Enhancement Using Phase-Dependent A Priori SNR Estimator in Log-Mel Spectral Domain

  • Received : 2014.01.29
  • Accepted : 2014.06.16
  • Published : 2014.10.01

Abstract

We propose a phase-based method for single-channel speech enhancement that uses phase information to extract and enhance the desired signal in noisy environments. The method estimates a phase-dependent a priori signal-to-noise ratio (SNR) in the log-mel spectral domain, so that both the magnitude and the phase of the input speech signal are used. The phase-dependent estimator is incorporated into the conventional magnitude-based decision-directed approach, which recursively computes the a priori SNR from noisy speech. Additionally, we reduce the performance degradation caused by the one-frame delay of the estimated phase-dependent a priori SNR by using minimum mean square error (MMSE)-based and maximum a posteriori (MAP)-based estimators. In our speech enhancement experiments, the proposed phase-dependent a priori SNR estimator improves the output SNR by 2.6 dB over a conventional magnitude-based estimator, for both the MMSE-based and the MAP-based estimators.
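For readers unfamiliar with the conventional magnitude-based decision-directed approach mentioned in the abstract, the following is a minimal sketch of that baseline recursion only. It is not the phase-dependent log-mel estimator proposed in the paper. The function name, the smoothing factor `alpha = 0.98`, the floor `xi_min`, the first-frame initialization, and the use of a Wiener gain in place of the MMSE or MAP estimators are all assumptions made for illustration.

```python
from typing import List, Tuple


def decision_directed_snr(
    noisy_power: List[List[float]],
    noise_power: List[float],
    alpha: float = 0.98,   # assumed smoothing factor, not taken from the paper
    xi_min: float = 1e-3,  # assumed lower bound on the a priori SNR
) -> Tuple[List[List[float]], List[List[float]]]:
    """Baseline decision-directed a priori SNR estimate (hypothetical helper).

    noisy_power: frames of |Y|^2 per frequency bin.
    noise_power: noise power estimate per bin, assumed known and stationary.
    Returns (a priori SNR per frame and bin, Wiener gain per frame and bin).
    """
    prev_clean_power = None  # |S_hat|^2 of the previous frame
    xi_frames: List[List[float]] = []
    gain_frames: List[List[float]] = []

    for frame in noisy_power:
        xi_frame, gain_frame, clean_frame = [], [], []
        for k, (y2, d2) in enumerate(zip(frame, noise_power)):
            gamma = y2 / d2                # a posteriori SNR
            ml_term = max(gamma - 1.0, 0.0)  # instantaneous (maximum-likelihood) SNR
            if prev_clean_power is None:
                # Assumption: the first frame falls back to the instantaneous estimate.
                xi = ml_term
            else:
                # Recursion: weight the previous frame's clean-speech estimate
                # against the current instantaneous estimate.
                xi = alpha * prev_clean_power[k] / d2 + (1.0 - alpha) * ml_term
            xi = max(xi, xi_min)
            gain = xi / (1.0 + xi)         # Wiener gain, swapped in for MMSE/MAP
            xi_frame.append(xi)
            gain_frame.append(gain)
            clean_frame.append(gain * gain * y2)
        prev_clean_power = clean_frame
        xi_frames.append(xi_frame)
        gain_frames.append(gain_frame)

    return xi_frames, gain_frames


# Two identical frames with two bins each; unit noise power in both bins.
xi, gains = decision_directed_snr([[4.0, 1.0], [4.0, 1.0]], [1.0, 1.0])
```

Because the recursion reuses the clean-speech estimate from the previous frame, the resulting a priori SNR lags the signal by one frame, which is the delay the abstract refers to.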
