Statistical Model-Based Voice Activity Detection Based on Second-Order Conditional MAP with Soft Decision

  • Received : 2011.06.01
  • Accepted : 2011.10.19
  • Published : 2012.04.04

Abstract

In this paper, we propose a novel approach to statistical model-based voice activity detection (VAD) that incorporates a second-order conditional maximum a posteriori (CMAP) criterion. As a technical improvement over the first-order CMAP criterion in [1], we consider both the current observation and the voice activity decisions in the previous two frames to take full account of the interframe correlation of voice activity. This differs from the previous approach [1], which conditions on the decision in the previous single frame only: the second-order CMAP conditions on the decisions in the previous two frames and therefore has four thresholds with an additional degree of freedom. A soft-decision scheme is also incorporated, resulting in time-varying thresholds for further performance improvement. Experimental results show that the proposed algorithm outperforms the conventional CMAP-based VAD technique under various experimental conditions.
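
The decision rule summarized above can be illustrated with a minimal sketch. It assumes a per-frame likelihood ratio has already been computed by some statistical model, and it selects one of four thresholds according to the decisions made in the previous two frames. The function name `second_order_cmap_vad`, the `THRESHOLDS` table, and all numeric values are hypothetical and are not taken from the paper; the soft-decision scheme that makes the thresholds time-varying is omitted.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical thresholds, one per combination of the two previous decisions,
# keyed as (decision at frame n-2, decision at frame n-1).
# 1 = speech, 0 = non-speech. The values are illustrative only.
THRESHOLDS: Dict[Tuple[int, int], float] = {
    (0, 0): 2.0,
    (0, 1): 1.0,
    (1, 0): 1.5,
    (1, 1): 0.5,
}


def second_order_cmap_vad(
    likelihood_ratios: Sequence[float],
    thresholds: Dict[Tuple[int, int], float] = THRESHOLDS,
    initial_state: Tuple[int, int] = (0, 0),
) -> List[int]:
    """Return a 0/1 decision per frame, using a threshold that depends on
    the decisions of the previous two frames."""
    prev2, prev1 = initial_state
    decisions: List[int] = []
    for ratio in likelihood_ratios:
        # Pick the threshold conditioned on the two most recent decisions.
        decision = 1 if ratio > thresholds[(prev2, prev1)] else 0
        decisions.append(decision)
        # Shift the decision history by one frame.
        prev2, prev1 = prev1, decision
    return decisions


if __name__ == "__main__":
    print(second_order_cmap_vad([2.5, 1.2, 0.6, 0.6, 0.6]))
    print(second_order_cmap_vad([0.1, 2.5, 0.8, 0.8, 0.3]))
```

With these illustrative values, a strong onset followed by weaker frames stays classified as speech, because the threshold drops once the two preceding frames were both speech. A single fixed threshold of 2.0 would reject every frame after the first.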

References

  1. J.W. Shin et al., "Voice Activity Detection Based on Conditional MAP Criterion," IEEE Signal Proc. Lett., vol. 15, Feb. 2008, pp. 257-260. https://doi.org/10.1109/LSP.2008.917027
  2. L.R. Rabiner and M.R. Sambur, "Voiced-Unvoiced-Silence Detection Using the Itakura LPC Distance Measure," Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., May 1977, pp. 323-326.
  3. J.A. Haigh and J.S. Mason, "Robust Voice Activity Detection Using Cepstral Features," Proc. IEEE TENCON, vol. 3, Oct. 1993, pp. 321-324.
  4. K. Srinivasan and A. Gersho, "Voice Activity Detection for Cellular Networks," Proc. IEEE Works. Speech Coding Telecommun., Oct. 1993, pp. 85-86.
  5. Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator," IEEE Trans. Acoustics, Speech, Signal Process., vol. ASSP-32, no. 6, Dec. 1984, pp. 1109-1121.
  6. Y.D. Cho, K. Al-Naimi, and A. Kondoz, "Improved Voice Activity Detection Based on a Smoothed Statistical Likelihood Ratio," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Process., vol. 2, May 2001, pp. 737-740.
  7. J. Sohn, N.S. Kim, and W. Sung, "A Statistical Model-Based Voice Activity Detection," IEEE Signal Proc. Lett., vol. 6, no. 1, Jan. 1999, pp. 1-3.
  8. J.-H. Chang, N.S. Kim, and S.K. Mitra, "Voice Activity Detection Based on Multiple Statistical Models," IEEE Trans. Signal Process., vol. 54, no. 6, June 2006, pp. 1965-1976. https://doi.org/10.1109/TSP.2006.874403
  9. J. Ramirez et al., "Statistical Voice Activity Detection Using a Multiple Observation Likelihood Ratio Test," IEEE Signal Process. Lett., vol. 12, no. 10, Oct. 2005, pp. 689-692. https://doi.org/10.1109/LSP.2005.855551
  10. J.-H. Chang, J.W. Shin, and N.S. Kim, "Likelihood Ratio Test with Complex Laplacian Model for Voice Activity Detection," Proc. Eurospeech, Aug. 2003, pp. 1065-1068.
  11. J.-H. Chang et al., "Global Soft Decision Employing Support Vector Machine for Speech Enhancement," IEEE Signal Proc. Lett., vol. 16, no. 1, Jan. 2009, pp. 57-60. https://doi.org/10.1109/LSP.2008.2008574
  12. P.C. Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007.
  13. ITU-T, "A Silence Compression Scheme for G.729 Optimised for Terminals Conforming to Recommendation V.70," ITU-T Rec. G.729, Annex B, 1996.

Cited by

  1. "iVisher: Real-Time Detection of Caller ID Spoofing," vol. 36, pp. 5, 2014. https://doi.org/10.4218/etrij.14.0113.0798