Voice Activity Detection Algorithm Based on the Power Spectral Deviation of Teager Energy in Noisy Environment

  • Received : 2011.07.08
  • Accepted : 2011.09.08
  • Published : 2011.10.31

Abstract

In this paper, we propose a novel voice activity detection (VAD) algorithm that effectively distinguishes speech from nonspeech in various noisy environments. To improve the decision on speech segments, the proposed VAD uses a power spectral deviation (PSD) based on Teager energy (TE) instead of the conventional PSD scheme. In addition, the speech absence probability (SAP) is derived in each frequency subband and used to modify the PSD, further improving the VAD. The proposed algorithm is evaluated with objective tests under various noise environments and yields better results than the conventional methods.

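In this paper, we propose a new voice activity detection (VAD) algorithm for effectively detecting speech in noisy environments. To improve speech/nonspeech detection, the proposed method uses a power spectral deviation based on Teager energy instead of the conventional power spectral deviation. In addition, to further improve VAD performance, the speech absence probability for each frequency band is applied as a smoothing parameter when deriving the proposed power spectral deviation. In objective experiments comparing it with conventional methods, the proposed algorithm showed improved performance in a variety of background noise environments.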
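
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the frame length, the number of subbands, the equal-width band split, applying the Teager energy operator to the time-domain samples of each frame, and the function names are all assumptions made for illustration. In particular, the fixed smoothing constant `alpha` is only a placeholder for the per-band smoothing parameter that the paper derives from the speech absence probability, which is not reproduced here.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    # The operator needs both neighbours, so the two edge samples stay at zero.
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def band_log_power(frame, n_bands=16):
    """Log power (dB) of one frame in n_bands equal-width subbands (assumed layout)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    bands = np.array_split(spectrum, n_bands)
    return 10.0 * np.log10(np.array([b.mean() for b in bands]) + 1e-12)


def te_power_spectral_deviation(signal, frame_len=256, n_bands=16,
                                n_init=5, alpha=0.95):
    """Per-frame deviation of the TE-domain log spectrum from its long-term average.

    alpha is a fixed smoothing constant used as a stand-in (assumption) for the
    SAP-based per-band smoothing parameter described in the abstract.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    log_power = np.array(
        [band_log_power(teager_energy(f), n_bands) for f in frames]
    )
    # Assume the first n_init frames contain noise only.
    long_term = log_power[:n_init].mean(axis=0)
    deviation = np.empty(n_frames)
    for t in range(n_frames):
        deviation[t] = np.abs(log_power[t] - long_term).sum()
        # Recursive update of the long-term (noise) log spectrum.
        long_term = alpha * long_term + (1.0 - alpha) * log_power[t]
    return deviation
```

A frame would then be declared speech when its deviation exceeds a threshold; how that threshold is chosen is left open here, since the abstract does not specify it.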
