http://dx.doi.org/10.7776/ASK.2014.33.2.139

A Single Channel Voice Activity Detection for Noisy Environments Using Wavelet Packet Decomposition and Teager Energy  

Koo, Boneung (Department of Electronic Engineering, Kyonggi University)
Abstract
In this paper, a feature parameter for voice activity detection (VAD) is obtained by applying the Teager energy to wavelet packet decomposition (WPD) coefficients. The decision threshold is derived from the mean and standard deviation of the feature over nonspeech frames. Experiments using the TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm outperforms typical VAD algorithms. Receiver operating characteristic (ROC) curves are used to compare VAD performance at SNR values ranging from 10 to -10 dB.
Keywords
Voice activity detection; Speech pause detection; Teager energy; Wavelet packet decomposition; Noise-robustness; Single channel;
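
The feature and threshold summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Haar filters, the decomposition depth, the frame length, the log-sum feature, the multiplier `k`, and the assumption that the first `n_noise_frames` frames contain no speech are all hypothetical choices made here for illustration. Only the general idea (Teager energy of WPD coefficients, threshold from the mean and standard deviation of nonspeech frames) comes from the abstract.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def haar_wpd(frame, depth):
    """Full wavelet packet tree using Haar filters (an assumed wavelet).

    Returns the 2**depth leaf subbands; the frame length must be
    divisible by 2**depth.
    """
    nodes = [np.asarray(frame, dtype=float)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2))  # high-pass branch
        nodes = next_nodes
    return nodes


def frame_feature(frame, depth=3):
    """Hypothetical feature: log10 of the summed mean |Teager energy| per subband."""
    total = sum(np.mean(np.abs(teager_energy(band)))
                for band in haar_wpd(frame, depth))
    return np.log10(total + 1e-12)  # small offset avoids log(0)


def vad(signal, frame_len=256, n_noise_frames=10, k=3.0, depth=3):
    """Frame-wise speech/nonspeech decisions.

    The threshold is mean + k * std of the feature over the first
    n_noise_frames frames, which are assumed to be nonspeech.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    feats = np.array([frame_feature(f, depth) for f in frames])
    noise_feats = feats[:n_noise_frames]
    threshold = noise_feats.mean() + k * noise_feats.std()
    return feats > threshold, feats, threshold
```

Calling `vad(signal)` on a one-dimensional array returns a boolean decision per frame together with the feature values and the threshold. A fixed threshold estimated once from the leading frames is the simplest option; tracking the nonspeech statistics over time would be a natural extension when the noise is nonstationary.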