[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7776/ASK.2014.33.2.139

A Single Channel Voice Activity Detection for Noisy Environments Using Wavelet Packet Decomposition and Teager Energy

Koo, Boneung (Department of Electronic Engineering, Kyonggi University)

Publication Information

The Journal of the Acoustical Society of Korea / v.33, no.2, 2014, pp. 139-145

Abstract

In this paper, a feature parameter is obtained by applying the Teager energy operator to the WPD (Wavelet Packet Decomposition) coefficients. The decision threshold is computed from the means and standard deviations of the feature parameter in nonspeech frames. Experimental results on the TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm performs better than a typical VAD algorithm. ROC (Receiver Operating Characteristic) curves are used to compare the performance of the VADs at SNRs ranging from 10 to -10 dB.
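The abstract only names the ingredients of the detector. As a rough illustration, here is a minimal Python sketch under explicit assumptions, not the author's implementation: it swaps in a Haar wavelet for the wavelet actually used in the paper (not stated in the abstract), applies the discrete Teager energy operator Ψ[x(n)] = x²(n) − x(n−1)x(n+1) (Kaiser, 1990) to every wavelet packet subband, averages the result into one feature per frame, and thresholds that feature at mean + k·std of the features from leading frames assumed to be nonspeech. The helper names, `frame_len`, `levels`, `n_noise_frames`, `k`, the averaging rule, and the synthetic `demo_signal` are all hypothetical.

```python
import math
import random
import statistics


def teager_energy(x):
    """Discrete Teager energy operator (Kaiser, 1990).

    psi[n] = x[n]**2 - x[n-1] * x[n+1], evaluated at interior samples only,
    so the output is two samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def haar_split(x):
    """One Haar analysis stage: returns the (lowpass, highpass) halves.

    Haar is an illustrative choice; the paper's wavelet is an assumption here.
    """
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) * s for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) * s for k in range(half)]
    return low, high


def wavelet_packet(x, levels):
    """Full wavelet packet tree: every node (lowpass and highpass) is split
    at each level, giving 2**levels terminal subbands in natural order."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes


def frame_feature(frame, levels=3):
    """Hypothetical feature: mean Teager energy of each WPD subband,
    averaged over all subbands (the paper's exact rule may differ)."""
    bands = wavelet_packet(frame, levels)
    return statistics.fmean(statistics.fmean(teager_energy(b)) for b in bands)


def vad(signal, frame_len=256, levels=3, n_noise_frames=10, k=3.0):
    """Return one label per non-overlapping frame: 1 = speech, 0 = nonspeech.

    Assumes the first n_noise_frames frames contain no speech and sets the
    threshold to mean + k * std of their features (k is a tuning knob).
    frame_len must be divisible by 2**levels.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    feats = [frame_feature(f, levels) for f in frames]
    noise = feats[:n_noise_frames]
    threshold = statistics.fmean(noise) + k * statistics.pstdev(noise)
    return [1 if f > threshold else 0 for f in feats]


def demo_signal(seed=0, frame_len=256):
    """Synthetic input: 10 noise-only frames, 5 frames of a tone in noise
    (a crude stand-in for speech), then 5 noise-only frames."""
    rng = random.Random(seed)

    def noise(n):
        return [rng.gauss(0.0, 0.01) for _ in range(n)]

    tone = [0.5 * math.sin(2 * math.pi * 0.05 * n) + e
            for n, e in enumerate(noise(5 * frame_len))]
    return noise(10 * frame_len) + tone + noise(5 * frame_len)
```

On `demo_signal()`, where a tone buried in noise stands in for speech, the ten leading noise-only frames are labeled 0 and the five tone frames are labeled 1. Splitting the highpass branches as well as the lowpass ones is what makes this a wavelet packet decomposition rather than a plain dyadic wavelet transform.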

Keywords

Voice activity detection; Speech pause detection; Teager energy; Wavelet packet decomposition; Noise-robustness; Single channel