http://dx.doi.org/10.7776/ASK.2015.34.4.310

A Parametric Voice Activity Detection Based on the SPD-TE for Nonstationary Noises  

Koo, Boneung (Department of Electronic Engineering, Kyonggi University)
Abstract
A single-channel VAD (Voice Activity Detection) algorithm for nonstationary noise environments is proposed in this paper. Threshold values of the feature parameter used for the VAD decision are updated adaptively based on estimates of the means and standard deviations of past non-speech frames. The feature parameter, SPD-TE (Spectral Power Difference-Teager Energy), is obtained by applying the Teager energy to the WPD (Wavelet Packet Decomposition) coefficients. It was previously reported that the SPD-TE is robust to noise as a feature for VAD. Experimental results using the TIMIT speech and NOISEX-92 noise databases show that the decision accuracy of the proposed algorithm is comparable to that of several typical VAD algorithms, including standards, for SNR values ranging from 10 to -10 dB.
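The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's SPD-TE algorithm: the wavelet packet decomposition is omitted, and the per-frame feature value is assumed to be computed elsewhere. The first function is the standard discrete Teager energy operator. The class `AdaptiveThresholdVAD`, the multiplier `k`, and the `history` length are hypothetical choices made only for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1] * x[n+1], interior samples only."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


class AdaptiveThresholdVAD:
    """Hypothetical rule: a frame is speech if its feature exceeds
    mean + k * std of the most recent non-speech features."""

    def __init__(self, init_noise_features, k=3.0, history=50):
        # k and history are illustrative values, not taken from the paper.
        self.k = k
        self.noise = deque(init_noise_features, maxlen=history)

    def threshold(self):
        # Threshold follows the statistics of past non-speech frames.
        return mean(self.noise) + self.k * pstdev(self.noise)

    def is_speech(self, feature):
        speech = feature > self.threshold()
        if not speech:
            # Only frames judged non-speech update the noise statistics.
            self.noise.append(feature)
        return speech


# Usage: seed with a few assumed noise-only feature values, then classify frames.
vad = AdaptiveThresholdVAD([1.0, 1.2, 0.8, 1.0])
decisions = [vad.is_speech(f) for f in [1.1, 5.0, 0.9]]
```

Because only frames classified as non-speech enter the history, the threshold can track slow changes in the noise level, which is the property the abstract attributes to the adaptive update.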
Keywords
Voice activity detection; Speech pause detection; Nonstationary noise; Noise-robustness; Single channel
Citations & Related Records
Times cited by KSCI: 1
1 P. C. Loizou, Speech Enhancement (CRC Press, Boca Raton, 2007), pp. 309-400.
2 J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Process. Lett. 6, 1-3 (1999).
3 ITU, A silence compression scheme for G.729 optimized for terminals conforming to recommendation V.70, ITU-T Recommendation G.729-Annex B (1996).
4 ETSI EN 301 708 V7.1.1 (1999-12), Digital cellular telecommunications system (Phase 2+); VAD for AMR speech traffic channels; General Description (GSM 06.94 version 7.1.1 Release 1998), 13-14 (1999).
5 ETSI ES 202 050, Ver. 1.1.5 (2007-01), Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced front-end feature extraction algorithm; Compression algorithms, Annex A.3 Stage 2-VAD Logic, 42-43 (2007).
6 J. Ramirez, J. C. Segura, C. Benitez, A. Torre, and A. Rubio, "Efficient voice activity detection algorithms using long-term speech information," Speech Commun. 42, 271-287 (2004).
7 A. Davis, S. Nordholm, and R. Togneri, "Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold," IEEE Trans. Audio, Speech and Lang. Processing 14, 412-424 (2006).
8 G. Evangelopoulos and P. Maragos, "Multiband modulation energy tracking for noisy speech detection," IEEE Trans. Audio, Speech and Lang. Processing 14, 2024-2038 (2006).
9 T. V. Pham and T. T. Chien, "Reliable voice activity detection algorithm under adverse environments," in Proc. IEEE Int. Conf. Commun. Electronics, 218-223 (2008).
10 P. K. Ghosh and S. Narayanan, "Robust voice activity detection using long-term signal variability," IEEE Trans. Audio, Speech and Lang. Processing 19, 600-613 (2011).
11 E. Chuangsuwanich and J. Glass, "Robust voice activity detector for real world application using harmonicity and modulation frequency," in Proc. Interspeech, 2645-2648 (2011).
12 B. Koo, "A single channel voice activity detection for noisy environments using wavelet packet decomposition and Teager energy" (in Korean), J. Acoust. Soc. Kr. 33, 139-145 (2014).
13 J. Garofolo, "TIMIT acoustic-phonetic continuous speech corpus," LDC93S1, Linguistic Data Consortium, Philadelphia, 1993.
14 A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun. 12, 247-251 (1993).