[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.4218/etrij.11.1510.0158

A Weighted Feature Voting Approach for Robust and Real-Time Voice Activity Detection

Moattar, Mohammad Hossein (Department of Computer Engineering and IT, Amirkabir University of Technology)
Homayounpour, Mohammad Mehdi (Department of Computer Engineering and IT, Amirkabir University of Technology)

Publication Information

ETRI Journal, vol. 33, no. 1, 2011, pp. 99-109

Abstract

This paper presents a robust, real-time voice activity detection (VAD) approach that is easy to understand and implement. The approach combines several short-term features that discriminate speech from nonspeech in a voting paradigm, in order to achieve reliable performance across different environments. The paper mainly aims to improve a recently proposed approach that uses the spectral peak valley difference (SPVD) as a feature for silence detection. Its central idea is to combine SPVD with a set of additional features to make the VAD more robust. A weighted voting scheme is used so that the discriminative power of each feature in the set is taken into account. The experiments show that the proposed approach is more robust than the baseline approach from several points of view, including channel distortion and threshold selection. For further confirmation, the proposed approach is also compared with several other VAD techniques. With the proposed weighted voting approach, the average VAD performance rises to 89.29% over 5 noise types and 8 SNR levels. This is 13.79% higher than the approach based only on SPVD and 2.25% higher than the unweighted voting scheme.
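The weighted voting idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the feature names (`spvd`, `flatness`, `pitch`), thresholds, weights, and frame values are hypothetical. The sketch assumes that each feature casts a binary speech/nonspeech vote per frame by comparing its value with a threshold, and that a frame is labeled speech when the speech votes carry more than half of the total weight. Setting every weight to 1 yields a plain, unweighted majority vote for comparison.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class FeatureRule:
    """Hypothetical per-feature detector; not taken from the paper."""
    threshold: float       # decision threshold for this feature (illustrative value)
    speech_if_above: bool  # True if values above the threshold indicate speech
    weight: float          # assumed to reflect the feature's discriminative power


def frame_votes(values: Dict[str, float], rules: Dict[str, FeatureRule]) -> Dict[str, bool]:
    """Each feature casts one binary vote: True = speech, False = nonspeech."""
    votes = {}
    for name, rule in rules.items():
        above = values[name] > rule.threshold
        votes[name] = above if rule.speech_if_above else not above
    return votes


def weighted_decision(votes: Dict[str, bool], rules: Dict[str, FeatureRule]) -> bool:
    """Speech if the speech votes carry more than half of the total weight."""
    total = sum(rule.weight for rule in rules.values())
    speech_weight = sum(rules[name].weight for name, vote in votes.items() if vote)
    return speech_weight > total / 2


def classify_frames(frames: List[Dict[str, float]], rules: Dict[str, FeatureRule]) -> List[bool]:
    """Apply the weighted vote to every frame independently."""
    return [weighted_decision(frame_votes(frame, rules), rules) for frame in frames]


def unweighted(rules: Dict[str, FeatureRule]) -> Dict[str, FeatureRule]:
    """Same rules with equal weights, i.e. a plain majority vote."""
    return {name: replace(rule, weight=1.0) for name, rule in rules.items()}


# Illustrative settings (assumptions): SPVD and a pitch-strength score are taken to be
# larger for speech, spectral flatness smaller (speech spectra are less flat than noise).
RULES = {
    "spvd": FeatureRule(threshold=2.0, speech_if_above=True, weight=4.0),
    "flatness": FeatureRule(threshold=0.3, speech_if_above=False, weight=2.0),
    "pitch": FeatureRule(threshold=0.4, speech_if_above=True, weight=1.0),
}

FRAMES = [
    {"spvd": 3.1, "flatness": 0.12, "pitch": 0.7},  # all three features vote speech
    {"spvd": 2.6, "flatness": 0.50, "pitch": 0.1},  # only SPVD votes speech
    {"spvd": 0.9, "flatness": 0.20, "pitch": 0.6},  # flatness and pitch vote speech
]

if __name__ == "__main__":
    print("weighted:  ", classify_frames(FRAMES, RULES))
    print("unweighted:", classify_frames(FRAMES, unweighted(RULES)))
```

With these made-up numbers, the weighted vote labels the three frames `[True, True, False]`, while the unweighted majority gives `[True, False, True]`. A single heavily weighted feature can overrule two weaker ones, which is how weighting lets the more discriminative features dominate the decision.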

Keywords

Spectral peak valley difference; spectral flatness; pitch; noise robustness; weighted voting; voice activity detection

Citations & Related Records

Times cited by KSCI: 1
Times cited by Web of Science: 2
Times cited by Scopus: 2

1	I.C. Yoo and D. Yook, "Robust Voice Activity Detection Using the Spectral Peaks of Vowel Sounds," ETRI J., vol. 31, no. 4, 2009, pp. 451-453.
2	S. Shafiee et al., "A Two-Stage Speech Activity Detection System Considering Fractal Aspects of Prosody," Pattern Recog. Lett., 2010.
3	M. Fujimoto and K. Ishizuka, "Noise Robust Voice Activity Detection Based on Switching Kalman Filter," IEICE Trans. Inf. Syst., vol. E91-D, 2008, pp. 467-477.
4	A. Agarwal and Y.M. Cheng, "Two-Stage Mel-Warped Wiener Filter for Robust Speech Recognition," IEEE Workshop Auto. Speech Recog. Understanding, 1999, pp. 67-70.
5	B.F. Wu and K.C. Wang, "Robust Endpoint Detection Algorithm Based on the Adaptive Band Partitioning Spectral Entropy in Adverse Environments," IEEE Trans. Speech Audio Process., vol. 13, 2005, pp. 762-775.
6	J.L. Shen, J.W. Hung, and L.S. Lee, "Robust Entropy Based Endpoint Detection for Speech Recognition in Noisy Environments," ICSP, 1998, pp. 232-235.
7	S. Ahmadi and A.S. Spanias, "Cepstrum-Based Pitch Detection Using a New Statistical V/UV Classification Algorithm," IEEE Trans. Speech Audio Process., vol. 7, 1999, pp. 333-338.
8	Y. Tian, Z. Wang, and D. Lu, "Non-Speech Segment Rejection Based on Prosodic Information for Robust Speech Recognition," IEEE Signal Process. Lett., vol. 9, no. 11, 2002, pp. 364-367.
9	K. Ishizuka et al., "Noise Robust Voice Activity Detection Based on Periodic to Aperiodic Component Ratio," Speech Commun., vol. 52, 2010, pp. 41-60.
10	R.E. Yantorno, K.L. Krishnamachari, and J.M. Lovekin, "The Spectral Autocorrelation Peak Valley Ratio (SAPVR): A Usable Speech Measure Employed as a Co-channel Detection System," IEEE Int. Workshop Intell. Signal Process., 2001, pp. 193-197.
11	A. Benyassine et al., "ITU-T Recommendation G.729 Annex B: A Silence Compression Scheme for Use with G.729 Optimized for V.70 Digital Simultaneous Voice and Data Applications," IEEE Commun. Mag., vol. 35, 1997, pp. 64-73.
12	M. Marzinzik and B. Kollmeier, "Speech Pause Detection for Noise Spectrum Estimation by Tracking Power Envelope Dynamics," IEEE Trans. Speech Audio Process., vol. 10, 2002, pp. 109-118.
13	J. Ramírez et al., "Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information," Speech Commun., vol. 42, 2004, pp. 271-287.
14	M.H. Moattar, M.M. Homayounpour, and N.K. Kalantari, "A New Approach for Robust Realtime Voice Activity Detection Using Spectral Pattern," ICASSP, 2010, pp. 4478-4481.
15	M.H. Savoji, "A Robust Algorithm for Accurate End Pointing of Speech," Speech Commun., vol. 8, no. 1, 1989, pp. 45-60.
16	T. Kristjansson, S. Deligne, and P. Olsen, "Voicing Features for Robust Speech Detection," Interspeech, 2005, pp. 369-372.
17	A.P. Varga et al., "The NOISEX-92 Study on the Effect of Additive Noise on Automatic Speech Recognition," Technical report, DRA Speech Research Unit, 1992.
18	B. Lee and M. Hasegawa-Johnson, "Minimum Mean Squared Error A Posteriori Estimation of High Variance Vehicular Noise," Biennial DSP In-Vehicle Mobile Syst., 2007.
19	ETSI, Digital Cellular Telecommunications Systems (Phase 2+); Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) Speech Traffic Channels, GSM 06.94, version 7.1.1, EN 301 708, 1999.
20	ETSI, Speech Processing, Transmission, and Quality Aspects (STQ), Distributed Speech Recognition, Advanced Front-End Feature Extraction Algorithm, Compression Algorithms, version 1.1.1, ES 202 050, 2001.
21	J.S. Garofalo et al., DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CDROM, Linguistic Data Consortium, 1993.
22	M. Bijankhan and M.J. Sheikhzadegan, "FARSDAT- the Farsi Spoken Language Database," 5th Australian Int. Conf. Speech Sci. Technol., vol. 2, 1994, pp. 826-829.
23	M.H. Moattar and M.M. Homayounpour, "A Simple but Efficient Real-Time Voice Activity Detection Algorithm," Eusipco, 2009, pp. 2549-2553.
24	H.G. Hirsch and D. Pearce, "The AURORA Experimental Framework for the Performance Evaluation of Speech Recognition Systems under Noise Conditions," ISCA ITRW, 2000, pp. 181-188.
25	D. Cournapeau and T. Kawahara, "Evaluation of Real-Time Voice Activity Detection Based on High Order Statistics," Interspeech, 2007, pp. 2945-2949.
26	H. Kato Solvang, K. Ishizuka, and M. Fujimoto, "Voice Activity Detection Based on Adjustable Linear Prediction and GARCH Models," Speech Commun., vol. 50, 2008, pp. 476-486.

"A Hierarchical Framework Approach for Voice Activity Detection and Speech Enhancement," The Scientific World Journal, vol. 2014, 723643.
"Manifold Learning Based Speaker Dependent Dimension Reduction for Robust Text Independent Speaker Verification," International Journal of Speech Technology, vol. 17, no. 3, p. 271.
"Formant-Based Robust Voice Activity Detection," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, p. 2238.
"Efficient Harmonic Peak Detection of Vowel Sounds for Enhanced Voice Activity Detection," IET Signal Processing, vol. 12, no. 8, p. 975.

1	M.H. Moattar and M.M. Homayounpour, "Speaker Tracking Using Eigendecomposition and an Index Tree of Reference Models," ETRI J.
2	M.H. Moattar and M.M. Homayounpour, "Text-Independent Speaker Verification Using Variational Gaussian Mixture Model," ETRI J.