http://dx.doi.org/10.7236/IJIBC.2019.11.4.26

Voice Activity Detection Based on SNR and Non-Intrusive Speech Intelligibility Estimation  

An, Soo Jeong (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
Choi, Seung Ho (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
Publication Information
International Journal of Internet, Broadcasting and Communication / v.11, no.4, 2019, pp. 26-30
Abstract
This paper proposes a new voice activity detection (VAD) method based on SNR and non-intrusive speech intelligibility estimation. Conventional SNR-based VAD methods obtain the voice activity probability by estimating the frame-wise SNR at each spectral component. However, their performance degrades in various noisy environments. We devise a hybrid VAD method that uses non-intrusive speech intelligibility estimation as well as SNR estimation, where the speech intelligibility score is estimated by a deep neural network. To train the parameters of the deep neural network, we use the MFCC vector as the input and the intrusive speech intelligibility score STOI (Short-Time Objective Intelligibility measure) as the target output. We develop a speech presence measure that classifies each noisy frame as voice or non-voice by computing, at each frame, the weighted average of the estimated STOI value and the conventional SNR-based VAD value. Experimental results show that the proposed method outperforms the conventional VAD method in various noisy environments, especially when the SNR is very low.
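The hybrid decision described in the abstract can be sketched as three pieces: a per-frame SNR-based VAD score, a frame-wise regressor from MFCCs to an estimated STOI value, and a weighted average of the two followed by a threshold. This is a minimal sketch under stated assumptions, not the authors' implementation. The SNR-based part is swapped in as a likelihood-ratio test in the style of Sohn et al. with a decision-directed a priori SNR; the noise PSD is assumed to be known already; the regressor is a generic ReLU network with a sigmoid output (chosen because STOI scores typically lie between 0 and 1) whose weights here are random, not trained. The function names and the values of `alpha`, `eta`, `w`, `threshold`, and the layer sizes are all hypothetical, since the abstract does not give them.

```python
import numpy as np


def snr_vad_score(noisy_power, noise_power, alpha=0.98, eta=1.0):
    """Per-frame speech score in (0, 1) from a Sohn-style likelihood-ratio test.

    noisy_power: (frames, bins) noisy power spectrum |Y|^2.
    noise_power: (bins,) noise PSD, assumed to be estimated elsewhere.
    alpha, eta: hypothetical smoothing factor and log-LR offset.
    """
    gamma = noisy_power / noise_power            # a posteriori SNR per bin
    prev_clean = np.zeros_like(noise_power)      # previous clean-power estimate
    mean_llr = np.empty(noisy_power.shape[0])
    for t, g in enumerate(gamma):
        # Decision-directed a priori SNR estimate
        xi = alpha * prev_clean / noise_power + (1 - alpha) * np.maximum(g - 1.0, 0.0)
        # Per-bin log-likelihood ratio (speech present vs. absent, Gaussian model)
        llr = g * xi / (1.0 + xi) - np.log1p(xi)
        mean_llr[t] = llr.mean()
        # Wiener gain as a simple stand-in for the clean-spectrum estimator
        prev_clean = (xi / (1.0 + xi)) ** 2 * noisy_power[t]
    # Squash the offset mean log-LR into (0, 1) so it can be averaged with STOI
    z = np.clip(mean_llr - eta, -50.0, 50.0)
    return 1.0 / (1.0 + np.exp(-z))


def mlp_stoi_estimate(mfcc, layers):
    """Forward pass of a hypothetical MFCC -> STOI regressor (ReLU hidden, sigmoid out).

    mfcc: (frames, n_mfcc); layers: list of (W, b) pairs that would come from
    training against intrusive STOI targets.
    """
    h = mfcc
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)           # ReLU hidden layers
    W, b = layers[-1]
    out = h @ W + b
    return (1.0 / (1.0 + np.exp(-out))).ravel()  # one score per frame


def speech_presence(stoi_est, vad_score, w=0.5, threshold=0.5):
    """Weighted average of estimated STOI and SNR-based VAD score, then a hard decision."""
    measure = w * np.asarray(stoi_est) + (1.0 - w) * np.asarray(vad_score)
    return measure, measure > threshold


# Toy demo with synthetic spectra (not data from the paper)
rng = np.random.default_rng(0)
n_bins, n_mfcc = 129, 13
noise_psd = np.ones(n_bins)
# 10 noise-only frames, 10 frames with a strong added component, 10 noise-only frames
scale = np.r_[np.ones(10), np.full(10, 101.0), np.ones(10)]
noisy = rng.exponential(size=(30, n_bins)) * scale[:, None]
vad = snr_vad_score(noisy, noise_psd)

# Untrained random weights: this only shows the data flow, not a real STOI predictor
mfcc = rng.standard_normal((30, n_mfcc))
layers = [(0.1 * rng.standard_normal((n_mfcc, 32)), np.zeros(32)),
          (0.1 * rng.standard_normal((32, 1)), np.zeros(1))]
stoi_est = mlp_stoi_estimate(mfcc, layers)
measure, is_speech = speech_presence(stoi_est, vad)
```

Keeping both scores in (0, 1) is what makes a plain weighted average meaningful; with a trained regressor, the STOI term is the part intended to keep the decision informative when the SNR-based score becomes unreliable at very low SNR.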
Keywords
Voice Activity Detection (VAD); SNR-based VAD; Non-intrusive speech intelligibility estimation; STOI; Deep neural network;