http://dx.doi.org/10.7840/kics.2016.41.8.851

Statistical Voice Activity Detection Using Probabilistic Non-Negative Matrix Factorization  

Kim, Dong Kook (Chonnam National University, School of Electronic and Computer Engineering)
Shin, Jong Won (Gwangju Institute of Science and Technology, School of Electrical Engineering and Computer Science)
Kwon, Kisoo (Seoul National University, Department of Electrical and Computer Engineering and the Institute of New Media and Communications)
Kim, Nam Soo (Seoul National University, Department of Electrical and Computer Engineering and the Institute of New Media and Communications)
Abstract
This paper presents a new statistical voice activity detection (VAD) method based on a probabilistic interpretation of nonnegative matrix factorization (NMF). The NMF objective function based on the Kullback-Leibler divergence coincides, up to an additive constant, with the negative log-likelihood of the data when the data, given the basis and encoding matrices, are modeled as Poisson distributed. Building on this probabilistic NMF, the VAD is constructed as a likelihood ratio test under the assumption that speech and noise follow Poisson distributions. Experimental results show that the proposed approach outperformed conventional Gaussian model-based and NMF-based methods in simulations at signal-to-noise ratios of 0 to 15 dB.
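The equivalence between KL-divergence NMF and a Poisson likelihood, which the abstract relies on, is a standard identity and can be written out as below. The notation is an assumption for illustration, not taken from the paper: V is an F x T nonnegative data matrix (e.g. magnitude spectra), W the F x K basis matrix, H the K x T encoding matrix, and Lambda = WH, with each V_ft modeled as an independent Poisson variable with mean Lambda_ft.

```latex
\begin{align}
-\log p(V \mid W, H) &= \sum_{f,t} \Big( \Lambda_{ft} - V_{ft} \log \Lambda_{ft} + \log \Gamma(V_{ft} + 1) \Big) \\
D_{\mathrm{KL}}(V \,\|\, WH) &= \sum_{f,t} \Big( V_{ft} \log \frac{V_{ft}}{\Lambda_{ft}} - V_{ft} + \Lambda_{ft} \Big) \\
-\log p(V \mid W, H) &= D_{\mathrm{KL}}(V \,\|\, WH) + \sum_{f,t} \Big( \log \Gamma(V_{ft} + 1) - V_{ft} \log V_{ft} + V_{ft} \Big)
\end{align}
```

The last sum does not depend on W or H, so minimizing the KL objective over the factors is the same as maximizing the Poisson likelihood.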
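A frame-wise likelihood ratio test of the kind the abstract describes might be sketched as follows. This is a hypothetical illustration, not the authors' implementation. It assumes speech and noise bases learned offline by KL-NMF, encodings re-estimated per frame with the bases held fixed (a plug-in or "generalized" likelihood ratio), and a per-bin averaged log ratio compared against a hand-picked threshold. Because a sum of independent Poisson variables is Poisson with the summed rate, the speech-plus-noise hypothesis uses the rate W_speech h_s + W_noise h_n. The names kl_nmf, fit_activations, poisson_lrt_vad and threshold are illustrative, and the demo uses synthetic Poisson "spectra" rather than real STFT data.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def kl_nmf(V, rank, n_iter=200, seed=0):
    """Fit V ~= W @ H by minimizing the KL divergence with Lee-Seung
    multiplicative updates, i.e. maximizing a Poisson likelihood."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + 0.1
    H = rng.random((rank, n_frames)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def fit_activations(V, W, n_iter=200):
    """Estimate encodings H with the basis W held fixed (same KL update)."""
    H = np.ones((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
    return H


def poisson_loglik(V, lam):
    """Per-frame Poisson log-likelihood without the log(V!) term,
    which is identical under both hypotheses and cancels in the ratio."""
    return (V * np.log(lam + EPS) - lam).sum(axis=0)


def poisson_lrt_vad(V, W_speech, W_noise, threshold=0.5):
    """Frame-wise log likelihood ratio of H1 (speech + noise) against
    H0 (noise only), averaged over frequency bins and thresholded."""
    W_all = np.hstack([W_speech, W_noise])
    lam1 = W_all @ fit_activations(V, W_all)      # Poisson rate under H1
    lam0 = W_noise @ fit_activations(V, W_noise)  # Poisson rate under H0
    llr = (poisson_loglik(V, lam1) - poisson_loglik(V, lam0)) / V.shape[0]
    return llr, llr > threshold


# --- Synthetic demo: "spectra" are Poisson counts, not real STFT data ---
rng = np.random.default_rng(1)
n_bins = 32
noise_profile = np.linspace(3.0, 1.0, n_bins)  # smooth, broadband noise
S = np.zeros((n_bins, 3))                       # three peaky "speech" shapes
for k, peaks in enumerate([[3, 9, 15], [5, 13, 21], [7, 17, 27]]):
    S[peaks, k] = 20.0

noise_train = rng.poisson(np.outer(noise_profile, rng.uniform(0.8, 1.2, 300))).astype(float)
speech_train = rng.poisson(S @ rng.uniform(0.5, 2.0, (3, 300))).astype(float)
W_noise, _ = kl_nmf(noise_train, rank=1)
W_speech, _ = kl_nmf(speech_train, rank=3)

# 100 test frames: 0-49 noise only, 50-99 speech + noise
V_test = rng.poisson(np.outer(noise_profile, rng.uniform(0.8, 1.2, 100))).astype(float)
V_test[:, 50:] += rng.poisson(S @ rng.uniform(0.5, 2.0, (3, 50)))
llr, decisions = poisson_lrt_vad(V_test, W_speech, W_noise)
print(decisions[:50].mean(), decisions[50:].mean())
```

Because the speech-plus-noise model contains the noise-only model as a special case, the fitted log ratio tends to be small but nonnegative on noise-only frames at convergence, so in practice the threshold would be tuned on development data. The paper's own decision rule, any hangover or smoothing, and its feature front end may differ from this sketch.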
Keywords
voice activity detection; NMF; Poisson distribution; likelihood ratio test