Statistical Voice Activity Detection Using Probabilistic Non-Negative Matrix Factorization

Kim, Dong Kook; Shin, Jong Won; Kwon, Kisoo; Kim, Nam Soo

doi:10.7840/kics.2016.41.8.851

The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지)

Volume 41, Issue 8, Pages 851-858, 2016, pISSN 1226-4717, eISSN 2287-3880

The Korean Institute of Communications and Information Sciences (한국통신학회)

Kim, Dong Kook (Chonnam National University, School of Electronic and Computer Engineering);
Shin, Jong Won (Gwangju Institute of Science and Technology, School of Electrical Engineering and Computer Science);
Kwon, Kisoo (Seoul National University, Department of Electrical and Computer Engineering and the Institute of New Media and Communications);
Kim, Nam Soo (Seoul National University, Department of Electrical and Computer Engineering and the Institute of New Media and Communications)

Received: 2016.05.12
Accepted: 2016.07.12
Published: 2016.08.31

https://doi.org/10.7840/kics.2016.41.8.851

Abstract

This paper presents a new statistical voice activity detection (VAD) method based on the probabilistic interpretation of non-negative matrix factorization (NMF). When the data, given the basis and encoding matrices, are modeled as Poisson distributed, the NMF objective function based on the Kullback-Leibler divergence coincides with the negative log-likelihood of the data. Building on this probabilistic NMF, the magnitude spectra of speech and noise in the DFT domain are modeled as Poisson distributions, and a new likelihood ratio test is derived as the VAD decision rule. Experimental results show that the proposed approach outperforms the conventional Gaussian model-based and NMF-based methods in simulations at signal-to-noise ratios of 0-15 dB.
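
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-bin rate parameters, and the averaging of the log-likelihood ratio over bins are all assumptions made for illustration. The first two functions check numerically that the generalized Kullback-Leibler divergence and the Poisson negative log-likelihood differ only by a term that does not depend on the model rates. The third evaluates a Poisson likelihood ratio for one frame, assuming the speech-plus-noise rate is the sum of a speech rate and a noise rate.

```python
import math
import numpy as np


def generalized_kl(v, lam):
    """Generalized KL divergence D(v || lam), summed over all entries."""
    v = np.asarray(v, dtype=float)
    lam = np.asarray(lam, dtype=float)
    # Use the convention 0 * log(0) = 0 for zero-valued observations.
    safe_v = np.where(v > 0, v, 1.0)
    ratio_term = np.where(v > 0, v * np.log(safe_v / lam), 0.0)
    return float(np.sum(ratio_term - v + lam))


def poisson_neg_log_likelihood(v, lam):
    """Negative log-likelihood of v under independent Poisson(lam) entries."""
    v = np.asarray(v, dtype=float)
    lam = np.asarray(lam, dtype=float)
    log_factorial = np.vectorize(math.lgamma)(v + 1.0)  # log(v!)
    return float(np.sum(lam - v * np.log(lam) + log_factorial))


def poisson_log_likelihood_ratio(x, lam_speech, lam_noise):
    """Mean per-bin log-likelihood ratio of H1 (speech + noise) vs H0 (noise).

    Assumption: under H1 each bin is Poisson with rate lam_speech + lam_noise,
    under H0 it is Poisson with rate lam_noise. The log(x!) terms cancel.
    """
    x = np.asarray(x, dtype=float)
    lam_speech = np.asarray(lam_speech, dtype=float)
    lam_noise = np.asarray(lam_noise, dtype=float)
    per_bin = x * np.log1p(lam_speech / lam_noise) - lam_speech
    return float(np.mean(per_bin))


# Hypothetical toy data: V plays the role of a magnitude spectrogram.
rng = np.random.default_rng(0)
W = rng.random((8, 3)) + 0.1   # basis matrix (assumed sizes)
H = rng.random((3, 5)) + 0.1   # encoding matrix
V = rng.poisson(5.0 * (W @ H)).astype(float)

# The gap between the two objectives is the same for any choice of rates,
# so minimizing one is equivalent to minimizing the other.
lam_a = W @ H
lam_b = 2.0 * lam_a
gap_a = poisson_neg_log_likelihood(V, lam_a) - generalized_kl(V, lam_a)
gap_b = poisson_neg_log_likelihood(V, lam_b) - generalized_kl(V, lam_b)
print("objective gaps:", gap_a, gap_b)

# A frame would be declared speech when the ratio exceeds a chosen threshold.
lam_noise = np.full(16, 2.0)
lam_speech = np.full(16, 6.0)
frame = rng.poisson(lam_noise + lam_speech)
print("log-likelihood ratio:", poisson_log_likelihood_ratio(frame, lam_speech, lam_noise))
```

In practice the rates would come from NMF models of speech and noise, and the threshold would be tuned on held-out data; both steps are outside this sketch.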
