Voice Activity Detection Based on Non-negative Matrix Factorization


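  • Sang-Ick Kang (강상익), DSP Laboratory, Department of Electronic Engineering, Inha University
  • Joon-Hyuk Chang (장준혁), Department of Electronic Engineering, Inha University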
  • Received : 2010.05.14
  • Accepted : 2010.07.13
  • Published : 2010.08.31

Abstract

In this paper, we apply a likelihood ratio test (LRT) to a voice activity detection (VAD) method based on non-negative matrix factorization (NMF) in order to find an optimal threshold. In our approach, the NMF-based VAD is expressed as the Euclidean distance between a noise basis vector and an input basis vector, both extracted through NMF. The optimal threshold for each noise environment depends on the distribution of the NMF results in the noise region, which is estimated by a statistical model-based VAD. Experimental results show that the proposed approach is more effective than the conventional statistical model-based VAD using the LRT.
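The abstract reduces the speech/non-speech decision to a Euclidean distance between a noise basis vector and an input basis vector obtained by NMF. A minimal sketch of that idea follows, under assumptions that are not taken from the paper: it uses the Lee and Seung multiplicative update rules for the Euclidean cost, extracts a single (rank-1) basis vector from each block of magnitude-spectrum frames, and scales each basis vector to unit length before comparing them. The rank, iteration count, normalization, toy data, and the function names `nmf`, `unit_basis`, and `nmf_vad_score` are hypothetical illustration choices.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def nmf(v: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factorize non-negative V (F x T) as W (F x rank) @ H (rank x T) with
    multiplicative updates that minimize the Euclidean (Frobenius) cost."""
    rng = random.Random(seed)
    f, t = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(f)]
    h = [[rng.random() + eps for _ in range(t)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(t)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(f)]
    return w, h


def unit_basis(block: Matrix) -> List[float]:
    """Rank-1 NMF basis vector of a spectrogram block (rows = frequency bins,
    columns = frames). Assumption: scaled to unit L2 norm to remove the
    gain ambiguity between W and H before comparison."""
    w, _ = nmf(block, rank=1)
    col = [row[0] for row in w]
    norm = math.sqrt(sum(x * x for x in col)) or 1.0
    return [x / norm for x in col]


def nmf_vad_score(noise_basis: List[float], block: Matrix) -> float:
    """Euclidean distance between the noise basis and the block's basis."""
    basis = unit_basis(block)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(noise_basis, basis)))


if __name__ == "__main__":
    # Toy magnitudes (hypothetical, not data from the paper).
    noise_a = [[1.0, 1.1, 0.9], [1.0, 0.9, 1.1], [1.1, 1.0, 1.0], [0.9, 1.0, 1.0]]
    noise_b = [[1.0, 0.95, 1.05], [1.05, 1.0, 0.95], [1.0, 1.0, 1.0], [0.95, 1.05, 1.0]]
    speech = [[5.0, 4.0, 6.0], [3.0, 2.5, 3.5], [0.5, 0.4, 0.6], [0.2, 0.15, 0.25]]
    noise_basis = unit_basis(noise_a)
    print(nmf_vad_score(noise_basis, noise_b), nmf_vad_score(noise_basis, speech))
```

With the toy data, the speech block's basis vector points in a different spectral direction from the noise basis, so its distance is larger than that of a second noise block; thresholding this distance then yields the frame decision.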

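This paper proposes a new voice activity detection (VAD) algorithm based on non-negative matrix factorization (NMF). First, we analyze the conventional statistical model-based VAD; building on this analysis, we decide whether speech is present from the difference between the input basis vector and the noise basis vector derived through NMF. To find the optimal threshold, we propose applying to the NMF-based VAD a threshold optimized according to the distribution of the NMF results in the noise region estimated by the statistical model-based VAD. In the experiments, the proposed method improved performance by 6.75% over the conventional statistical model-based VAD.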
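The threshold-selection pipeline described in the abstract can be sketched as follows, under stated assumptions. A statistical model-based VAD first labels frames using the Gaussian-model log likelihood ratio per frequency bin, log Λ_k = γ_k ξ_k / (1 + ξ_k) − log(1 + ξ_k), where γ_k is the a posteriori SNR and ξ_k the a priori SNR, averaged over bins and compared with a threshold eta. The NMF distance scores of the frames labeled as noise then define the threshold for the NMF-based decision. The abstract does not state the exact rule, so this sketch assumes mean + k times the standard deviation of those noise-region scores; it also uses a maximum-likelihood ξ_k estimate and omits any hang-over smoothing. The names `frame_log_lr`, `statistical_vad`, `noise_region_threshold`, `nmf_decisions` and the values of `eta` and `k` are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def frame_log_lr(power: Sequence[float], noise_psd: Sequence[float]) -> float:
    """Mean over frequency bins of the Gaussian-model log likelihood ratio
    log L_k = gamma_k * xi_k / (1 + xi_k) - log(1 + xi_k)."""
    total = 0.0
    for p, lam in zip(power, noise_psd):
        gamma = p / lam  # a posteriori SNR
        # Simplifying assumption: maximum-likelihood a priori SNR estimate
        # instead of a decision-directed one.
        xi = max(gamma - 1.0, 0.0)
        total += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
    return total / len(power)


def statistical_vad(frames: Sequence[Sequence[float]],
                    noise_psd: Sequence[float], eta: float = 0.5) -> List[bool]:
    """Label a frame as speech (True) when its mean log LR exceeds eta.
    eta = 0.5 is a hypothetical value, not one taken from the paper."""
    return [frame_log_lr(f, noise_psd) > eta for f in frames]


def noise_region_threshold(scores: Sequence[float], is_speech: Sequence[bool],
                           k: float = 3.0) -> float:
    """Threshold for the NMF distance from its distribution over the frames
    the statistical VAD labels as noise. Assumption: mean + k * std."""
    noise_scores = [s for s, sp in zip(scores, is_speech) if not sp]
    if not noise_scores:
        raise ValueError("statistical VAD found no noise frames")
    return statistics.mean(noise_scores) + k * statistics.pstdev(noise_scores)


def nmf_decisions(scores: Sequence[float], threshold: float) -> List[bool]:
    """Final speech/non-speech decision from the NMF distance scores."""
    return [s > threshold for s in scores]


if __name__ == "__main__":
    # Toy per-bin power spectra and NMF distances (hypothetical values).
    frames = [[1.0, 0.9, 1.1, 1.0], [1.0, 1.0, 1.0, 1.0], [20.0, 10.0, 2.0, 1.0],
              [0.95, 1.05, 1.0, 1.0], [18.0, 9.0, 3.0, 1.0]]
    labels = statistical_vad(frames, noise_psd=[1.0] * 4)
    scores = [0.05, 0.04, 0.70, 0.06, 0.65]
    thr = noise_region_threshold(scores, labels)
    print(labels, thr, nmf_decisions(scores, thr))
```

Because the threshold is derived from the score distribution observed in the noise region, it adapts to each noise environment instead of being fixed in advance, which is the motivation the abstract gives for this step.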

