Applying the Bi-level HMM for Robust Voice-activity Detection

  • Hwang, Yongwon (Dept. of Electrical and Electronic Engineering, Yonsei University)
  • Jeong, Mun-Ho (School of Robotics, Kwangwoon University)
  • Oh, Sang-Rok (Center for Robotics Research, Korea Institute of Science and Technology)
  • Kim, Il-Hwan (Dept. of Electronic and Communication, Kwangwoon National University)
  • Received: 2016.01.16
  • Reviewed: 2016.05.29
  • Published: 2017.01.02

Abstract

This paper presents a voice-activity detection (VAD) method for sound sequences with various signal-to-noise ratios (SNRs). In real-time VAD applications, it is impractical to apply post-processing to remove burst clippings from the VAD output decision. To tackle this problem, we build on the bi-level hidden Markov model (HMM), in which a state layer is inserted into a typical HMM, and formulate a robust VAD method that requires no additional post-processing. The method uses a forward-inference-ratio test to detect speech endpoints and Mel-frequency cepstral coefficients (MFCCs) as features. Our experimental results show that, across different SNRs, the proposed approach outperforms the conventional methods.
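The abstract does not give the formulation of the forward-inference-ratio test, so the following is only an illustrative sketch of the general idea: run the HMM forward (filtering) recursion frame by frame and compare the filtered probability of speech against that of non-speech. It assumes a plain two-state HMM rather than the bi-level HMM, a single scalar feature per frame standing in for an MFCC vector, and Gaussian emissions. The function names, transition matrix, emission parameters, and threshold are all hypothetical choices for this example and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; stands in for an MFCC emission model (assumption)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_ratio_vad(features, trans, emit_params, prior=(0.5, 0.5), threshold=1.0):
    """Causal per-frame speech decisions from a two-state HMM forward pass.

    State 0 is non-speech and state 1 is speech. A frame is labeled speech when
    the ratio of filtered state probabilities alpha[1] / alpha[0] exceeds
    `threshold`. This is a generic forward-ratio test, not the paper's exact test.
    """
    alpha = list(prior)
    decisions = []
    for x in features:
        # Prediction step: propagate the previous belief through the transitions.
        pred = [sum(alpha[i] * trans[i][j] for i in range(2)) for j in range(2)]
        # Update step: weight each state by the likelihood of the current frame.
        like = [gaussian_pdf(x, *emit_params[j]) for j in range(2)]
        alpha = [pred[j] * like[j] for j in range(2)]
        total = sum(alpha)
        # Normalize to avoid underflow; fall back to the prediction if both
        # likelihoods underflow to zero.
        alpha = [a / total for a in alpha] if total > 0.0 else pred
        ratio = alpha[1] / max(alpha[0], 1e-300)
        decisions.append(ratio > threshold)
    return decisions


# Hypothetical toy input: low values for silence, values near 3 for speech.
frames = [0.1, -0.2, 0.0, 3.1, 2.9, 3.3, 0.1, -0.1]
trans = [[0.9, 0.1], [0.1, 0.9]]      # "sticky" transitions discourage rapid switching
emit_params = [(0.0, 1.0), (3.0, 1.0)]  # (mean, variance) for non-speech and speech
decisions = forward_ratio_vad(frames, trans, emit_params)
print(decisions)
```

Because each decision depends only on the current and past frames, a test of this kind can run in real time without a separate smoothing pass over the output; the self-transition probabilities control how readily the decision switches between states.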
