Applying the Bi-level HMM for Robust Voice-activity Detection

Hwang, Yongwon; Jeong, Mun-Ho; Oh, Sang-Rok; Kim, Il-Hwan

doi:10.5370/JEET.2017.12.1.373

Journal of Electrical Engineering and Technology

Volume 12, Issue 1 / Pages 373-377 / 2017 / 1975-0102 (pISSN) / 2093-7423 (eISSN)

The Korean Institute of Electrical Engineers (대한전기학회)

Hwang, Yongwon (Dept. of Electrical and Electronic Engineering, Yonsei University) ;
Jeong, Mun-Ho (School of Robotics, Kwangwoon University) ;
Oh, Sang-Rok (Center for Robotics Research, Korea Institute of Science and Technology) ;
Kim, Il-Hwan (Dept. of Electronic and Communication, Kwangwoon National University)

Received: 2016.01.16
Accepted: 2016.05.29
Published: 2017.01.02

https://doi.org/10.5370/JEET.2017.12.1.373

Abstract

This paper presents a voice-activity detection (VAD) method for sound sequences with various signal-to-noise ratios (SNRs). In real-time VAD applications, it is impractical to rely on post-processing to remove burst clippings from the VAD output decision. To address this problem, we build on the bi-level hidden Markov model (HMM), in which a state layer is inserted into a typical HMM, and formulate a robust VAD method that requires no additional post-processing. In the method, a forward-inference-ratio test is devised to detect the speech endpoints, and Mel-frequency cepstral coefficients (MFCCs) are used as the features. Our experimental results show that the proposed approach outperforms conventional methods across different SNRs.
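
The abstract does not define the bi-level HMM or the forward-inference-ratio test in detail, so the following is only a rough sketch of the general idea of a frame-by-frame decision from forward (filtering) probabilities, with no post-processing pass. It assumes a plain two-state HMM (noise, speech), a single scalar feature per frame in place of an MFCC vector, and Gaussian emissions. The function names, the transition matrix, the emission parameters, and the threshold are all illustrative assumptions and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian; stands in for an MFCC emission model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_ratio_vad(features, trans, emis, prior=(0.5, 0.5), threshold=1.0):
    """Per-frame speech decisions from an online forward recursion.

    Assumed (hypothetical) setup: state 0 = noise, state 1 = speech.
    trans[i][j] is P(state j at t | state i at t-1).
    emis[j] is a (mean, std) pair for state j.
    A frame is marked as speech when the ratio of the filtered
    probabilities P(speech | x_1..t) / P(noise | x_1..t) exceeds threshold.
    """
    alpha = list(prior)  # filtered state distribution before the first frame
    decisions = []
    for x in features:
        # Prediction step: propagate the belief through the transition model.
        pred = [sum(alpha[i] * trans[i][j] for i in range(2)) for j in range(2)]
        # Update step: weight each state by the likelihood of the new frame.
        upd = [pred[j] * gaussian_pdf(x, emis[j][0], emis[j][1]) for j in range(2)]
        total = sum(upd)
        # If both likelihoods underflow to zero, fall back to the prediction.
        alpha = [u / total for u in upd] if total > 0.0 else pred
        ratio = alpha[1] / max(alpha[0], 1e-300)
        decisions.append(ratio > threshold)
    return decisions
```

Because each decision uses only the frames seen so far, the sketch can run in real time. The "sticky" transition probabilities (for example 0.9 for staying in the same state) are what discourage isolated single-frame flips in this toy version; how the paper's additional state layer achieves its robustness is not described in the abstract.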
