Voice Activity Detection Algorithm using Wavelet Band Entropy Ensemble Analysis in Car Noisy Environments

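  • 이기현 (Department of Medical & Biological Engineering, Graduate School, Kyungpook National University) ;
  • 이윤정 (Department of Medical & Biological Engineering, Graduate School, Kyungpook National University) ;
  • 김명남 (Department of Biomedical Engineering, School of Medicine, Kyungpook National University)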
  • Received : 2013.06.19
  • Accepted : 2013.08.12
  • Published : 2013.09.30

Abstract

Voice activity detection (VAD) separates the speech intervals from the non-speech intervals of a signal in which speech and noise are mixed, and it is an important step in signal processing for speech enhancement. Many studies on VAD have been carried out, but existing methods perform poorly in low signal-to-noise ratio (SNR) environments and under noise that varies strongly over time, such as car noise. In this paper, we propose a new VAD algorithm that combines an ensemble variance based on wavelet band entropy with a soft thresholding method. To evaluate the proposed algorithm, we ran experiments with car noise at a range of SNRs, and the results confirmed its superior performance.
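The abstract names three ingredients (wavelet band entropy, an ensemble variance, and soft thresholding) without giving their definitions. The sketch below is one possible reading, not the authors' algorithm. It assumes a Haar wavelet, takes "band entropy" to be the normalized Shannon entropy of the coefficient energies in each band, takes "ensemble variance" to be the variance of those entropies across the bands of one frame, and estimates the threshold from leading frames assumed to be noise only. All function names and the parameters `levels`, `noise_frames`, and `k` are hypothetical.

```python
import math


def haar_dwt_bands(frame, levels):
    """Split a frame into `levels` Haar detail bands plus one approximation band.

    The frame length should be divisible by 2**levels; a trailing odd
    sample at any level is dropped.
    """
    approx = list(frame)
    bands = []
    for _ in range(levels):
        pairs = range(0, len(approx) - 1, 2)
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        bands.append(detail)
    bands.append(approx)
    return bands


def band_entropy(coeffs):
    """Shannon entropy of the normalized coefficient energies, scaled to [0, 1]."""
    energy = [c * c for c in coeffs]
    total = sum(energy)
    if total == 0.0 or len(coeffs) < 2:
        return 0.0
    h = -sum((e / total) * math.log(e / total) for e in energy if e > 0.0)
    return h / math.log(len(coeffs))


def entropy_variance(frame, levels=3):
    """Variance of the band entropies of one frame (assumed 'ensemble variance')."""
    h = [band_entropy(b) for b in haar_dwt_bands(frame, levels)]
    mean = sum(h) / len(h)
    return sum((x - mean) ** 2 for x in h) / len(h)


def soft_threshold(value, threshold):
    """Shrink a value toward zero by `threshold`; smaller magnitudes become 0."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def detect_voice(features, noise_frames=5, k=3.0):
    """Flag frames whose soft-thresholded feature stays above zero.

    The threshold is mean + k * std of the first `noise_frames` features,
    which are assumed (not verified) to contain noise only.
    """
    head = features[:noise_frames]
    mean = sum(head) / len(head)
    std = math.sqrt(sum((x - mean) ** 2 for x in head) / len(head))
    threshold = mean + k * std
    return [soft_threshold(f, threshold) > 0.0 for f in features]
```

In use, a signal would be cut into fixed-length frames, `entropy_variance` applied to each frame, and the resulting feature sequence passed to `detect_voice`. Whether this particular feature separates speech from car noise at low SNR depends on choices the abstract does not specify (wavelet family, number of bands, frame length, threshold rule), so the sketch makes no performance claim.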

Cited by

  1. Sound source localization using generalized cross correlation with phase transform combined with the steered response power method, vol.36, pp.5, 2013, https://doi.org/10.7776/ask.2017.36.5.345