http://dx.doi.org/10.7776/ASK.2009.28.5.447

Voice Activity Detection Method Using Psycho-Acoustic Model Based on Speech Energy Maximization in Noisy Environments  

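Choi, Gab-Keun (Department of Computer Engineering, Graduate School, Kwangwoon University)
Kim, Soon-Hyob (Department of Computer Engineering, Graduate School, Kwangwoon University)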
Abstract
This paper introduces a method for detecting voice activity and accurate endpoints at low SNR by maximizing voice energy. Conventional VAD (Voice Activity Detection) algorithms rely on an estimate of the noise level, so they tend to detect endpoints inaccurately. Moreover, because they use a relatively long analysis range to reflect temporal changes in the noise, their computational load is too high for practical applications. This paper introduces the SEM-VAD (Speech Energy Maximization-Voice Activity Detection) method, which uses psychoacoustic Bark-scale filter banks to maximize voice energy within frames. Stable threshold values are obtained across various noise conditions (SNR 15 dB, 10 dB, 5 dB, and 0 dB). In a voice detection test in a noisy car environment, the PHR (Pause Hit Rate) was 100% at every noise level, and the FAR (False Alarm Rate) was 0% at SNR 15 dB and 10 dB, 5.6% at SNR 5 dB, and 9.5% at SNR 0 dB.
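To make the idea of Bark-scale band energies concrete, the following is a minimal sketch and not the authors' SEM-VAD algorithm, whose details the abstract does not give. It assumes the Zwicker-Terhardt approximation of the Bark scale, rectangular bands one Bark wide, and a simple rule that flags a frame as speech when its strongest band exceeds a fixed threshold. The names `hz_to_bark`, `bark_band_energies`, and `is_speech`, and the threshold value, are hypothetical.

```python
import cmath
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation mapping frequency in Hz to Bark."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def bark_band_energies(frame, sample_rate):
    """Sum DFT power into bands one Bark wide (naive DFT, for clarity only)."""
    n = len(frame)
    n_bands = int(hz_to_bark(sample_rate / 2.0)) + 1
    energies = [0.0] * n_bands
    for k in range(1, n // 2 + 1):  # skip the DC bin
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        band = int(hz_to_bark(k * sample_rate / n))
        energies[band] += abs(coeff) ** 2 / n
    return energies


def is_speech(frame, sample_rate, threshold):
    """Assumed decision rule: the strongest Bark band must exceed the threshold."""
    return max(bark_band_energies(frame, sample_rate)) > threshold


# Example: a 1 kHz tone sampled at 8 kHz versus a silent frame.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
silence = [0.0] * 256
print(is_speech(tone, 8000, threshold=1.0))
print(is_speech(silence, 8000, threshold=1.0))
```

A practical detector would use an FFT and overlapping windowed frames. The fixed threshold here only stands in for the stable threshold values the abstract reports.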
Keywords
Voice Activity Detection; Speech Recognition