http://dx.doi.org/10.5573/ieek.2013.50.10.181

A Study on Voice Activity Detection Using Auditory Scene and Periodic to Aperiodic Component Ratio in CASA System  

Kim, Jung-Ho (Department of Electronics and Communication Engineering, Kwangwoon University)
Ko, Hyung-Hwa (Department of Electronics and Communication Engineering, Kwangwoon University)
Kang, Chul-Ho (Department of Electronics and Communication Engineering, Kwangwoon University)
Publication Information
Journal of the Institute of Electronics and Information Engineers, vol. 50, no. 10, 2013, pp. 181-187
Abstract
When background noise is present or several people speak at the same time, the human auditory system can attend to a target speech signal through Auditory Scene Analysis. A CASA system modeled on this auditory faculty can segregate speech; however, its performance degrades when the system fails to locate the speech segments correctly. To correct such localization errors in the CASA system, this paper proposes a voice activity detection algorithm that combines auditory scene analysis with the PAR (Periodic to Aperiodic component Ratio). Experiments were conducted to evaluate voice activity detection performance in white noise and car noise environments as the SNR varied from 15 dB down to 0 dB. Compared with existing algorithms (pitch-based and Guoning Hu's), the proposed algorithm improves detection accuracy by a maximum of 4% at 15 dB SNR and a maximum of 34% at 0 dB SNR for white noise and car noise.
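The idea behind PAR-based voice activity detection can be sketched as follows: estimate how much of a frame's power is periodic (harmonic, speech-like) versus aperiodic (noise-like), and declare speech when their ratio exceeds a threshold. The sketch below is illustrative only; it uses a simple autocorrelation-based periodicity estimate, not the paper's PARADE-style estimator, and the `threshold`, frame size, and pitch range are assumed values.

```python
import numpy as np

def par_vad(frame, fs, f0_min=80.0, f0_max=400.0, threshold=1.0):
    """Label a frame as speech when its periodic-to-aperiodic power
    ratio (PAR) exceeds a threshold.

    Illustrative sketch: periodicity is measured as the peak of the
    normalized autocorrelation over plausible pitch lags, and that
    fraction of the total power is attributed to the periodic part.
    """
    frame = frame - np.mean(frame)
    total_power = np.mean(frame ** 2)
    if total_power == 0:
        return False, 0.0
    # Search lags corresponding to the assumed pitch range.
    lags = np.arange(int(fs / f0_max), int(fs / f0_min) + 1)
    ac = np.array([np.sum(frame[:-l] * frame[l:]) for l in lags])
    ac /= np.sum(frame ** 2)
    r = max(float(np.max(ac)), 0.0)   # periodicity strength in [0, 1]
    periodic = r * total_power        # power attributed to harmonics
    aperiodic = total_power - periodic + 1e-12
    par = periodic / aperiodic
    return par > threshold, par
```

A 100 Hz sinusoid frame yields a high PAR (speech-like), while a white-noise frame yields a low PAR, so thresholding the ratio separates the two cases.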
Keywords
Auditory Scene Analysis; CASA; VAD;
Citations & Related Records
1 A. S. Bregman, "Auditory Scene Analysis: The Perceptual Organization of Sound," Cambridge, MIT Press, 1990.
2 정상봉, 구자일, 홍준표, "A Study on Blind Source Separation of Speech Using Wavelet Transform and Independent Component Analysis," Journal of the Institute of Electronics Engineers of Korea, vol. 40-IE, no. 2, pp. 15-22, June 2003.
3 DeLiang Wang and G. J. Brown, "Computational auditory scene analysis," Comput. Speech Lang., vol. 8, pp. 297-336, 1994.
4 M. Fujimoto, K. Ishizuka, T. Nakatani, and N. Miyazaki, "Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio," Proc. Interspeech, pp. 230-233, 2007.
5 Naotoshi Seo, "Individual voice activity detection using periodic to aperiodic component ratio based activity detection (PARADE) and Gaussian mixture speaker models," (http://note.sonots.com/SciSoftware/IVAD.html)
6 B. R. Glasberg and B. C. J. Moore, "Derivation of auditory filter shapes from notched-noise data," Hearing Research, vol. 47, pp. 103-138, 1990.
7 Guoning Hu and DeLiang Wang, "Auditory Segmentation Based on Onset and Offset Analysis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 396-405, 2007.
8 T. Nakatani and T. Irino, "Robust and accurate fundamental frequency estimation based on dominant harmonic components," J. Acoust. Soc. Am., vol. 116, pp. 3690-3700, 2004.
9 Jongseo Sohn and Nam Soo Kim, "A statistical model-based voice activity detection," IEEE Signal Processing Letters, vol. 6, no. 1, pp. 1-3, 1999.
10 M. Fujimoto, K. Ishizuka, and T. Nakatani, "A voice activity detection based on the adaptive integration of multiple speech features and a signal decision scheme," Proc. ICASSP, pp. 4441-4444, 2008.