Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition

Shin, Min-Hwa; Park, Ji-Hun; Kim, Hong-Kook; Lee, Yeon-Woo; Lee, Seong-Ro

말소리와 음성과학 (Phonetics and Speech Sciences)

Volume 2, Issue 3 / Pages 141-148 / 2010 / 2005-8063 (pISSN) / 2586-5854 (eISSN)

한국음성학회 (Korean Society of Speech Sciences)

Original Korean title: 이중채널 잡음음성인식을 위한 공간정보를 이용한 통계모델 기반 음성구간 검출

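Shin, Min-Hwa (Korea Electronics Technology Institute);
Park, Ji-Hun (Gwangju Institute of Science and Technology);
Kim, Hong-Kook (Gwangju Institute of Science and Technology);
Lee, Yeon-Woo (Mokpo National University);
Lee, Seong-Ro (Mokpo National University)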

Received: 2010.08.01
Reviewed: 2010.09.28
Published: 2010.09.30


Abstract

This paper proposes a voice activity detection (VAD) method for dual-channel noisy speech recognition that employs spatial cues. The method constructs a probability model of speech presence/absence from spatial cues extracted from the dual-channel input signal and uses this model to detect speech activity intervals. The spatial cues consist of the interaural time differences and interaural level differences of the dual-channel speech signals, and the speech presence/absence probability model is based on a Gaussian kernel density. To evaluate the proposed VAD method, speech recognition is performed on speech segments that contain only the speech intervals it detects. Its performance is compared with that of several conventional methods, including an SNR-based method, a direction-of-arrival (DOA) based method, and a phase-vector-based method. The speech recognition experiments show that the proposed method outperforms the conventional methods, providing relative word error rate reductions of 11.68%, 41.92%, and 10.15% compared with the SNR-based, DOA-based, and phase-vector-based methods, respectively.
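
The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the cues are extracted or how the decision is made, so the frame length, the time-domain cross-correlation used for the time difference, the per-dimension bandwidths, the log-likelihood-ratio threshold, and the function names `spatial_cues`, `kde_log_density`, and `is_speech` are all assumptions made for illustration.

```python
import numpy as np


def spatial_cues(left, right, fs=16000, frame_len=256, max_lag=8):
    """Per-frame (time difference in seconds, level difference in dB).

    Assumed extraction: time-domain cross-correlation peak and energy ratio.
    """
    eps = 1e-12
    lags = np.arange(-max_lag, max_lag + 1)
    n_frames = min(len(left), len(right)) // frame_len
    cues = np.zeros((n_frames, 2))
    for i in range(n_frames):
        l = left[i * frame_len:(i + 1) * frame_len]
        r = right[i * frame_len:(i + 1) * frame_len]
        # Cross-correlation c(k) = sum_n l[n + k] * r[n] over the allowed lags.
        xcorr = [np.dot(l[max(0, k):frame_len + min(0, k)],
                        r[max(0, -k):frame_len + min(0, -k)]) for k in lags]
        # A negative lag means the right channel is delayed relative to the left.
        cues[i, 0] = lags[int(np.argmax(xcorr))] / fs
        cues[i, 1] = 10.0 * np.log10((np.sum(l ** 2) + eps) / (np.sum(r ** 2) + eps))
    return cues


def kde_log_density(x, samples, bandwidth):
    """Log of a Gaussian kernel density estimate at point x (diagonal bandwidth)."""
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    n, d = samples.shape
    h = np.broadcast_to(np.asarray(bandwidth, dtype=float), (d,))
    z = (np.asarray(x, dtype=float) - samples) / h
    log_k = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi) - np.sum(np.log(h))
    m = np.max(log_k)  # log-sum-exp for numerical stability
    return m + np.log(np.sum(np.exp(log_k - m))) - np.log(n)


def is_speech(x, speech_cues, noise_cues, bandwidth, threshold=0.0):
    """Assumed decision rule: log-likelihood ratio of speech vs. noise densities."""
    llr = (kde_log_density(x, speech_cues, bandwidth)
           - kde_log_density(x, noise_cues, bandwidth))
    return llr > threshold
```

In this sketch, `speech_cues` and `noise_cues` would be cue vectors collected from labeled speech and noise frames. Because time differences in seconds and level differences in dB have very different scales, `bandwidth` would normally be given per dimension.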
