A Study on Voice Activity Detection Using Auditory Scene and Periodic to Aperiodic Component Ratio in CASA System

Kim, Jung-Ho; Ko, Hyung-Hwa; Kang, Chul-Ho

doi:10.5573/ieek.2013.50.10.181

Journal of the Institute of Electronics and Information Engineers (전자공학회논문지)

Volume 50, Issue 10 / Pages 181-187 / 2013 / 2287-5026 (pISSN) / 2288-159X (eISSN)

The Institute of Electronics and Information Engineers (대한전자공학회)

CASA 시스템의 청각장면과 PAR를 이용한 음성 영역 검출에 관한 연구

Kim, Jung-Ho (Department of Electronics and Communication Engineering, Kwangwoon University) ;
Ko, Hyung-Hwa (Department of Electronics and Communication Engineering, Kwangwoon University) ;
Kang, Chul-Ho (Department of Electronics and Communication Engineering, Kwangwoon University)

김정호 (광운대학교 전자통신공학과) ;
고형화 (광운대학교 전자통신공학과) ;
강철호 (광운대학교 전자통신공학과)

Received : 2013.07.09
Published : 2013.10.25

https://doi.org/10.5573/ieek.2013.50.10.181

Abstract

When background noise is present or several people speak at once, the human auditory system can still attend to a target speech signal through auditory scene analysis. A computational auditory scene analysis (CASA) system, which models this human auditory faculty, can segregate speech in a similar way. However, the performance of a CASA system degrades when it fails to determine the correct position of the speech. To correct this speech-location error, this paper proposes a voice activity detection algorithm that combines the auditory scene with the periodic to aperiodic component ratio (PAR). Voice activity detection performance was evaluated in white-noise and car-noise environments at SNRs from 15 dB down to 0 dB. Compared with the existing algorithms (Pitch and Guoning Hu), the proposed algorithm improves voice activity detection accuracy in the white-noise and car-noise conditions by up to 4% at an SNR of 15 dB and by up to 34% at an SNR of 0 dB.

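Through auditory scene analysis, human hearing can pick out a speech signal of interest even when there is background noise or several people are talking at the same time. A CASA system, which closely models this human auditory ability, can be used to separate speech. However, the performance of the CASA system drops when the position of the speech is determined incorrectly in the CASA segments. To reduce the performance loss caused by wrongly located speech regions in a CASA system, this paper proposes a voice activity detection algorithm that combines the auditory scene with the ratio of periodic to aperiodic components (PAR). To evaluate voice activity detection performance, experiments were carried out in white-noise and car-noise environments at varying signal-to-noise ratios. In a comparison of the existing algorithms (Pitch and Guoning Hu) with the proposed algorithm at signal-to-noise ratios from 15 dB to 0 dB, voice activity detection accuracy in white noise and car noise improved by up to 4% at 15 dB and by up to 34% at 0 dB.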
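The evaluation setup and the PAR feature named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `power`, `mix_at_snr`, and `par_proxy` are hypothetical helpers, the 200 Hz tone stands in for voiced speech, and the PAR is approximated from the largest normalized autocorrelation value r over candidate pitch lags as r / (1 - r). That approximation is an assumption made here for illustration; the abstract does not specify how PAR is computed.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a list of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the speech-to-noise power ratio is `snr_db`."""
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]


def par_proxy(frame, min_lag, max_lag):
    """Crude periodic-to-aperiodic ratio for one frame (illustrative assumption).

    r is the largest normalized autocorrelation over the candidate pitch lags;
    it is treated as the periodic share of the energy, giving r / (1 - r).
    """
    n = len(frame)
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a = frame[: n - lag]
        b = frame[lag:]
        denom = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
        if denom > 0.0:
            r = sum(x * y for x, y in zip(a, b)) / denom
            best = max(best, r)
    best = min(best, 0.999)  # cap so a perfectly periodic frame stays finite
    return best / (1.0 - best)


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000  # assumed sampling rate in Hz
    tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
    # Sweep the same SNR range as the abstract (15 dB down to 0 dB).
    for snr_db in (15, 10, 5, 0):
        noisy = mix_at_snr(tone, noise, snr_db)
        print(snr_db, round(par_proxy(noisy, 20, 160), 2))
```

With an 8 kHz sampling rate, lags 20 to 160 cover pitch candidates from 400 Hz down to 50 Hz. A frame-level detector could threshold this ratio, but the threshold and the way it would be combined with the auditory scene are left open here because the abstract gives no such details.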

References

A. S. Bregman, "Auditory Scene Analysis: The Perceptual Organization of Sound," Cambridge, MIT Press, 1990.
정상봉, 구자일, 홍준표, "A study on blind source separation of speech using wavelet transform and independent component analysis," 전자공학회논문지, vol. 40, IE part, no. 2, pp. 15-22, June 2003.
DeLiang Wang and G. J. Brown, "Computational auditory scene analysis," Comput. Speech Lang., vol. 8, pp. 297-336, 1994. https://doi.org/10.1006/csla.1994.1016
M. Fujimoto, K. Ishizuka, T. Nakatani, and N. Miyazaki, "Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio," Proc. Interspeech, pp. 230-233, 2007.
Naotoshi Seo, "Individual voice activity detection using periodic to aperiodic component ratio based activity detection (PARADE) and Gaussian mixture speaker models," (http://note.sonots.com/SciSoftware/IVAD.html)
B. R. Glasberg and B. C. J. Moore, "Derivation of auditory filter shapes from notched-noise data," Hearing Research 47, pp. 103-138, 1990. https://doi.org/10.1016/0378-5955(90)90170-T
Guoning Hu and DeLiang Wang, "Auditory Segmentation Based on Onset and Offset Analysis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 396-405, 2007. https://doi.org/10.1109/TASL.2006.881700
T. Nakatani and T. Irino, "Robust and accurate fundamental frequency estimation based on dominant harmonic components," J. Acoust. Soc. Am. 116, pp. 3690-3700, 2004. https://doi.org/10.1121/1.1787522
Jongseo Sohn and Nam Soo Kim, "A statistical model-based voice activity detection," IEEE Signal Process. Lett., vol. 6, pp. 1-3, 1999.
M. Fujimoto, K. Ishizuka, and T. Nakatani, "A voice activity detection based on the adaptive integration of multiple speech features and a signal decision scheme," Proc. ICASSP, pp. 4441-4444, 2008.

Cited by

"Speech Segmentation using Weighted Cross-correlation in CASA System," vol. 51, no. 5, 2014. https://doi.org/10.5573/ieie.2014.51.5.188
