Voice Activity Detection Based on SVM Classifier Using Likelihood Ratio Feature Vector

Jo, Q-Haing; Kang, Sang-Ki; Chang, Joon-Hyuk

doi:10.7776/ASK.2007.26.8.397

The Journal of the Acoustical Society of Korea (한국음향학회지)

Volume 26, Issue 8 / Pages 397-402 / 2007 / 1225-4428 (pISSN) / 2287-3775 (eISSN)

The Acoustical Society of Korea (한국음향학회)

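Jo, Q-Haing (School of Electronic and Electrical Engineering, Inha University);
Kang, Sang-Ki (Telecommunication R&D Center, Telecommunication Network Business, Samsung Electronics);
Chang, Joon-Hyuk (School of Electronic and Electrical Engineering, Inha University)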

Published: 2007.11.30

https://doi.org/10.7776/ASK.2007.26.8.397

Abstract

In this paper, we apply a support vector machine (SVM), which learns an optimized nonlinear decision rule over a set of feature vectors, to improve the performance of statistical model-based voice activity detection (VAD). The conventional method sets up a statistical model for each of the speech-absence and speech-presence hypotheses, computes the likelihood ratio (LR) in each frequency band of the input signal, and compares the geometric mean of these LRs with a given threshold. We propose a novel SVM-based VAD technique that replaces the geometric-mean decision rule by treating the LRs computed in the individual frequency bins as the elements of a feature vector, so as to minimize the classification error probability. In experiments under various noise environments, the proposed SVM-based VAD outperforms both the conventional LRT-based VAD and previously reported SVM-based VADs.
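The contrast between the two decision rules can be made concrete with a minimal sketch. It assumes the commonly used complex-Gaussian model, in which the LR of bin k is exp(gamma_k * xi_k / (1 + xi_k)) / (1 + xi_k), with xi_k the a priori SNR and gamma_k the a posteriori SNR; the abstract does not state which statistical model the paper uses. The function names and the known speech and noise variances are hypothetical: in practice the variances would have to be estimated. The conventional rule averages the per-bin log LRs (the log of their geometric mean) and thresholds the result, while the proposed approach would pass the whole per-bin vector to a trained SVM. SVM training is omitted here.

```python
import math


def log_likelihood_ratios(power_spectrum, noise_var, speech_var):
    """Per-bin log LRs under an ASSUMED complex-Gaussian model.

    All three arguments are equal-length sequences, one value per bin.
    The variances are taken as known here purely for illustration.
    """
    log_lrs = []
    for power, lam_n, lam_s in zip(power_spectrum, noise_var, speech_var):
        xi = lam_s / lam_n      # a priori SNR
        gamma = power / lam_n   # a posteriori SNR
        log_lrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return log_lrs


def geometric_mean_decision(log_lrs, threshold=0.0):
    """Conventional rule: log of the geometric mean of the LRs,
    i.e. the arithmetic mean of the log LRs, against a threshold."""
    return sum(log_lrs) / len(log_lrs) > threshold


if __name__ == "__main__":
    noise_var = [1.0, 1.0, 1.0, 1.0]
    speech_var = [2.0, 2.0, 2.0, 2.0]

    # Frame with strong energy in two bins versus a noise-like frame.
    speech_like = log_likelihood_ratios([9.0, 9.0, 1.0, 1.0], noise_var, speech_var)
    noise_like = log_likelihood_ratios([1.0, 1.0, 1.0, 1.0], noise_var, speech_var)

    # The vectors themselves are what an SVM would receive as features.
    print(speech_like, geometric_mean_decision(speech_like))
    print(noise_like, geometric_mean_decision(noise_like))
```

The geometric-mean rule collapses the vector to a single scalar and weights every bin equally, whereas a classifier trained on the full vector can weight and combine bins nonlinearly, which is the motivation stated in the abstract.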

