Voice Activity Detection Algorithm Based on Radial Basis Function Networks with Dual Threshold

A Dual-Threshold Voice Activity Detector Using Radial Basis Function Networks

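  • 김홍익 (Applied Communications Laboratory, Graduate School of Electronics, Communications and Radio Engineering, Hanyang University)
  • 박승권 (Division of Electrical and Computer Engineering, Hanyang University)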
  • Published : 2004.12.01

Abstract

This paper proposes a Voice Activity Detection (VAD) algorithm based on a Radial Basis Function (RBF) network with a dual threshold. The k-means clustering and Least Mean Square (LMS) algorithms are used to update the RBF network to match the underlying speech condition. The inputs to the RBF network are the three parameters from a Code-Excited Linear Prediction (CELP) coder, which work stably under various background noise levels. A dual hangover threshold is applied in the RBF-VAD to reduce errors, because the threshold value involves a trade-off in the VAD decision. The experimental results show that the proposed VAD algorithm achieves better performance than G.729 Annex B at any noise level.
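The training scheme named in the abstract (k-means for the hidden layer, LMS for the output layer) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' configuration: the Gaussian basis function, the fixed width, the four centers, the learning rate, the function names, and the synthetic three-dimensional data that stands in for the three CELP parameters, which the abstract does not name.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means: returns k centers for the rows of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each sample to its nearest center.
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers

def rbf_features(x, centers, width):
    """Gaussian hidden-layer activations plus a constant bias column."""
    sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-sq / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(x), 1))])

def lms_train(phi, target, lr=0.05, epochs=30, seed=0):
    """LMS (Widrow-Hoff) update of the linear output weights, sample by sample."""
    rng = np.random.default_rng(seed)
    w = np.zeros(phi.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(phi)):
            err = target[i] - phi[i] @ w
            w += lr * err * phi[i]
    return w

# Synthetic stand-in for per-frame feature vectors (NOT real CELP parameters).
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 0.5, size=(200, 3))   # hypothetical silence frames
speech = rng.normal(3.0, 0.5, size=(200, 3))  # hypothetical speech frames
x = np.vstack([noise, speech])
y = np.concatenate([np.zeros(200), np.ones(200)])

centers = kmeans(x, k=4)                    # hidden layer: unsupervised
phi = rbf_features(x, centers, width=1.0)
w = lms_train(phi, y)                       # output layer: supervised
scores = phi @ w                            # per-frame network output
accuracy = np.mean((scores > 0.5) == (y > 0.5))
```

Splitting the training this way keeps the supervised step linear in the output weights, which is one common reason RBF networks are described as cheap to train.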

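This paper proposes a dual-threshold voice activity detection algorithm using an RBF (Radial Basis Function) neural network, which has a simple structure, a low computational load, and stable, fast convergence, and verifies its usefulness through simulation. CELP (Code-Excited Linear Prediction) parameters used in speech coders serve as the neural network inputs to make the detector robust to noise, and a dual-threshold scheme that uses different thresholds in speech segments and silence segments is applied to improve detection performance. In the experiments, the dual-threshold RBF neural network voice activity detector outperformed the G.729 Annex B voice activity detector. Compared with an existing voice activity detector based on an MLP (Multi Layer Perceptron) neural network, it showed similar performance in speech segments and a performance improvement of about 25% in silence segments.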
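The idea of using different thresholds in speech and silence segments can be sketched as a state-dependent decision rule. This is only an illustration under stated assumptions: the threshold values, the choice to make the speech-state threshold the lower one, and the function name `dual_threshold_vad` are hypothetical and are not taken from the paper.

```python
def dual_threshold_vad(scores, speech_threshold=0.4, silence_threshold=0.6):
    """Label each frame 1 (speech) or 0 (silence) from network output scores.

    The threshold applied to a frame depends on the previous decision:
    silence_threshold while in silence, speech_threshold while in speech.
    Both values are illustrative assumptions.
    """
    decisions = []
    in_speech = False  # assume the stream starts in silence
    for score in scores:
        threshold = speech_threshold if in_speech else silence_threshold
        in_speech = score > threshold
        decisions.append(1 if in_speech else 0)
    return decisions

# Hypothetical network outputs for seven consecutive frames.
scores = [0.1, 0.5, 0.7, 0.5, 0.45, 0.3, 0.5]
labels = dual_threshold_vad(scores)

# A single fixed threshold of 0.5, for comparison.
single = [1 if s > 0.5 else 0 for s in scores]
```

With these assumed values, the frames scoring 0.5 and 0.45 right after the 0.7 frame stay labeled as speech, whereas the single threshold drops them. This shows the trade-off the abstract mentions: one threshold must choose between clipping speech tails and passing noise, while two thresholds can be tuned separately for each state.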
