Voice Activity Detection Algorithm Based on Radial Basis Function Networks with Dual Threshold

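Kim Hong Ik (Applied Communications Laboratory, Graduate School of Electronics, Communications and Radio Engineering, Hanyang University)
Park Sung Kwon (Division of Electronics, Electrical and Computer Engineering, Hanyang University)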
Abstract
This paper proposes a Voice Activity Detection (VAD) algorithm based on a Radial Basis Function (RBF) network with a dual threshold. K-means clustering and the Least Mean Square (LMS) algorithm are used to update the RBF network to match the underlying speech condition. The inputs to the RBF network are three parameters of a Code Excited Linear Prediction (CELP) coder, which work stably under various background noise levels. A dual hangover threshold is applied in the RBF-VAD to reduce errors, because the threshold value involves a trade-off in the VAD decision. The experimental results show that the proposed VAD algorithm achieves better performance than G.729 Annex B at any noise level.
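The pipeline the abstract describes (k-means to place the RBF centers, LMS to update the output weights, and a two-threshold decision) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The 3-dimensional toy feature vectors stand in for the three CELP coder parameters, which this record does not name. The Gaussian width `sigma`, the step size `mu`, the thresholds `low` and `high`, and the hysteresis reading of the dual threshold are all assumptions made for the example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means; initial centers are evenly strided samples (assumption)."""
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def rbf_features(x, centers, sigma):
    """Gaussian hidden-layer outputs plus a constant bias term."""
    return [math.exp(-sq_dist(x, c) / (2.0 * sigma ** 2)) for c in centers] + [1.0]


def train_lms(samples, labels, centers, sigma, mu=0.1, epochs=50):
    """LMS update of the linear output weights: w <- w + mu * error * phi."""
    w = [0.0] * (len(centers) + 1)
    for _ in range(epochs):
        for x, d in zip(samples, labels):
            phi = rbf_features(x, centers, sigma)
            err = d - sum(wi * pi for wi, pi in zip(w, phi))
            w = [wi + mu * err * pi for wi, pi in zip(w, phi)]
    return w


def rbf_score(x, centers, sigma, w):
    """Network output for one frame; close to 1 for speech, 0 for noise."""
    return sum(wi * pi for wi, pi in zip(w, rbf_features(x, centers, sigma)))


def dual_threshold(scores, low=0.3, high=0.7):
    """Hysteresis: switch on at `high`, off at `low`, otherwise keep the state."""
    active, decisions = False, []
    for s in scores:
        if s >= high:
            active = True
        elif s <= low:
            active = False
        decisions.append(active)
    return decisions


def toy_frames(center, n, rng, jitter=0.1):
    """Hypothetical 3-parameter frames scattered around a common value."""
    return [[center + rng.uniform(-jitter, jitter) for _ in range(3)] for _ in range(n)]


rng = random.Random(0)
noise, speech = toy_frames(0.0, 40, rng), toy_frames(1.0, 40, rng)
samples, labels = noise + speech, [0.0] * 40 + [1.0] * 40
centers = kmeans(samples, k=2)
weights = train_lms(samples, labels, centers, sigma=0.5)
stream = [noise[0], speech[0], speech[1], noise[1]]
scores = [rbf_score(x, centers, 0.5, weights) for x in stream]
print(dual_threshold(scores))
```

In this reading, a score between `low` and `high` keeps the previous decision. That is one way to soften the trade-off a single threshold imposes between clipping speech and passing noise; the paper's actual hangover scheme may differ.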
Keywords
voice activity detection; Radial Basis Function network; dual threshold
References
1 William J. Phillips, Caner Tosuner, William Robertson, 'Speech Recognition Technique Using RBF Networks,' WESCANEX 95. Communications, Power, and Computing, Conference Proceedings, IEEE, Vol. 1, pp. 185-190, May 1995
2 Chen S., Grant P.M., Cowan C.F.N., 'Orthogonal least squares algorithm for training multi-output radial basis function networks,' Artificial Neural Networks, Second International Conference, pp. 336-339, Nov. 1991
3 Simon Haykin, 'Neural Networks: A Comprehensive Foundation,' Prentice Hall, pp. 298-305, 1999
4 Chedsada Chinrungrueng and Carlo H. Sequin, 'Optimal Adaptive K-means Algorithm with Dynamic Adjustment of Learning Rate,' Neural Networks, IEEE Transactions on, Vol. 6, No. 1, pp. 157-169, January 1995
5 ITU-T Recommendation G.729, 'Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP),' Mar. 1996
6 Richard V. Cox, 'Three New Speech Coders from the ITU Cover a Range of Applications,' IEEE Comm. Magazine, Sep. 1997
7 Jacek M. Zurada, 'Introduction to Artificial Neural Systems,' West Publishing Company, 1992
8 Zhang, Y.-M., Li, X.R., 'Hybrid training of RBF networks with application to nonlinear systems identification,' Decision and Control, Proceedings of the 35th IEEE, Vol. 1, pp. 937-942, Dec. 1996
9 John Scourias, 'Overview of the Global System for Mobile Communications,' University of Waterloo, May 1995
10 ITU-T Recommendation P.56, 'Objective measurement of active speech level,' March 1993
11 A. M. Kondoz, 'Digital Speech,' John Wiley & Sons, 1994
12 D.R. Hush, B.G. Horne, 'Progress in Supervised Neural Networks: What's New Since Lippmann?,' IEEE Signal Processing Magazine, pp. 8-39, January 1993
13 Dong Enqing, Liu Guizhong, Zhou Yatong, Cai Yu, 'Voice Activity Detection Based on Short-Time Energy and Noise Spectrum Adaptation,' Signal Processing, 2002 6th International Conference, Vol. 1, pp. 464-467, Aug. 2002
14 ITU-T Recommendation G.723.1, 'Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s,' Mar. 1996
15 ITU-T Recommendation G.723.1 Annex A, 'Silence compression scheme,' Nov. 1996
16 Jotaro Ikedo, 'Voice Activity Detection Using Neural Network,' IEICE Trans. Commun., Vol. E81-B, No. 12, Dec. 1998
17 Jaw Won Kim, Min Sik Seo, 'A Voice Activity Detection Algorithm for Wireless Communication Systems with Dynamically Varying Background Noise,' IEICE Trans. Commun., Feb. 2000
18 ITU-T Recommendation G.729 Annex B, 'A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70,' Nov. 1996