A Simple Speech/Non-speech Classifier Using Adaptive Boosting

Kwon, Oh-Wook (School of Electrical and Computer Engineering, Chungbuk National University)
Lee, Te-Won (Institute for Neural Computation, University of California)

Publication Information

The Journal of the Acoustical Society of Korea, v.22, no.3E, 2003, pp. 124-132

Abstract

We propose a new method for speech/non-speech classification, based on the adaptive boosting (AdaBoost) algorithm, to detect speech for robust speech recognition. The method combines simple base classifiers through the AdaBoost algorithm and uses a set of optimized speech features together with spectral subtraction. Its key benefits are simple implementation, low computational complexity, and avoidance of the over-fitting problem. We validated the method by comparing its performance with that of the speech/non-speech classifier used in a standard voice activity detector. For speech recognition purposes, additional performance improvements were achieved by adopting new features, including speech band energies and MFCC-based spectral distortion. At the same false alarm rate, the method reduced miss errors by 20-50%.
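As an illustration of how AdaBoost combines simple base classifiers, the following is a minimal sketch, not the authors' implementation. It assumes the base classifiers are single-feature threshold tests (decision stumps), which the abstract does not specify. The two features per frame (a log energy and a spectral distortion value) and all data values are hypothetical toy numbers.

```python
import math

# Hypothetical toy frames: (log_energy, spectral_distortion); +1 = speech, -1 = non-speech.
X = [(5, 1), (6, 2), (1, 6), (2, 5), (1, 1), (2, 2)]
y = [1, 1, 1, 1, -1, -1]

def stump_predict(stump, row):
    """Threshold test on one feature; polarity decides which side is 'speech'."""
    j, t, polarity = stump
    return polarity if row[j] > t else -polarity

def train_stump(X, y, w):
    """Return (weighted_error, stump) for the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive feature values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for polarity in (1, -1):
                stump = (j, t, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best

def adaboost_train(X, y, rounds=3):
    """Discrete AdaBoost: reweight frames so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified frames gain weight, correctly classified frames lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the weighted vote over all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

ensemble = adaboost_train(X, y, rounds=3)
correct = sum(adaboost_predict(ensemble, row) == yi for row, yi in zip(X, y))
print(f"{correct} of {len(X)} toy frames classified correctly")
```

On this toy data, the three-round ensemble classifies more training frames correctly than the best single stump does. Each base classifier is only a comparison against a threshold, so the cost per frame grows linearly with the number of rounds.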

Keywords

Speech/non-speech classification; Speech detection; Adaptive boosting; AdaBoost algorithm
