http://dx.doi.org/10.7776/ASK.2013.32.2.147

Robust Speech Endpoint Detection in Noisy Environments for HRI (Human-Robot Interface)  

Park, Jin-Soo (Interdisciplinary Program in Biomicro System Technology, Korea University)
Ko, Han-Seok (School of Electrical Engineering, Korea University)
Abstract
In this paper, a new speech endpoint detection method for moving robot platforms in noisy environments is proposed. Conventional methods obtain the endpoints of speech by applying an edge detection filter that finds abrupt changes in the feature domain. However, because the frame-energy feature is unstable in such noisy environments, it is difficult to locate the speech endpoints accurately. Therefore, a novel feature extraction method based on the twice-iterated fast Fourier transform (TIFFT) and statistical models of speech is proposed. The proposed feature was applied to an edge detection filter for effective detection of the speech endpoints. Representative experiments indicate a substantial improvement over the conventional method.
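The conventional pipeline the abstract describes (a per-frame feature such as frame energy, followed by an edge detection filter that looks for abrupt rises and falls) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`frame_signal`, `log_energy`, `edge_response`, `detect_endpoints`) are hypothetical, the filter is a plain difference-of-means step detector rather than whatever filter the paper uses, and the half-width of 3 frames and the 10 dB threshold are arbitrary choices. The paper's TIFFT and statistical-model feature is not specified in the abstract, so it is not reproduced here; in the proposed method it would take the place of the energy sequence passed to `detect_endpoints`.

```python
import math
import random


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def log_energy(frame, eps=1e-10):
    """Frame energy in dB: 10*log10 of the mean squared sample value."""
    return 10.0 * math.log10(sum(s * s for s in frame) / len(frame) + eps)


def edge_response(feature, half_width):
    """Hypothetical step-edge filter: mean of the next W values minus mean of the previous W.

    r[t] peaks positively when frame t is the first frame of a rise (speech onset)
    and negatively when frame t is the first frame after a fall (speech offset).
    Positions closer than W frames to either end are left at 0.
    """
    n, w = len(feature), half_width
    resp = [0.0] * n
    for t in range(w, n - w + 1):
        resp[t] = sum(feature[t:t + w]) / w - sum(feature[t - w:t]) / w
    return resp


def detect_endpoints(feature, half_width=3, threshold_db=10.0):
    """Return (first_speech_frame, last_speech_frame), or None if no clear utterance.

    Assumes a single utterance: the strongest rising edge marks the start and the
    strongest falling edge marks the end. threshold_db is an assumed value.
    """
    resp = edge_response(feature, half_width)
    t_on = max(range(len(resp)), key=resp.__getitem__)
    t_off = min(range(len(resp)), key=resp.__getitem__)
    if resp[t_on] < threshold_db or -resp[t_off] < threshold_db or t_off <= t_on:
        return None
    return t_on, t_off - 1


if __name__ == "__main__":
    fs, frame_len, hop = 8000, 256, 128
    rng = random.Random(0)
    # Low-level Gaussian noise everywhere, plus a 250 Hz tone standing in for
    # speech in samples [20*hop, 60*hop).
    x = [rng.gauss(0.0, 0.01) for _ in range(hop * 81)]
    for i in range(20 * hop, 60 * hop):
        x[i] += 0.5 * math.sin(2 * math.pi * 250 * i / fs)
    energies = [log_energy(f) for f in frame_signal(x, frame_len, hop)]
    print(detect_endpoints(energies))  # (19, 59): first/last frames overlapping the tone
```

Because frames overlap by half, frames 19 and 59 each contain half a frame of the tone, so the detector reports them as the outermost speech frames. In real noisy recordings the energy sequence fluctuates far more than in this synthetic case, which is the weakness the abstract attributes to the frame-energy feature.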
Keywords
Speech segmentation; Log likelihood ratio; Frame energy; Fast Fourier transform; Edge detection filter; Spectral pattern